Saturday, November 27, 2010

The Statistical Core Vocabulary: Can Definitions Really be this Bad?

I just learned of the Statistical Core Vocabulary (abbreviated as SCOVO), the latest version of which one may find here. It describes itself as: a vocabulary for representing statistical data on the Web.

It contains three classes and five properties. For so small an artifact, one that aims to capture the core entities of a domain, one would expect its creator(s) to have paid careful attention to the definitions of those classes and properties.

Clearly, that is not the case here.

The SCOVO definition of Dataset is: a statistical dataset. This is equivalent to defining subatomic particle as: a positively-charged subatomic particle.

Similarly, the SCOVO definition of Item is: a statistical data item. This definition is even worse, being the equivalent of defining book as: a red hardcover book.

Finally, the SCOVO definition of its third and final class, Dimension, is: a dimension of a statistical data item. This definition is the equivalent of defining length as: the length of a rectangular wooden table.
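
One way to make the complaint concrete is to check whether a term reappears inside its own definition. The sketch below is only an illustration: the function name is_circular and the crude letter-only normalization are my own assumptions, not part of SCOVO or any ontology tooling, and a substring test like this can misfire on longer texts.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """Return True if the term being defined reappears in its own definition.

    Hypothetical helper for illustration only. Both strings are lowercased
    and stripped of everything but letters, so "data set" matches "dataset".
    """
    def normalize(text: str) -> str:
        return re.sub(r"[^a-z]", "", text.lower())

    return normalize(term) in normalize(definition)


# The three SCOVO class definitions quoted above.
scovo_definitions = {
    "Dataset": "a statistical dataset",
    "Item": "a statistical data item",
    "Dimension": "a dimension of a statistical data item",
}

for term, definition in scovo_definitions.items():
    print(term, is_circular(term, definition))
```

All three SCOVO definitions fail this check, since each one simply repeats the term it is supposed to define.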

The situation for the properties in SCOVO is no better. Two of the properties even have the same names as two of the classes: dimension and dataset. The dataset relation is intended to link an item to a dataset. That is the equivalent of having a relation named liver to link a liver to the organism of which it is part.

By contrast, the Information Artifact Ontology (IAO) has taken pains to define the terms data item and dataset well. Definitions of these terms, and other terms used in their definitions, follow:

Data item: an information content entity that is intended to be a truthful statement about something (modulo, e.g., measurement precision or other systematic errors) and is constructed/acquired by a method which reliably tends to produce (approximately) truthful statements.

Information content entity: an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity.

Note that for the definition of generic dependence, the IAO relies on a formal upper ontology called Basic Formal Ontology (BFO).

Data set: a data item that is an aggregate of other data items of the same type that have something in common. Averages and distributions can be determined for data sets.

In the IAO, data items stand in the "is about" relation to some entity. Thus, data are linked to the world that they are about.
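
The structure of these IAO definitions can be sketched roughly in code. This is a loose analogy under stated assumptions, not how the IAO is actually represented: the class names, the is_about and value fields, and the average method are hypothetical stand-ins chosen to mirror the definitions quoted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InformationContentEntity:
    # Hypothetical field: the entity in the world that this content is about.
    is_about: str


@dataclass
class DataItem(InformationContentEntity):
    # Hypothetical field: a measured value, used only for this illustration.
    value: float = 0.0


@dataclass
class DataSet(DataItem):
    # A data set is itself a data item that aggregates other data items.
    members: List[DataItem] = field(default_factory=list)

    def average(self) -> float:
        """Average of the member values; raises ValueError if there are none."""
        if not self.members:
            raise ValueError("cannot average an empty data set")
        return sum(m.value for m in self.members) / len(self.members)


heights = DataSet(
    is_about="heights of two patients, in cm",
    members=[
        DataItem(is_about="height of patient 1", value=170.0),
        DataItem(is_about="height of patient 2", value=180.0),
    ],
)
print(heights.average())
```

The point of the sketch is that every item, and the set itself, carries a link to whatever it is about, and that a data set is defined in terms of something more general (a data item) rather than in terms of itself.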

SCOVO does have one thing going for it: it serves as an exemplar of the worst possible way to define terms.
