Monday, November 29, 2010

NSF Funds Phenotype Ontology Coordination Network

The National Science Foundation recently awarded just over $500,000 to the University of South Dakota (USD) for the creation of the Phenotype Ontology Research Coordination Network. According to the grant proposal, the PI at USD is Dr. Paula Mabee, and the co-PIs include Dr. Andrew Deans (North Carolina State University), Dr. Eva Huala (Stanford University), Suzanna Lewis (Lawrence Berkeley National Laboratory), and Dr. Anne Maglia (Missouri University of Science and Technology).

The goal of the grant is to "...establish a network of scientists who are developing phenotype ontologies and to use this network to enable and enhance the research of all of those involved."

The specific aims are:
  1. Develop anatomical reference ontologies for three key taxonomic clades;
  2. Align and synchronize anatomical ontologies using homology and various types of similarity relations;
  3. Define, test and document anatomy ontology development best practices and standards;
  4. Reach out to ancillary phenotype groups to share with them common concepts and practices;
  5. Educate the community about the methods for developing ontologies and their importance and utility in research.

The principals of the grant are soliciting participation in the network. To join, you can sign up here.

Saturday, November 27, 2010

The Statistical Core Vocabulary: Can Definitions Really be this Bad?

I just learned of the Statistical Core Vocabulary (abbreviated as SCOVO), the latest version of which one may find here. This vocabulary describes itself as "a vocabulary for representing statistical data on the Web."

It contains three classes and five properties. For so small an artifact, and one that aims to capture the core entities in a domain, one would expect its creators to have paid careful attention to the definitions of those classes and properties.

Clearly, that is not the case here.

The SCOVO definition of Dataset is: "a statistical dataset." This is equivalent to defining subatomic particle as: "a positively-charged subatomic particle."

Similarly, the SCOVO definition of Item is: "a statistical data item." This definition is even worse, being the equivalent of defining book as: "a red hardcover book."

Finally, the SCOVO definition of its third and final class, Dimension, is: "a dimension of a statistical data item." This definition is the equivalent of defining length as: "the length of a rectangular wooden table."

The situation for the properties in SCOVO is no better. Two of the properties even have the same name as two of the classes: dimension and dataset. The dataset relation is intended to link an item to a dataset. The dataset relation is thus the equivalent of having a relation liver to link the liver itself to the organism of which it is part.
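To make the naming problem concrete, here is a minimal sketch in plain Python (no RDF library) that stores a few statements as subject-predicate-object triples and looks for properties whose names collide with class names. The "scovo:" and "rdf:" prefixes are written as plain strings, and the "ex:" resources are hypothetical examples of mine, not anything published with SCOVO.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# "ex:item1" and "ex:dataset1" are hypothetical resources for illustration.
triples = [
    ("ex:item1", "rdf:type", "scovo:Item"),
    ("ex:dataset1", "rdf:type", "scovo:Dataset"),
    # The property scovo:dataset links an item to the dataset it belongs to.
    ("ex:item1", "scovo:dataset", "ex:dataset1"),
]


def local_name(term):
    """Return the part of a prefixed name after the first colon."""
    return term.split(":", 1)[1]


# Classes are whatever appears as the object of rdf:type;
# properties are every other predicate in use.
classes = {o for s, p, o in triples if p == "rdf:type"}
properties = {p for s, p, o in triples if p != "rdf:type"}

# Compare local names case-insensitively to find class/property collisions.
collisions = sorted(
    {local_name(c).lower() for c in classes}
    & {local_name(p).lower() for p in properties}
)
print(collisions)
```

In this toy graph the only thing separating the class from the relation is the capitalization of its first letter, which is the confusion described above.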

By contrast, the Information Artifact Ontology (IAO) has taken pains to define the terms data item and dataset well. Definitions of these terms, and other terms used in their definitions, follow:

Data item: an information content entity that is intended to be a truthful statement about something (modulo, e.g., measurement precision or other systematic errors) and is constructed/acquired by a method which reliably tends to produce (approximately) truthful statements.

Information content entity: an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity.

Note that for the definition of generic dependence, the IAO relies on a formal upper ontology, the Basic Formal Ontology (BFO).

Data set: a data item that is an aggregate of other data items of the same type that have something in common. Averages and distributions can be determined for data sets.

In the IAO, data items stand in the "is about" relation to some entity. Thus, data are linked to the world that they are about.
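As a rough illustration of what these definitions buy you, here is a small Python sketch. It is not the IAO's own formalization (the IAO is an OWL ontology), and every class, field, and value name in it is my own assumption. It only mirrors three points from the definitions above: a data item is about some entity, a data set aggregates data items of the same type, and averages can be determined for a data set.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DataItem:
    kind: str       # the type of datum (hypothetical field)
    value: float
    is_about: str   # the entity in the world that the datum is about


@dataclass(frozen=True)
class DataSet:
    items: tuple    # a tuple of DataItem instances

    def __post_init__(self):
        # A data set aggregates data items, so it cannot be empty...
        if not self.items:
            raise ValueError("a data set needs at least one data item")
        # ...and its members must all be of the same type.
        if len({item.kind for item in self.items}) != 1:
            raise ValueError("all data items must be of the same type")

    @property
    def is_about(self):
        """The entities that the member data items are about."""
        return frozenset(item.is_about for item in self.items)

    def average(self):
        """Averages can be determined for data sets."""
        return mean(item.value for item in self.items)


# Hypothetical readings, purely for illustration.
readings = DataSet((
    DataItem("body temperature in Celsius", 36.5, "patient A"),
    DataItem("body temperature in Celsius", 37.5, "patient B"),
))
print(readings.average())
print(sorted(readings.is_about))
```

Mixing items of different kinds raises an error in this sketch, and each item keeps its link to the thing in the world that it describes.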

SCOVO does have one thing going for it: it serves as an exemplar of the worst possible way to define terms.