Monday, November 29, 2010

NSF Funds Phenotype Ontology Coordination Network

The National Science Foundation recently awarded just over $500,000 to the University of South Dakota (USD) for the creation of the Phenotype Ontology Research Coordination Network. According to the grant proposal, the PI at USD is Dr. Paula Mabee, and the co-PIs include Dr. Andrew Dean (North Carolina State University), Dr. Eva Huala (Stanford University), Suzanna Lewis (Lawrence Berkeley National Laboratory), and Dr. Anne Maglia (Missouri University of Science and Technology).

The goal of the grant is to ...establish a network of scientists who are developing phenotype ontologies and to use this network to enable and enhance the research of all of those involved.

The specific aims are:
  1. Develop anatomical reference ontologies for three key taxonomic clades;
  2. Align and synchronize anatomical ontologies using homology and various types of similarity relations;
  3. Define, test and document anatomy ontology development best practices and standards;
  4. Reach out to ancillary phenotype groups to share with them common concepts and practices;
  5. Educate the community about the methods for developing ontologies and their importance and utility in research.
The grant's principal investigators are soliciting participation in the network. To join, you can sign up here.

Saturday, November 27, 2010

The Statistical Core Vocabulary: Can Definitions Really be this Bad?

I just learned of the Statistical Core Vocabulary (abbreviated as SCOVO), the latest version of which one may find here. This vocabulary defines itself as: A vocabulary for representing statistical data on the Web.

It contains 3 classes and 5 properties. For a small artifact that aims to capture the core entities of a domain, one would expect its creators to have paid careful attention to the definitions of those classes and properties.

Clearly, that is not the case here.

The SCOVO definition of Dataset is: a statistical dataset. This is equivalent to defining subatomic particle as: a positively-charged subatomic particle.

Similarly, the SCOVO definition of Item is: a statistical data item. This definition is even worse, being the equivalent of defining book as: a red hardcover book.

Finally, the SCOVO definition of its third and final class, Dimension, is: a dimension of a statistical data item. This definition is the equivalent of defining length as the length of a rectangular wooden table.

The situation for the properties in SCOVO is no better. Two of the properties even have the same name as two of the classes: dimension and dataset. The dataset property is intended to link an item to a dataset; it is thus the equivalent of having a relation named liver to link the liver itself to the organism of which it is part.
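To see how these pieces are meant to fit together, here is a minimal sketch in Python using rdflib. Only the three classes and the dataset and dimension properties come from SCOVO itself; the example resources, the use of rdf:value to hold the statistic, and the namespace URI are my own assumptions for illustration.

  from rdflib import Graph, Literal, Namespace
  from rdflib.namespace import RDF

  SCOVO = Namespace("http://purl.org/NET/scovo#")   # assumed SCOVO namespace
  EX = Namespace("http://example.org/")             # made-up example data

  g = Graph()
  g.bind("scovo", SCOVO)

  # One dataset, one statistical item within it, and one dimension of that item
  g.add((EX.unemploymentFigures, RDF.type, SCOVO.Dataset))
  g.add((EX.item1, RDF.type, SCOVO.Item))
  g.add((EX.year2009, RDF.type, SCOVO.Dimension))

  # The properties confusingly share names with the classes: scovo:dataset
  # links an item to its dataset; scovo:dimension links it to a dimension
  g.add((EX.item1, SCOVO.dataset, EX.unemploymentFigures))
  g.add((EX.item1, SCOVO.dimension, EX.year2009))
  g.add((EX.item1, RDF.value, Literal(9.3)))        # the actual statistic

  print(g.serialize(format="turtle"))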

By contrast, the Information Artifact Ontology (IAO) has taken pains to define the terms data item and dataset well. Definitions of these terms, and other terms used in their definitions, follow:

Data item: an information content entity that is intended to be a truthful statement about something (modulo, e.g., measurement precision or other systematic errors) and is constructed/acquired by a method which reliably tends to produce (approximately) truthful statements.

Information content entity: an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity.

Note that for the definition of generic dependence, the IAO relies on a formal, upper ontology called Basic Formal Ontology.

Data set: a data item that is an aggregate of other data items of the same type that have something in common. Averages and distributions can be determined for data sets.

In the IAO, data items stand in the "is about" relation to some entity. Thus, data are linked to the world that they are about.
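For contrast, here is an equally small sketch, again in Python with rdflib, of the IAO pattern in which a data item is explicitly linked, via is about, to the entity in the world it concerns. The patient and measurement individuals are invented, and the numeric IAO identifiers are the ones I believe correspond to data item and is about.

  from rdflib import Graph, Literal, Namespace
  from rdflib.namespace import RDF

  OBO = Namespace("http://purl.obolibrary.org/obo/")   # OBO PURL namespace
  EX = Namespace("http://example.org/")                # made-up example data

  g = Graph()

  # A body-weight measurement (a data item) that is about a particular patient
  g.add((EX.weightMeasurement1, RDF.type, OBO.IAO_0000027))     # IAO 'data item'
  g.add((EX.weightMeasurement1, OBO.IAO_0000136, EX.patient1))  # IAO 'is about'
  g.add((EX.weightMeasurement1, RDF.value, Literal(72.5)))      # kilograms

  print(g.serialize(format="turtle"))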

SCOVO does have one thing going for it: it serves as an exemplar of the worst possible way to define terms.

Wednesday, September 1, 2010

Physician decries lack of definitions in informatics

Dick Stanley, MD, Chief Medical Information Officer (CMIO) at Cooley-Dickinson Hospital, Northampton, Mass., decries the lack of definitions in informatics. He states: Interestingly, I’m having trouble finding even basic definitions of very common terms...

Dr. Stanley correctly points out that the science of informatics requires good definitions of terms to make progress. He notes that ...we’re supposed to be the people who care about [definitions].

However, the field of biomedical informatics has long been delinquent in this regard. The definitions it does generate are often quite poor. Like many outsiders who have recently come to the field of informatics, Dr. Stanley notes that ...the whole informatics industry suffers from a tremendous lack of definitions.

As a member of that "industry" for 15 years, Dr. Stanley, let me say, guilty as charged.

Fortunately, there is light on the horizon. Criticism in this regard from ontologists like Dr. Barry Smith and Dr. Werner Ceusters at the University at Buffalo is beginning to turn the tide. They have been advocating for high-quality definitions in ontologies and information models for years, culminating in the OBO Foundry's adoption of at least three principles mandating high-quality definitions.

Of course, many ontologists, typically those with an engineering and/or computer science background, are eager to have rigorous logical axioms that define ontology terms. We caution that getting textual definitions right for human understanding is a necessary first step toward the logical axioms they rightly desire so that ontologies can support automated reasoning.
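As a toy illustration of that ordering, here is a sketch in Python using owlready2 that attaches the human-readable textual definition to a class first and only then asserts a logical axiom approximating it. The class names, the has_part property, and the axiom itself are deliberate simplifications of mine, not the IAO's actual axiomatization.

  from owlready2 import get_ontology, Thing, ObjectProperty

  onto = get_ontology("http://example.org/demo.owl")

  with onto:
      class DataItem(Thing): pass
      class has_part(ObjectProperty): pass
      class DataSet(DataItem): pass

      # Step 1: the textual definition, for human understanding (rdfs:comment)
      DataSet.comment.append(
          "A data item that is an aggregate of other data items of the same "
          "type that have something in common.")

      # Step 2: a (deliberately crude) logical axiom for automated reasoning
      DataSet.equivalent_to.append(DataItem & has_part.some(DataItem))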

But we ought not to put any carts before the horse of textual definitions, be they the conduct of science, the application of science to the clinical realm, or the development of logical axioms to automate cognitive activities of humans.

Friday, July 30, 2010

LOINC Changes Frustrating Semantic Interoperability in Canada

According to minutes of the 2010-07-29 HL7 Vocabulary Working Group call, the new release of Logical Observation Identifiers Names and Codes (LOINC) is creating ambiguity about how to represent things like birth date. Apparently, it is now possible to represent birth date as an "observation" with a value that is a date. This representation conflicts with the "usual" way of representing birth date in the HL7 Clinical Document Architecture (CDA) standard, which represents it as a "demographic attribute of a person".
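To make the conflict concrete, here is a small sketch in Python contrasting the two representations. The field names are invented for illustration, and the LOINC code shown is the one I believe denotes Birth date; neither is drawn verbatim from the CDA or LOINC specifications.

  # 1. Birth date as a demographic attribute of a person (the usual CDA view)
  patient = {
      "name": "Jane Doe",
      "birth_date": "1970-05-04",
  }

  # 2. Birth date as an "observation" whose value happens to be a date
  observation = {
      "subject": "Jane Doe",
      "code": "LOINC:21112-8",      # assumed LOINC code for 'Birth date'
      "value": "1970-05-04",
  }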

Really, the fundamental issue here is the failure to recognize that terminologies such as LOINC and SNOMED CT are representations developed in an uncoordinated fashion alongside HL7's CDA, which is also a representation. That is to say, terminologies, information models, and ontologies (obviously) all make ontological commitments.

Conflicting views such as these are common. Indeed, HL7's entire TermInfo group was created to reconcile conflicting representations (HL7's and SNOMED CT's) and to ensure, as much as possible, semantic interoperability.

Friday, April 9, 2010

Confusion reigns--and remains--about SNOMED CT licensing

On a Clinical and Translational Science Award (CTSA) Consortium call about data standards and interoperability, participants raised numerous questions about the allowable uses of SNOMED CT in the applications they are developing to facilitate translational science.

If I develop a web application that uses SNOMED CT codes under the covers, may I allow users in a country without a SNOMED CT license to access it?

If I develop a subset of SNOMED CT codes (sometimes also known as a "value set") to serve as the set of answers to an online survey question, can I publish this subset for others to use (thereby facilitating interoperability among surveys)?

By the end of the call, attendees had more questions than answers.

Wouldn't the open approach taken by the Open Biomedical Ontologies (OBO) Foundry serve the purpose of interoperability better? Shouldn't the United States redirect at least some of its investment in SNOMED CT into truly open standards?

Friday, March 5, 2010

First OBO Foundry Ontologies Announced

The OBO Foundry Coordinating Editors today announced the first set of ontologies to be included in the OBO Foundry. They are:
  • CHEBI: Chemical Entities of Biological Interest
  • GO: Gene Ontology
  • PATO: Phenotypic Quality Ontology
  • PRO: Protein Ontology
  • XAO: Xenopus Anatomy Ontology
  • ZFA: Zebrafish Anatomy Ontology
Candidates that came close to membership but require more extensive revision first include the Cell Ontology (CL) and the Foundational Model of Anatomy (FMA). In particular, the FMA requires an open source license.

The Editors recommend that these ontologies ...serve as preferred targets for community convergence.

We agree.

Tuesday, February 16, 2010

Hymenoptera Ontology Receives NSF Funding

Somehow, we always seem to find out that ontology projects receive funding well after the fact.

Nevertheless, here is another one. The Hymenoptera Ontology has received funding from the National Science Foundation. The funding started on April 1, 2009.

You can read more at the NSF site linked above, but here is a brief summary:

This project will bring Hymenoptera researchers together to build a consensus structured vocabulary (the Hymenoptera Anatomy Ontology) that 1) enables discovery of research results from publications, 2) empowers taxonomists to efficiently describe/diagnose species and 3) provide improved access to information for policy makers, farmers, land managers and the general public.

Wednesday, January 20, 2010

OWL 2.0 is now a W3C Recommendation

In follow-up to a December 2008 post, OWL 2.0 became a W3C Recommendation on October 27, 2009.

A W3C Recommendation is the final stage in the World Wide Web Consortium (W3C) standardization process. Thus, except for the correction of errata, OWL 2.0 is a final standard.

The OWL 2.0 specification is here.

Monday, January 11, 2010

Ontologies to facilitate revolution in scientific publishing

In an article in Science entitled Strategic Reading, Ontologies, and the Future of Scientific Publishing, authors Allen Renear and Carole Palmer argue that ontologies will facilitate a revolution in scientific publishing, whereby scientists will interact increasingly with the literature on a particular topic as a whole and less frequently with entire individual articles.

They state:

The revolution in scientific publishing that has been promised since the 1980s is about to take place. Scientists have always read strategically, working with many articles simultaneously to search, filter, scan, link, annotate, and analyze fragments of content. An observed recent increase in strategic reading in the online environment will soon be further intensified by two current trends: (i) the widespread use of digital indexing, retrieval, and navigation resources and (ii) the emergence within many scientific disciplines of interoperable ontologies. Accelerated and enhanced by reading tools that take advantage of ontologies, reading practices will become even more rapid and indirect, transforming the ways in which scientists engage the literature and shaping the evolution of scientific publishing.

A key enabler of the revolution is the development of scientific ontologies, which serve as computational scientific theories. The authors note that:

Originally motivated by the need for data integration, scientific ontologies are now being explored for STM publishing to support information retrieval and text mining, with applications for hypothesis generation and knowledge discovery well underway.

They also highlight the need for collaborative development of ontologies to ensure interoperability, noting that Although many biological ontologies were originally developed independently, the need for interoperability has driven collaboration, a good example being the Open Biomedical Ontologies (OBO), which currently has 54 participating projects (18), including Microarray Gene Expression Data (MGED), BioPAX, for biological pathways data, and Foundational Model of Anatomy (FMA).