Monday, July 25, 2011

On the eve of ICBO 2011

I am here in Buffalo on the eve of the International Conference on Biomedical Ontology 2011. The conference begins with workshops and tutorials on Tuesday and Wednesday, followed by the 3-day conference program on Thursday, Friday, and Saturday.

The conference web site is here.

I will be attending the Representing Adverse Events Workshop tomorrow, then participating in a more meaningful way in the Improving Structured EHR Data Tutorial on Wednesday morning.

Over the next few days, I intend to post summaries of the best and worst of this year's ICBO, as I did for the last ICBO in 2009.

Friday, April 29, 2011

ISO TR 12310 Draft Standard - Problems and Issues

A draft of the ISO TR 12310 standard came across our desks in the last day or two, and we were surprised by the substantial problems in its fundamental components. The title of the standard is "Health informatics — Principles and guidelines for the measurement of conformance in the implementation of terminological systems".

First, it defines a concept as "A concept is a single mental representation of some real or abstract thing."

This definition is problematic, because never in the history of terminological systems has anyone elucidated a theory of a basic unit of mental representation. What is a single mental representation? Or a unit of thought (as concept has also been defined)?

If I'm thinking of a car, is that a unit of thought? What if I'm thinking about a particular red, C-class Mercedes Benz with a particular Vehicle Identification Number? Or what if I'm just thinking about red, C-class Mercedes owned by stock brokers and that need new brakes and on which a tree fell during a Spring storm, in general? How many unique mental representations are there? One, two, three? How do we count unique ideas and single mental representations?

This question is not trivial, as the answer determines what concept representations we are permitted to have. Historically, the answer has been that anything anyone wants to put in the system is allowable. For a real-world example, consider the following "concept" from SNOMED CT: Family history of myocardial infarct in first degree female relative less than 65 years of age (situation).

The draft standard then goes on to say: "Concepts are should be unique within a code system." The grammatical problem aside, the standard is now engaged in use-mention confusion. For your mental representations are not part of any code system! The standard is using the word concept for both representations of concepts and concepts (themselves mental representations).

The draft standard then goes on to say: "A concept representation is a mechanism by which the system can express a concept." So it should be concept representations that are unique within a code system, not concepts.

But then the standard says: "Most code systems support multiple representations for each concept, sometimes even multiple representations of a given type."

How do code systems support interoperability, then?

Next, we are told that a "Concept id" is "A concept representation that is unique within the code system and that is used internally by the code system when referencing concepts."

Does that mean we cannot use concept ids outside the code system, either because it's impossible or disallowed? Surely some software applications are using SNOMED CT concept ids external to SNOMED CT itself? Are they out of conformance?
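The distinction the draft blurs can be made concrete with a toy sketch. All names here are invented for illustration and are not taken from ISO TR 12310 or any real terminology (though the example id happens to be a well-known SNOMED CT identifier): a code system keeps its concept ids unique, while allowing several representations per id.

```python
# Toy model of a code system (illustrative only; class and method
# names are invented, not from ISO TR 12310 or any real terminology).

class CodeSystem:
    def __init__(self):
        # concept id -> list of representations (preferred term, synonyms, ...)
        self._representations = {}

    def add_concept(self, concept_id, representations):
        # Uniqueness is naturally stated over ids, not over anyone's
        # mental representations, which appear nowhere in the system.
        if concept_id in self._representations:
            raise ValueError(f"concept id {concept_id!r} already exists")
        self._representations[concept_id] = list(representations)

    def representations(self, concept_id):
        return self._representations[concept_id]

cs = CodeSystem()
cs.add_concept("22298006", ["Myocardial infarction", "Heart attack", "MI"])
```

Note that only ids and textual representations live inside the code system; the uniqueness constraint the draft gropes toward is the duplicate-id check, and nothing about it prevents external software from citing those ids.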

And that covers just pages 1 and 2.

Monday, November 29, 2010

NSF Funds Phenotype Ontology Coordination Network

The National Science Foundation recently awarded just over $500,000 to the University of South Dakota (USD) for the creation of the Phenotype Ontology Research Coordination Network. According to the grant proposal, the PI at USD is Dr. Paula Mabee, and co-PIs include Dr. Andrew Dean (North Carolina State University), Dr. Eva Huala (Stanford University), Suzanna Lewis (Lawrence Berkeley National Laboratory), and Dr. Anne Maglia (Missouri University of Science and Technology).

The goal of the grant is to "...establish a network of scientists who are developing phenotype ontologies and to use this network to enable and enhance the research of all of those involved."

The specific aims are:
  1. Develop anatomical reference ontologies for three key taxonomic clades;
  2. Align and synchronize anatomical ontologies using homology and various types of similarity relations;
  3. Define, test and document anatomy ontology development best practices and standards;
  4. Reach out to ancillary phenotype groups to share with them common concepts and practices;
  5. Educate the community about the methods for developing ontologies and their importance and utility in research.
The principals of the grant are soliciting participation in the network. To join, you can sign up here.

Saturday, November 27, 2010

The Statistical Core Vocabulary: Can Definitions Really be this Bad?

I just learned of the Statistical Core Vocabulary (abbreviated as SCOVO), the latest version of which one may find here. This vocabulary defines itself as: A vocabulary for representing statistical data on the Web.

It contains 3 classes and 5 properties. For a small artifact that aims to capture the core entities of a domain, one would expect its creator(s) to pay careful attention to the definitions of its classes and properties.

Clearly, that is not the case here.

The SCOVO definition of Dataset is: a statistical dataset. This is equivalent to defining subatomic particle as: a positively-charged subatomic particle.

Similarly, the SCOVO definition of Item is: a statistical data item. This definition is even worse, being the equivalent of defining book as: a red hardcover book.

Finally, the SCOVO definition of its third and final class, Dimension, is: a dimension of a statistical data item. This definition is the equivalent of defining length as the length of a rectangular wooden table.

The situation for the properties in SCOVO is no better. Two of the properties even have the same names as two of the classes: dimension and dataset. The dataset relation is intended to link an item to a dataset. It is thus the equivalent of having a relation liver to link the liver itself to the organism of which it is a part.
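The class/property name clash is easiest to see in a few SCOVO-style statements, sketched here as plain (subject, predicate, object) tuples. The example resource names and the population figure are invented; only the scovo: class and property names come from the vocabulary itself.

```python
# SCOVO-style triples as plain Python tuples (subject, predicate, object).
# Prefixes abbreviated; ex: resources and the value are invented examples.

triples = [
    ("ex:pop2009",  "rdf:type",        "scovo:Item"),
    ("ex:pop2009",  "rdf:value",       "307000000"),
    ("ex:pop2009",  "scovo:dataset",   "ex:usCensus"),   # the property named like the class
    ("ex:usCensus", "rdf:type",        "scovo:Dataset"),
    ("ex:pop2009",  "scovo:dimension", "ex:year2009"),   # likewise for Dimension
    ("ex:year2009", "rdf:type",        "scovo:Dimension"),
]

# A reader must track whether "dataset" names the class or the property
# purely by its position in the triple.
items = [s for (s, p, o) in triples if p == "rdf:type" and o == "scovo:Item"]
```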

By contrast, the Information Artifact Ontology (IAO) has taken pains to define the terms data item and dataset well. Definitions of these terms, and other terms used in their definitions, follow:

Data item: an information content entity that is intended to be a truthful statement about something (modulo, e.g., measurement precision or other systematic errors) and is constructed/acquired by a method which reliably tends to produce (approximately) truthful statements.

Information content entity: an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity.

Note that for the definition of generic dependence, the IAO relies on a formal, upper ontology called Basic Formal Ontology.

Data set: a data item that is an aggregate of other data items of the same type that have something in common. Averages and distributions can be determined for data sets.

In the IAO, data items stand in the "is about" relation to some entity. Thus, data are linked to the world that they are about.
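The IAO reading, on which a data set is an aggregate of data items of the same type, can be sketched as follows. The class names and the weight data are mine, invented for illustration, not drawn from the IAO itself.

```python
# Sketch of the IAO reading: a data set as an aggregate of data items
# of the same type. Class names and data are illustrative, not IAO's.

from statistics import mean

class DataItem:
    """A single measurement about some entity (its 'aboutness' target)."""
    def __init__(self, about, value):
        self.about = about
        self.value = value

class DataSet:
    """An aggregate of data items of the same type."""
    def __init__(self, items):
        if len({type(i.value) for i in items}) > 1:
            raise ValueError("data items in a data set must share a type")
        self.items = items

    def average(self):
        # Averages can be determined for data sets (not for lone items).
        return mean(i.value for i in self.items)

weights = DataSet([DataItem("patient-1", 70.2),
                   DataItem("patient-2", 81.5),
                   DataItem("patient-3", 64.3)])
```

Each item retains its link to what it is about, so the aggregate stays anchored to the world, which is precisely what the SCOVO definitions fail to secure.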

SCOVO does have one thing going for it: it serves as an exemplar of the worst possible way to define terms.

Wednesday, September 1, 2010

Physician decries lack of definitions in informatics

Dick Stanley, MD, Chief Medical Information Officer (CMIO) at Cooley-Dickinson Hospital, Northampton, Mass., decries the lack of definitions in informatics. He states: Interestingly, I’m having trouble finding even basic definitions of very common terms...

Dr. Stanley correctly points out that the science of informatics requires good definitions of terms to make progress. He notes that ...we’re supposed to be the people who care about [definitions].

However, the field of biomedical informatics has long been delinquent in this regard. The definitions it does generate are often quite poor. Like many newcomers to the field, Dr. Stanley notes that ...the whole informatics industry suffers from a tremendous lack of definitions.

As a member of that "industry" for 15 years, Dr. Stanley, let me say, guilty as charged.

Fortunately, there is light on the horizon. Criticism in this regard from ontologists like Dr. Barry Smith and Dr. Werner Ceusters at the University at Buffalo is beginning to turn the tide. They have been advocating for high-quality definitions in ontologies and information models for years, culminating in the OBO Foundry's adoption of at least 3 principles mandating definitions of high quality.

Of course, many ontologists, typically those with an engineering and/or computer science background, are eager to have rigorous logical axioms that define ontology terms. We caution that getting textual definitions for human understanding right is a necessary first step toward the logical axioms they rightly desire for ontologies to support automated reasoning.

But we ought not to put any carts before the horse of textual definitions, be they the conduct of science, the application of science to the clinical realm, or the development of logical axioms to automate cognitive activities of humans.

Friday, July 30, 2010

LOINC Changes Frustrating Semantic Interoperability in Canada

According to minutes of the 2010-07-29 HL7 Vocabulary Working Group call, the new release of Logical Observation Identifiers Names and Codes (LOINC) is creating ambiguity about how to represent things like birth date. Apparently, it is now possible to represent birth date as an "observation", with a value that is a date. This representation conflicts with the "usual" way of representing birth date in the HL7 Clinical Document Architecture (CDA) standard, which represents it as a "demographic attribute of a person".

Really, the fundamental issue here is the failure to recognize that terminologies such as LOINC and SNOMED CT are representations developed in uncoordinated fashion with HL7's CDA, which is itself also a representation. That is to say, terminologies, information models, and ontologies (obviously) all make ontological commitments.

Conflicting views such as these are common. Indeed, HL7's whole TermInfo group was created to reconcile conflicting representations (HL7's and SNOMED CT's) to ensure, as much as possible, semantic interoperability.
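The cost of such uncoordinated commitments is easy to sketch. The two record shapes below mirror the birth-date conflict described above; the field names and the "birth-date" code are placeholders of my own, not actual CDA element names or LOINC codes.

```python
# Two invented record shapes for the same fact, a person's birth date.
# Field names and codes are illustrative placeholders, not real CDA
# elements or LOINC codes.

# CDA-style: birth date as a demographic attribute of the person
person_record = {"id": "pt-42", "birthTime": "1970-05-17"}

# Observation-style: birth date as an observation with a date value
observation_record = {"subject": "pt-42",
                      "code": "birth-date",   # placeholder code
                      "value": "1970-05-17"}

def birth_date(record):
    # A consumer must special-case both shapes to recover one fact;
    # that is exactly the interoperability cost of uncoordinated models.
    if "birthTime" in record:
        return record["birthTime"]
    if record.get("code") == "birth-date":
        return record["value"]
    raise KeyError("no birth date found")
```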

Friday, April 9, 2010

Confusion reigns--and remains--about SNOMED CT licensing

On a Clinical and Translational Science Award (CTSA) Consortium call about data standards and interoperability, participants raised numerous questions about the allowable uses of SNOMED CT in the applications they are developing to facilitate translational science.

If I develop a web application that uses SNOMED CT codes under the covers, may I allow users in a country without a SNOMED CT license to access it?

If I develop a subset of SNOMED CT codes (sometimes also known as a "value set") to serve as the set of answers to an online survey question, can I publish this subset for others to use (thereby facilitating interoperability among surveys)?
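Structurally, such a value set is just a named subset of codes bound to a question, as in the sketch below. The codes and terms here are hypothetical placeholders, not actual SNOMED CT identifiers, which is precisely why publishing the real subset matters for interoperability.

```python
# A value set as a named subset of codes serving as the answer list
# for one survey question. Codes/terms are hypothetical placeholders,
# not actual SNOMED CT identifiers.

smoking_status_value_set = {
    "name": "Smoking status answers",
    "codes": {
        "100001": "Smoker",
        "100002": "Non-smoker",
        "100003": "Ex-smoker",
    },
}

def is_valid_answer(code, value_set):
    # Two surveys interoperate on this question exactly when they
    # share (and are allowed to share) this subset of codes.
    return code in value_set["codes"]
```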

By the end of the call, attendees had more questions than answers.

Wouldn't the open approach taken by the Open Biomedical Ontologies (OBO) Foundry serve the purpose of interoperability better? Shouldn't the United States redirect at least some of its investment in SNOMED CT into truly open standards?