Square Pegs in Round Holes?

A crusading scientist identifies a potential public health threat and uses the internet to get access to a cache of data from several studies. After a quick analysis of the pooled data, he reports a previously unrecognized adverse effect of a widely used drug. Patients and physicians become alarmed, and the drug is pulled from the market. Sound like the plot of a new medical thriller? In fact, a similar scenario has hit the headlines several times in the last 10 years. Each time, controversy about the validity of the analysis and conclusions was loud and lasting.

Analyzing data combined from multiple clinical trials (that is, meta-analysis) is an attractive strategy for monitoring and assuring drug safety. While each clinical trial is designed to answer a specific question (or questions) in a specific patient population, a meta-analysis of the pooled data from multiple clinical trials potentially provides a more complete picture of the risk-to-benefit trade-offs. A critical consideration in performing a meta-analysis is the choice of studies to pool. Factors that weigh heavily in the selection of trials include the study designs, patient populations, and outcome metrics captured in each trial. Thus, a successful meta-analysis requires comprehensive information about the types of patients enrolled in the clinical trials, including demographic characteristics, disease severity, treatment regimens, and use of concomitant medications, among other factors.

The informatics challenges to performing a successful meta-analysis are some of the key driving forces for the pursuit of semantic interoperability and the development of data standards by organizations such as the Clinical Data Interchange Standards Consortium (CDISC). The efforts directed at data standardization come at an important time. The cost of drug development has soared in recent years, and challenges regarding drug safety have drawn the scrutiny of Congress.

Data standardization, as it is currently being practiced, involves bringing a group of experts together to share their experiences and personal perspectives with respect to specific concepts of interest. While this exercise is valuable in exploring nuances, a problem arises when the group moves to develop a standard definition. Often, instead of retaining the rich granularity revealed during the discussions of the concepts, the group moves to achieve consensus by developing a definition that satisfies a majority of the experts.

Imagine a group of experts called together to develop a definition for “happy.” Individuals drawing upon their recent experiences might describe feelings such as glad, content, cheerful, joyful, beaming, ecstatic, jubilant, and rapturous. The consensus definition (in this case drawn from the Oxford English Dictionary, 9th edition) might be “feeling or showing pleasure or contentment.” Unfortunately, the granularity that gave Shakespeare the tools to represent the human condition is lost in the consensus-forming process. So while current efforts at data standardization ensure that the primary statistical calculations for a study can be replicated, the loss of granularity reduces the ability to represent nuances that can be essential for the interpretation of future meta-analyses.
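The loss of granularity can be made concrete with a small sketch. Everything here is hypothetical and only for illustration: the mapping table, the two "sites," and their recorded values are invented, and no real data standard works exactly this way. The point is simply that a many-to-one mapping onto a consensus term cannot be undone afterwards.

```python
from collections import Counter

# Hypothetical consensus mapping: every granular term collapses to "happy".
CONSENSUS = {
    "glad": "happy",
    "content": "happy",
    "cheerful": "happy",
    "joyful": "happy",
    "beaming": "happy",
    "ecstatic": "happy",
    "jubilant": "happy",
    "rapturous": "happy",
}

def standardize(observations):
    """Replace each granular term with its consensus term."""
    return [CONSENSUS[term] for term in observations]

# Invented observations from two hypothetical study sites.
site_a = ["content", "content", "glad", "cheerful"]
site_b = ["ecstatic", "jubilant", "rapturous", "ecstatic"]

std_a = standardize(site_a)
std_b = standardize(site_b)

# Before standardization the two sites are clearly different.
print(Counter(site_a) == Counter(site_b))
# After standardization they are indistinguishable.
print(Counter(std_a) == Counter(std_b))
```

Once only the standardized values are stored, a later analyst pooling the two sites has no way to tell mild contentment from rapture, which is exactly the kind of nuance a future meta-analysis may need.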

The Problem of Premature Standards
The desire of medical researchers to achieve the promise of semantic interoperability has created a sense of urgency for the development and deployment of data standards. This urgency provides the justification for distributing early versions of a standard, with the idea that the early versions will be improved in subsequent releases.

This rush to implement a standard has two important consequences. First, late adopters have a reason to hold back from implementation because the standard is still unstable. Second, and perhaps more importantly, the early adopters are forced to use what is available, and different dialects emerge as they tackle tasks that the initial version did not anticipate. This need to pound square pegs into round holes creates an obstacle in the pursuit of semantic interoperability because it is difficult to rectify these issues once a premature standard has come into widespread use.

New Strategies
The goal of semantic interoperability, which includes the goal of facilitating analyses across trials for drug safety assessments, will require several changes to the current strategy of data standardization. First, the short-term goal of data standardization must shift from a focus on promulgating standards to an emphasis on unraveling the meanings behind complex concepts. Second, the output of this process must then be encoded in a scientific ontology built on standard formats and methodologies for ontology development, maintenance, and use in order to foster the creation of principled ontologies. (Additional information can be found at the website for The Open Biomedical Ontologies.)

Semantic interoperability may very well remain elusive for the foreseeable future. One approach to incrementally achieve this goal might be to adopt a short-term focus on developing a strategy to learn about ambiguities sooner so that we can get to a higher level of semantic interoperability faster. This process, known to informaticians as disambiguation, involves the unraveling of complexities that are often implicitly represented in a particular data standard term.

A growing number of ontologies are being created to address various scientific domains. Of particular importance to the complex data standardization efforts in the biomedical sciences is the implementation of a curation effort. This effort aims to consolidate the terms generated from disparate ontologies in order to ensure their reusability, and to ensure compatibility between neighboring ontologies.

This curation effort has been a critically valuable component in the development of the Gene Ontology for organizing and mining newly elucidated genomic information. New approaches to drug development must evolve if we are to see continued improvement in research productivity and drug safety. The move towards scientific ontologies as a basis for developing data standards is one approach to preparing for these changes and allowing for the evolution of the informatics backbone for the pharmaceutical and biotechnology industries.

This article was published in a slightly different form in Bio IT World on September 13, 20