Scientific workflows – the knowledge-generating engines of R&D.

Scientific research requires two kinds of effort. One is the generation and synthesis of original ideas by skilled practitioners. This is a desirable and often lauded talent that can spawn remarkable innovations in science and medical care. The second kind of effort is less visible, but equally important—the hard work required to turn an idea into reality. Executing the experiments, analyzing the data, and developing presentations of results are examples of this work. Although these latter efforts are necessary, and even enjoyable, they nonetheless can be tedious, time-consuming, and expensive.

In model-based research and development (MBR&D), turning an idea into reality includes creating analysis-ready datasets, managing analysis processes, and preparing work products, such as tables and graphs. The idea of automating these and all the other tasks required for MBR&D may seem preposterous, yet some companies have already made strides in automating various recurring tasks, such as creating graphs and managing the results from statistical analyses. Other tasks, however, still require a scientist’s careful scrutiny and judgment. Supporting these more complicated tasks with adequate rigor requires a more sophisticated, higher order of automation. The challenge is huge, but I think automation is overdue in pharmacometrics: we need it to achieve the productivity gains we so desperately need in the Pharma of the Future.

Over the last few years, my colleagues and I have been working to understand the obstacles that must be addressed in order to improve productivity in the pharmacometrics enterprise—to industrialize it, if you will. A major obstacle we had to address early on was the belief that automation is the first step in an undesirable transition to hands-off, machine-dominated science. In fact, our efforts at industrialization have led to remarkable improvements in the consistency and quality of our work products and to a greater emphasis on thinking instead of doing.

Specifically, formalization and automation are required for what is now a largely hands-on, ad hoc process. The manual approach is problematic for several reasons. First, creating analysis-ready datasets, managing analysis processes, and preparing work products are time-consuming and error-prone when done manually, and as datasets grow and models become more complex, these tasks may become impossible to do without automation. Second, archiving each step in the modeling and simulation process, including both high-level scientific results and low-level details about the hardware and software environment, is tedious. Without this information, however, it is difficult to regenerate the results. Reproducibility is a cornerstone of the scientific method; without the ability to reproduce results, the scientific soundness and integrity of the computational efforts at the center of MBR&D will be called into question.

Workflows have recently emerged as a paradigm for representing and managing complex distributed scientific computational processes. A workflow is a computerized facilitation or automation of a business process, in whole or part. Scientific workflows capture individual data transformations and analysis steps in addition to the mechanisms to carry them out in a distributed environment. Each step in the workflow specifies a process or computation to be executed (for example, a software program to be executed, a web service to be invoked, datasets to be assembled, and exploratory graphical displays to be generated). The steps are linked according to dependencies among the data and workflow tasks. The schematic for these workflows must contain the many details required to carry out each analysis step, including the use of specific execution and storage resources in distributed environments.
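To make the idea of steps "linked according to dependencies" concrete, here is a minimal Python sketch (Python 3.9+ for the standard-library `graphlib` module). It assumes a workflow can be modeled as a directed acyclic graph whose nodes are steps and whose edges are data dependencies. The function `run_workflow` and the step names are hypothetical, and the sketch ignores what real workflow systems add on top: scheduling onto distributed execution and storage resources, retries, and data management.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_workflow(steps: Dict[str, Callable[..., object]],
                 deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Run each step once, after every step it depends on has finished.

    `deps` maps a step name to the names of the steps whose outputs it consumes;
    each step receives those outputs as positional arguments, in the listed order.
    """
    results: Dict[str, object] = {}
    # static_order() yields a dependency-respecting order and raises
    # graphlib.CycleError if the dependencies form a loop.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, [])]
        results[name] = steps[name](*inputs)
    return results


# Hypothetical steps on toy data; a real workflow would launch programs or services here.
steps = {
    "assemble_dataset": lambda: [1.2, 0.9, 1.5, 1.1],
    "summary_table": lambda data: {"n": len(data), "max": max(data)},
    "exploratory_plot": lambda data: f"plot of {len(data)} points",
    "report": lambda table, plot: f"n={table['n']}; {plot}",
}
deps = {
    "summary_table": ["assemble_dataset"],
    "exploratory_plot": ["assemble_dataset"],
    "report": ["summary_table", "exploratory_plot"],
}

results = run_workflow(steps, deps)
print(results["report"])  # n=4; plot of 4 points
```

Because the dependencies, not the author, determine the execution order, adding a new step means declaring what it consumes rather than editing a hand-maintained script sequence.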

The details of computational processes captured within workflow systems can be exploited to automate the execution of steps in the workflow. Workflows can also provide the provenance information necessary for scientific reproducibility, publication of results, and sharing among collaborators. By providing formalism and supporting automation, workflows have the potential to accelerate and transform the modeling and simulation efforts required for MBR&D.
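Provenance capture can be sketched in the same spirit. The snippet below assumes each step is a deterministic function of JSON-serializable inputs. It records SHA-256 hashes of the inputs and outputs together with basic software and hardware details from Python's `sys` and `platform` modules, then re-runs the step to check that the archived result is reproduced. The helper names (`digest`, `run_with_provenance`, `reproduces`) and the toy dose-normalization step are hypothetical illustrations, not features of any particular workflow system.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def digest(obj) -> str:
    """SHA-256 of a canonical JSON serialization (assumes obj is JSON-serializable)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_provenance(name, func, inputs):
    """Run one workflow step and return (output, provenance record)."""
    output = func(inputs)
    record = {
        "step": name,
        "input_sha256": digest(inputs),
        "output_sha256": digest(output),
        "python": sys.version.split()[0],    # software environment
        "platform": platform.platform(),     # OS and hardware architecture
        "run_at": datetime.now(timezone.utc).isoformat(),
    }
    return output, record


def reproduces(record, func, inputs) -> bool:
    """Re-run a step on the same inputs and compare against the archived digests.

    Assumption: func is deterministic; stochastic steps would also need their seeds recorded.
    """
    if digest(inputs) != record["input_sha256"]:
        return False
    return digest(func(inputs)) == record["output_sha256"]


# Hypothetical step: divide toy concentrations by the dose.
def normalize(data):
    return [round(c / data["dose_mg"], 6) for c in data["conc_mg_per_L"]]


inputs = {"dose_mg": 100, "conc_mg_per_L": [1.2, 0.9, 1.5]}
output, record = run_with_provenance("normalize", normalize, inputs)
print(json.dumps(record, indent=2))
print(reproduces(record, normalize, inputs))  # True
```

Storing such records alongside each result is what later lets a collaborator, reviewer, or regulator confirm that a published table came from the stated inputs and environment.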

Much research is currently underway in the cyberinfrastructure community to address issues of creation, reuse, provenance tracking, performance optimization, and reliability of workflows.* To fully realize the promise of workflow technologies in MBR&D, many additional requirements and challenges must be met. Workflows must be able to track the evolution of models from their origins in early discovery to the mature and comprehensive models used for clinical trial simulations, for design of clinical trials, and for making informed predictions of clinical and commercial performance. Workflows must also be able to support dynamic, event-driven analyses, handle streaming data, accommodate interaction with users, give intelligent assistance, and support collaborative workflow design so that results can be shared across functional areas.

Meeting the workflow challenge head-on has a benefit that is more important than just speeding up the “doing”. A robust and useful workflow can only be conceived and implemented with the explicit cooperation of the many functional areas involved in generating the requisite data. The interdisciplinary collaboration needed to specify workflow requirements, to achieve consensus on study methods, and to detail how to interpret and present results is sorely missing in many companies. This collaboration is surely the most critical step for improving the productivity of the pharmaceutical research and development enterprise.

Be sure to read the next Pharma of the Future blog entry, "Why don’t computer systems help me as much as I think they should?" If you missed the last posting, click over to "Productivity is not a four-letter word."

*  National Science Foundation Cyberinfrastructure Council. Cyberinfrastructure Vision for 21st Century Discovery. Arlington, VA: National Science Foundation; 2007. NSF publication 07-28. https://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf. Accessed April 7, 2011.