Poster: Data Exploration Libraries in KIWI

Abstract

Data exploration libraries in KIWI

Sebastien Bihorel, David Fox, Pranay Kumar, Saqib Samuel, Sebastian Manunta, Andrew Rokitka, Cindy Walawander, Ted Grasela

Cognigen Corporation, a Simulations Plus Company

Objectives: Data exploration is critical to the success and efficiency of pharmacometric analyses because it allows the identification of key data patterns, guides the modeling strategy, and supports regulatory submissions. Creating meaningful displays can be time-consuming and typically requires strong programming skills. The Explore and ExploreLive modules of the cloud-based KIWI platform[1] were created to automate the creation of data exploration plots and tables, and were designed to optimize ease of use, quality of display, flexibility, interactivity, reproducibility, and traceability.

Methods: Both modules are graphical interfaces that fully integrate with existing KIWI modules and can simultaneously explore multiple datasets and output tables. All graphical and tabular displays are created with R.[2] Explore is a web module intended for the creation and permanent storage of report-quality tables and lattice-based plots.[3] R scripts are automatically generated and executed using the profile of settings selected in the graphical interface. All outputs are stored in a database and tied to a unique dataset identifier, ensuring traceability and reproducibility. Upon completion, displays are rendered within the Explore module. ExploreLive is a Shiny application intended for the live, non-permanent creation of tables and interactive ggplot2-based plots, and for the execution of small-scale data analyses.[4,5] The ExploreLive application is remotely hosted and rendered in the user's web browser. All displays are generated on the server side based on the user-selected options and rendered in the application.
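
The script-generation and traceability idea described for Explore can be illustrated with a minimal sketch. It is written in Python for illustration only and is not KIWI's implementation: the profile keys (`x`, `y`, `stratify_by`, `subset`, `type`), the function names, the column names, and the in-memory list standing in for the database are all assumptions. The sketch renders a settings profile into the text of an R/lattice script and records it with an identifier derived from the dataset content.

```python
import hashlib


def dataset_id(csv_text: str) -> str:
    """Content-derived identifier: identical data always yields the same id (assumed scheme)."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()[:12]


def render_script(profile: dict, data_path: str) -> str:
    """Turn a settings profile into the text of an R/lattice plotting script."""
    lines = [
        "library(lattice)",
        f'data <- read.csv("{data_path}")',
    ]
    subset_expr = profile.get("subset")
    if subset_expr:
        lines.append(f"data <- subset(data, {subset_expr})")
    formula = f"{profile['y']} ~ {profile['x']}"
    strat = profile.get("stratify_by")
    if strat:
        # lattice conditions panels on the variable after "|"
        formula += f" | {strat}"
    plot_type = profile.get("type", "p")
    lines.append(f'print(xyplot({formula}, data = data, type = "{plot_type}"))')
    return "\n".join(lines)


def run_profile(profile: dict, csv_text: str, data_path: str, store: list) -> dict:
    """Generate the script and record it with the dataset id (the list stands in for a database)."""
    record = {
        "dataset_id": dataset_id(csv_text),
        "profile": dict(profile),
        "script": render_script(profile, data_path),
    }
    store.append(record)
    return record


# Hypothetical dataset and profile, for illustration only.
example_csv = "ID,TIME,DV,DOSE\n1,0,0.0,10\n1,1,2.5,10\n2,1,4.8,20\n"
example_profile = {"x": "TIME", "y": "DV", "stratify_by": "DOSE", "subset": "DV > 0", "type": "p"}
example_store = []
example_record = run_profile(example_profile, example_csv, "pk.csv", example_store)
print(example_record["script"])
```

Because the identifier in this sketch depends only on the dataset content, re-running the same profile on the same data yields a record that can be matched to the earlier output, which is the property the traceability claim relies on.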

Results: Explore offers a highly customizable library of exploratory displays, including summary statistics tables, scatter and line plots, bar charts, boxplots, histograms, and pairwise matrix plots. For each type of display, users can create multiple profiles of settings that can be reused across all projects initiated in the shared user environment. The many custom settings include data subsetting, variable selection, data stratification, layout, axis settings, and summary statistics selection. Execution of these custom profiles typically generates multiple displays, which are automatically organized in a hierarchical tree format for convenient navigation and comparison across multiple data sources. Explore leverages the graphical standards and data formatting features established for model diagnostic plotting in the Visualize module, offering seamless visual integration in technical reports. ExploreLive is intended as a sandbox environment. It allows users to modify, merge, and subset data sources, create a large variety of interactive plots and tables, and perform linear regression, data binning, and survival analysis. Like Explore, the ExploreLive module offers a large number of custom options providing high interactivity for data manipulation and exploration. The Shiny architecture provides swift reactivity: changes in user input promptly update the display rendered on screen. Interactive features allow users to click on plots and obtain relevant information about the data source.
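
As a hedged illustration of how a reusable profile of settings might drive a summary statistics table, the Python sketch below applies data subsetting, stratification, and statistics selection to a small set of records. The profile structure, the `summarize` function, and the variable names (`DOSE`, `WT`, `AGE`) are hypothetical and do not reflect KIWI's actual settings or code.

```python
import statistics
from collections import defaultdict


def summarize(rows, profile):
    """Apply a settings profile: subset the rows, stratify, then summarize one variable."""
    keep = profile.get("subset", lambda row: True)
    groups = defaultdict(list)
    for row in rows:
        if keep(row):
            groups[row[profile["stratify_by"]]].append(row[profile["variable"]])
    # One table row per stratum, one column per selected statistic.
    return {
        stratum: {name: fn(values) for name, fn in profile["statistics"].items()}
        for stratum, values in sorted(groups.items())
    }


# Hypothetical records and profile, for illustration only.
example_rows = [
    {"DOSE": 10, "WT": 70.0, "AGE": 30},
    {"DOSE": 10, "WT": 80.0, "AGE": 45},
    {"DOSE": 20, "WT": 60.0, "AGE": 17},
    {"DOSE": 20, "WT": 65.0, "AGE": 52},
    {"DOSE": 20, "WT": 75.0, "AGE": 38},
]
adult_weight_by_dose = {
    "variable": "WT",
    "stratify_by": "DOSE",
    "subset": lambda row: row["AGE"] >= 18,  # data subsetting
    "statistics": {"n": len, "mean": statistics.mean, "median": statistics.median},
}

example_table = summarize(example_rows, adult_weight_by_dose)
for stratum, stats in example_table.items():
    print(stratum, stats)
```

Keeping the profile separate from the data is what makes it reusable: the same `adult_weight_by_dose` dictionary could be applied unchanged to another dataset with the same column names.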

Conclusions: Explore and ExploreLive naturally extend the functionality of the KIWI platform and provide powerful and user-friendly tools for data exploration. The library of displays provided by Explore enables busy scientists and those with no programming skills to quickly generate meaningful tables and plots to support their pharmacometric needs. By leveraging the infrastructure and features previously established, Explore maintains consistency and reproducibility in the design of tables and plots across projects and between exploratory and model-diagnostic displays. ExploreLive offers a convenient environment for quick data exploration and investigation of data patterns.

References:
[1] Bihorel S, et al. KIWI: a collaborative platform for modeling and simulation. PAGE 23 (2014) Abstr 3124 [www.page-meeting.org/?abstract=3124]
[2] R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
[3] Sarkar D (2008). Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN 978-0-387-75968-5
[4] Wickham H (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York.
[5] Chang W, et al. shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny

EDA Poster Presented at PAGE 2018