COVID-19 Data Commons and Analytic Environment
The Renaissance School of Medicine and the College of Engineering and Applied Sciences
have developed a COVID-19 data commons that will support integrated management, query and analysis of clinical, radiology, pathology,
spatial and molecular data. The clinical data captures all available information,
in a HIPAA-compliant fashion, about COVID-19 patient symptoms, past medical history,
family history, clinical course, treatment and response, as well as data elements relating
to patient demographics and co-morbidities. The radiology data includes all imaging studies
obtained during each patient’s treatment, including CT and chest X-ray data, along with
computationally derived data products. Radiology imaging data is extremely important
in COVID-19 from both a diagnostic and a monitoring perspective, given the crucial
nature of COVID-19 pulmonary disease and its rapid phenotypic changes. COVID-19 can
also affect other organs, so we include all radiological studies. Pathology
whole-slide imaging data is being generated from COVID-19 patient biopsies and autopsies.
We anticipate that the data commons will help guide molecular studies, and so information
on clinically validated and experimental biomarkers, along with host and viral genomic,
epigenetic, and transcriptomic data, will be incorporated as it becomes available.
The integrated use of molecular and imaging methods to understand host/pathogen interaction
is particularly important in COVID-19. The spatial information will encompass home
and work addresses of patients who have been under investigation for COVID-19,
whether they test positive or not. The spatial information will also cover patients with positive or negative antibody
tests. Notably, the data commons will include the ability to query and analyze spatial patient
information while maintaining patient privacy.
The data commons will make these datasets available to all researchers from contributing organizations to help guide best clinical practices in complex clinical situations and to support analytic pipelines designed to discover and evaluate biomarkers that predict clinical course and treatment response, as well as pipelines that can predict potential outbreaks and steer prevention efforts. Effective approaches to COVID-19 are likely to require coordination and integration of multiple methods and multiple data sources. The data commons will also manage the results of these analytic pipelines and make them available to the research and clinical communities.
Stony Brook Medicine is actively using the data commons to help lead broader national and international COVID-19 clinical and research efforts. We are using the data commons to develop molecular, imaging and geospatial computational modeling pipelines. Through a collaboration among the Renaissance School of Medicine, the College of Engineering and Applied Sciences and the Institute for Engineering-Driven Medicine, we have launched a range of statistical and artificial intelligence-based projects to predict outcome, progression and response to treatment in our patient population. This work builds on the highly productive research already underway in medical applications of machine learning and artificial intelligence, and makes integrated use of clinical, imaging, molecular and spatial data.