Youping Deng, PhD
Professor, Director of the Bioinformatics Core
Quantitative Health Sciences
John A. Burns School of Medicine
Yiqiang Zhang, PhD
Assistant Professor
Anatomy, Biochemistry & Physiology and Center for Cardiovascular Research
John A. Burns School of Medicine
Processing Multiomic Datasets for Improved AI/ML-readiness in Congenital Heart Disease Research
Project Summary: This supplement application is submitted in response to the NOSI focused on improving the AI/ML-readiness of NIH-supported data. Background: The overarching goals of the parent RCMI U54 grant Center (U54MD007601) entitled “Ola HAWAII” are to lead and advance minority health and health disparities research in Hawaii and beyond. The U54 Center Ola HAWAII projects aim to enhance institutional capacity to facilitate basic biomedical, clinical and behavioral research (Aim 1); address health disparities and health-related concerns of underserved communities (Aim 2); mentor and support a diversified health disparities research workforce (Aim 3); and enhance the quality and productivity of health disparities and health-related research through world-class research facilities and services (Aim 4). This supplement project specifically targets cardiovascular disease (CVD), which is not only the leading cause of death globally but also disproportionately affects underrepresented minority populations, including Native Hawaiians and other Pacific Islanders (NHOPI). Congenital heart defects/diseases (CHDs) are a major cause of premature death or lifelong disability, affect ~1% of live births on average, and are significantly more common in Hawaii. In addition, the rates of different forms of CHD vary across races/ethnicities. With the growing volume of health records and genomic data now publicly available, it is possible to perform a comprehensive analysis of the genetic traits of CHDs. The scientific and public health value of big biomedical data can be boosted by data-driven technologies such as artificial intelligence and machine learning (AI/ML). Overall goals and aims: With this collaborative initiative in data science and cardiovascular research, we propose to improve NIH-supported datasets for AI/ML applications, including those from the Pediatric Cardiac Genetics Consortium (PCGC) and Gabriella Miller Kids First programs. 
This proposal will address critical areas in the AI/ML-readiness of complex phenotype-omics big data, making these data FAIR (Findable, Accessible, Interoperable, and Reusable). First, we will perform sequencing data cleaning, imputation, normalization, and feature annotation and extraction to produce combined datasets that support AI/ML-ready investigation of all available PCGC/Kids First CHD samples. We will curate the data to allow for direct and broad AI/ML-powered investigation of CHD. Then, we will provide a use case that integrates the reprocessed and synthesized datasets of CHD phenotypic and molecular profiles and performs supervised genetic classification of cardiac defects with different ML methods. This will lead to new genetic classifications of multifaceted, mild to severe CHDs. Impacts: The processed data will be available to the scientific community, allowing for broader AI/ML-powered data science and biomedical and clinical investigation. The new CHD genetic classifications will provide novel therapeutic targets. The parent U54 program will also benefit from these datasets; the new AI/ML-based discoveries of CHD genetic classifications will be a crucial reference for our research and training efforts to advance minority health and health disparities in CVD.
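The sketch below illustrates the general shape of the processing steps named above: imputation, normalization, and supervised classification. It is a toy example under stated assumptions, not the project's actual pipeline. The function names, the choice of mean imputation, z-score normalization, and a nearest-centroid classifier, and the synthetic feature values and labels "A" and "B" are all hypothetical stand-ins chosen for brevity.

```python
import math
from statistics import mean, pstdev


def impute_missing(rows):
    """Replace missing values (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def zscore(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    n_cols = len(rows[0])
    mus = [mean([r[j] for r in rows]) for j in range(n_cols)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sds = [pstdev([r[j] for r in rows]) or 1.0 for j in range(n_cols)]
    return [[(r[j] - mus[j]) / sds[j] for j in range(n_cols)] for r in rows]


def fit_centroids(rows, labels):
    """Compute one mean feature vector (centroid) per class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in groups.items()}


def predict(centroids, row):
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


# Synthetic data: 4 samples x 3 features, with two missing entries.
# These values are invented for illustration and carry no biological meaning.
features = [
    [0.9, 0.1, None],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [None, 0.8, 0.9],
]
labels = ["A", "A", "B", "B"]  # hypothetical phenotype classes

imputed = impute_missing(features)
normalized = zscore(imputed)
centroids = fit_centroids(normalized, labels)
predictions = [predict(centroids, row) for row in normalized]
print(predictions)
```

In practice, each stage would likely be replaced by a method suited to sequencing data and evaluated on held-out samples; the point here is only that cleaned, imputed, and normalized features can feed directly into a supervised classifier.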