How to Nurture a Biotech? Sometimes, Like a Teenager
How Inpatient Care Feels Today
Navigating the Big Data Explosion in Biomedical Sciences
Best Practices of Clinical Bioinformatics and Data Science
History, Challenges, and Future Directions of Bioinformatics in...
Yu Liang, Ph.D., Director of Clinical Biomarkers, Calithera Biosciences
Bioinformatics in Pharma and Biotech Industries: Challenges and...
Brandon W. Higgs, PhD, Head of Translational Bioinformatics, Immunocore & Adjunct Faculty, Johns Hopkins University, Bioinformatics & Biotechnology AAP
Thank you for Subscribing to CIO Applications Weekly Brief
The Rise of Population Genomics in Target Discovery
By Irene Blat, Ph.D., Scientific Director of Translational Genomics, Wuxi NextCODE Genomics
Increased affordability aside, the surge in WGS is also attributed to the potential for discovering novel drug targets that can better treat the root cause of a condition rather than the symptoms. WGS enables the detection of rare variants with large effects that can provide greater insight into phenotypic variation of complex diseases, such as Non-Alcoholic Steato Hepatitis (NASH), Multiple Sclerosis (MS) and Heart Failure (HF). To identify disease-specific variants present in <1% of the population, a large, homogeneous, well-characterized population is needed. For example, Genomics Medicine Ireland plans to whole-genome sequence 400,000 Irish. This data set will contain disease-specific cohorts with thousands of patients and matched controls, designed with statistical power to identify rare variants with large biological effect. These findings can lead to novel drug targets and influence personalized medicine by predicting an individual’s risk of developing a particular condition.
To elucidate novel targets, analysis begins by stratifying the patient cohorts based on disease severity or recurrence of disease and then comparing the genetic profile of patients within these classes. With a well-powered cohort, a researcher can identify statistically significant variations. Depending on the expected frequency and effect of the variant, an additional approach to increase the power of the study may be applied to collapse rare variants onto a region such as a gene. This type of analysis measures the burden of a collection of rare variants in a region. Regardless of the approach, one challenge is how to prioritize the list of statistically significant variants or genes? Since not all variants are equal in their effect for a given condition, different scoring methods exist to take into account a variant’s functional impact, conservation, known disease impact and much more. For example, WuXiNext CODE applied deep learning approaches to develop the deep CODE score that integrates numerous factors and scores that contribute to the impact of a particular variant. This is then applied to rank variants that emerge as significant, or to weight variants that are being assessed as part of a gene burden test. From this filtered list of targets, potential drug targets may emerge and/or a collection of variants can be used to generate a polygenic risk score to better predict the risk of developing a particular condition.
Increased affordability aside, the surge in WGS is also attributed to the potential for discovering novel drug targets that can better treat the root cause of a condition rather than the symptoms
Work does not stop once data is generated, the ability to store and access these massive data sets it in real-time is challenging. Best practice is to store in an easily retrievable fashion to allow for quick lookup of variants or genes of interest within the data set. For example, the de CODE project relied on a Genomically Ordered Relational database (GORdb) to be able to store and rapidly access genomic data for 300,000 Icelanders. This system allows for quick identification of samples that carry a rare variation of interest or even extend to carriers of rare variants across a gene of interest.
To confirm variants of interest, genomic studies may be supported by whole transcriptome sequencing to confirm the expression of a particular variant allele or gene. Additionally, epigenomic, proteomic and metabolomic data can further provide insight into the complete picture of the milieu of factors contributing to a disease state. With increasingly complex data sets the ability to integrate these different types of omics becomes more difficult. These challenges can be addressed with sophisticated models that apply deep learning approaches to tease apart the underlying biology of a particular disease state and ultimately provide insight into new druggable pathways.