July 2020 CIOAPPLICATIONS.COM 9

Sclerosis (MS) and Heart Failure (HF). To identify disease-specific variants present in less than 1 percent of the population, a large, homogeneous, well-characterized population is needed. For example, Genomics Medicine Ireland plans to sequence the whole genomes of 400,000 Irish people. This data set will contain disease-specific cohorts with thousands of patients and matched controls, designed with the statistical power to identify rare variants with large biological effect. These findings can lead to novel drug targets and influence personalized medicine by predicting an individual's risk of developing a particular condition.

For instance, in NASH, existing therapeutics target fibrosis, which is a symptom of the disease. By analyzing large disease cohorts that include patients at different stages of progression, the contributing factors of this disease may be identified by comparing late-stage to early-stage patients. This is only possible when the data sets are deeply phenotyped, allowing characterization of the disease pathways that link phenotypic variation to the complex genetic landscape.

To elucidate novel targets, analysis begins by stratifying the patient cohorts by disease severity or recurrence and then comparing the genetic profiles of patients within these classes. With a well-powered cohort, a researcher can identify statistically significant variants. Depending on the expected frequency and effect of the variant, the power of the study may be increased further by collapsing rare variants onto a region such as a gene. This type of analysis measures the burden of a collection of rare variants in a region. Regardless of the approach, one challenge remains: how to prioritize the list of statistically significant variants or genes.
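The collapsing idea described above can be illustrated with a minimal sketch. It assumes the variants in one gene have already been filtered to rare ones, flags each sample as a carrier if it has any of them, and compares carrier counts in cases and controls with a one-sided Fisher exact test. The function names and the toy genotypes are hypothetical and are not taken from any study mentioned here; real burden tests usually also adjust for covariates and may weight variants.

```python
from math import comb


def carrier_flags(genotypes):
    """Collapse per-variant alt-allele counts (0/1/2) into one flag per
    sample: True if the sample carries any rare variant in the gene."""
    return [any(count > 0 for count in sample) for sample in genotypes]


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table
    [[case carriers, case non-carriers],
     [control carriers, control non-carriers]] = [[a, b], [c, d]].
    It is the probability of seeing at least `a` carriers among the
    cases if carrier status were unrelated to disease status."""
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, n_cases)


def burden_test(case_genotypes, control_genotypes):
    """Return the collapsed 2x2 table and its one-sided p-value."""
    case_flags = carrier_flags(case_genotypes)
    control_flags = carrier_flags(control_genotypes)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return (a, b, c, d), fisher_one_sided(a, b, c, d)


# Toy data (hypothetical): rows are samples, columns are three rare
# variants in one gene, values are alt-allele counts.
cases = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]

table, p_value = burden_test(cases, controls)
print(table, round(p_value, 3))  # (4, 1, 1, 4) 0.103
```

With only ten samples the result is far from significant, which is the point made above about needing well-powered cohorts for rare variants.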
Since not all variants are equal in their effect on a given condition, different scoring methods exist that take into account a variant's functional impact, conservation, known disease impact and much more. For example, WuXi NextCODE applied deep learning approaches to develop the DeepCODE score, which integrates numerous factors and scores that contribute to the impact of a particular variant. The score is then applied to rank variants that emerge as significant, or to weight variants that are being assessed as part of a gene burden test. From this filtered list, potential drug targets may emerge, and a collection of variants can be used to generate a polygenic risk score to better predict the risk of developing a particular condition.

Work does not stop once data is generated; storing and accessing these massive data sets in real time is challenging. Best practice is to store the data in an easily retrievable fashion that allows quick lookup of variants or genes of interest within the data set. For example, the deCODE project relied on a Genomically Ordered Relational database (GORdb) to store and rapidly access genomic data for 300,000 Icelanders. This system allows quick identification of samples that carry a rare variant of interest, and can even be extended to carriers of rare variants across a gene of interest.

To validate variants of interest, genomic studies may be supported by whole transcriptome sequencing, which confirms the expression of a particular variant allele or gene. Additionally, epigenomic, proteomic and metabolomic data can provide further insight into the complete picture of the milieu of factors contributing to a disease state. As data sets grow more complex, integrating these different types of omics data becomes more difficult.
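The benefit of storing variant calls in genomic order can be shown with a small sketch. It is not GORdb or its query language; it only assumes that calls are kept sorted by chromosome and position, so that a region query becomes a binary search followed by a scan of a contiguous slice. The class name, record layout, coordinates and sample IDs are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class GenomicIndex:
    """In-memory stand-in for a position-ordered variant store."""

    def __init__(self, records):
        # Each record is (chrom, pos, allele, sample_id). Sorting once
        # keeps calls for the same region next to each other.
        self._rows = sorted(records)
        self._keys = [(chrom, pos) for chrom, pos, _, _ in self._rows]

    def carriers_in_region(self, chrom, start, end):
        """Return sorted sample IDs with a call in [start, end] on chrom."""
        lo = bisect_left(self._keys, (chrom, start))
        hi = bisect_right(self._keys, (chrom, end))
        return sorted({sample for _, _, _, sample in self._rows[lo:hi]})


# Hypothetical rare-variant calls.
index = GenomicIndex([
    ("chr1", 100, "A>G", "S1"),
    ("chr1", 250, "C>T", "S2"),
    ("chr1", 900, "G>A", "S3"),
    ("chr2", 150, "T>C", "S1"),
])

# Carriers of any rare variant across a hypothetical gene at chr1:200-1000.
print(index.carriers_in_region("chr1", 200, 1000))  # ['S2', 'S3']
```

A single-variant lookup is the same query with `start` equal to `end`, and a gene-level carrier list uses the gene's coordinates as the range.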
These challenges can be addressed with sophisticated models that apply deep learning approaches to tease apart the underlying biology of a particular disease state and ultimately provide insight into new druggable pathways.

Irene Blat

Increased affordability aside, the surge in WGS is also attributed to the potential for discovering novel drug targets that can better treat the root cause of a condition rather than the symptoms