With the steadily growing focus on developing treatment options that target specific subpopulations of patients (personalized medicine), the demand for a more granular molecular understanding of disease has increased in recent years. This affects in particular the very early biological idea generation and target discovery/maturation in the pharmaceutical industry. To live up to the high expectations that the promise of more specific treatment options generates, and to add granularity to the molecular disease understanding, more and more layers of molecular data are generated simultaneously for each patient and tissue sample. These data layers typically comprise transcriptomics, genomics, proteomics, metabolomics and microbiomics/metagenomics. The collection of these multiple data layers has given rise to the still-evolving bioinformatics field of multi-omics, or integromics, which seeks to build solutions for integrating the many data layers in a way that enables grouping patients into classes that would not emerge from any single data layer alone.
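As a minimal sketch of the integration idea described above: one common, simple approach is to standardize (z-score) each omics layer separately, concatenate the per-patient feature blocks, and then cluster patients on the combined matrix. The toy data, layer names and the hand-rolled k-means below are purely illustrative, not a description of any specific platform.

```python
import numpy as np

def integrate_layers(layers):
    """Z-score each omics layer separately, then concatenate per-patient feature blocks."""
    blocks = []
    for X in layers:
        X = np.asarray(X, dtype=float)
        mu, sd = X.mean(axis=0), X.std(axis=0)
        sd = np.where(sd == 0, 1.0, sd)  # guard against constant features
        blocks.append((X - mu) / sd)
    return np.hstack(blocks)  # rows = patients, columns = all layers' features

def kmeans(X, k=2, iters=25):
    """Minimal k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):  # pick each new center far from the existing ones
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # assign each patient to its nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Four hypothetical patients measured on two layers (values are made up).
rna  = [[5.0, 1.0], [5.2, 1.1], [1.0, 5.0], [0.9, 5.3]]   # transcriptomics
prot = [[2.0, 0.1], [2.1, 0.2], [0.1, 2.0], [0.2, 2.2]]   # proteomics
X = integrate_layers([rna, prot])
labels = kmeans(X, k=2)
```

Real multi-omics integration methods are considerably more sophisticated (handling missing layers, differing feature scales and counts per layer), but the z-score-then-concatenate baseline captures the core idea that classes are defined jointly across layers.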
Because the volume and dimensionality of these data create a level of complexity that makes it difficult for scientists without data science skills to interact with them, there is a general need for solutions that democratize multi-omics data and open up the possibility for non-data scientists to browse and interrogate the data. The simplest application is to test hypotheses by quickly querying indication-specific in-house molecular data and evaluating the validity or novelty of reported findings. Some proprietary solutions that accommodate these needs exist and have been around for some time.
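The hypothesis-testing workflow described above can be sketched as a simple query against a differential-expression results table. The schema, gene symbols, tissue name and cutoffs below are all hypothetical, chosen only to illustrate the kind of question a scientist might ask of an in-house platform.

```python
import sqlite3

# Hypothetical schema: one row per (gene, tissue) differential-expression result.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE de_results (
    gene TEXT, tissue TEXT, log2_fc REAL, adj_pvalue REAL)""")
conn.executemany(
    "INSERT INTO de_results VALUES (?, ?, ?, ?)",
    [("IL13", "lesional_skin",  2.1, 0.001),   # illustrative values only
     ("IL4R", "lesional_skin",  0.4, 0.30),
     ("FLG",  "lesional_skin", -1.8, 0.004)])

def is_upregulated(gene, tissue, fc_cutoff=1.0, p_cutoff=0.05):
    """Quick hypothesis check: is `gene` significantly upregulated in `tissue`?"""
    row = conn.execute(
        "SELECT log2_fc, adj_pvalue FROM de_results WHERE gene=? AND tissue=?",
        (gene, tissue)).fetchone()
    if row is None:
        return None  # not measured: absence of evidence, not evidence of absence
    log2_fc, p = row
    return log2_fc >= fc_cutoff and p <= p_cutoff
```

Note the three-valued answer: `True`, `False`, or `None` when the gene was not measured at all. Distinguishing "not significant" from "not detected" matters for the interpretation pitfalls discussed below.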
However, while there has been a rapid evolution of data science tools like Shiny that enable bioinformaticians to make data available interactively, there does not seem to be a strong consensus in the industry on whether to rely solely on proprietary solutions or to build custom solutions in-house (buy versus build). We believe the answer lies somewhere in between: buy some of the parts, and customize in-house where it adds significant value to the business.
One important point needs to be made here: whichever solution an organization decides on, data democratization brings a risk of over- or misinterpretation. For example, a researcher might find a protein enriched in a histological analysis reported in a publication, while the internal data platform does not show the corresponding mRNA as significantly upregulated in the relevant tissue. In this case it might be that the high-throughput transcriptomics technology presented on the data platform was simply not sensitive enough to detect this lowly expressed gene, and not being aware of this could lead to unfortunate misinterpretation. So it is absolutely crucial to ensure proper education of drug discovery scientists to create a solid understanding of the limitations and pitfalls when making complex data broadly available within an organization.
David Adrian Ewald, Head of Bioinformatics, Leo Pharma
To conclude, we have seen great improvements in the pricing and throughput of omics data generation over the past decades, which has led (and still leads) to a vast amount of data in many areas of research. Huge amounts of multi-layered data give us the opportunity to add granularity to our molecular disease understanding, but we must always remember, as Christine L. Borgman alluded to in her book "Big Data, Little Data, No Data", that even though "big data" is great to have, we need to be even more focused on the quality of the data.