Leveraging Biomedical Big Data: A Hybrid Solution
By Bryon Campbell, Ph.D., CIO, Van Andel Institute
Big data can be a lifesaver…literally. In fact, the efficient handling and analysis of large, complex datasets in biomedical research plays an integral role in developing new ways to prevent, diagnose and treat diseases.
Scientists engage in a wide range of data-intensive research projects using high-resolution imaging, genomic sequencing instruments and molecular modeling simulations to detail processes such as gene expression and protein behavior.
Because the research conducted in just one laboratory can produce billions of data points, and research techniques evolve at a rapid pace, it is increasingly important for research facilities to architect solutions that can scale as requirements change.
At Van Andel Institute (VAI), a nonprofit biomedical research and science education organization in Grand Rapids, Michigan, we have managed these big data challenges by embracing cloud computing and implementing a hybrid OpenStack high performance computing (HPC) system. This new infrastructure significantly improves our IT flexibility while providing users with cutting-edge computational resources. The solution saved us roughly two years of development time.
Anticipating Technological Change
The Institute is home to 28 principal investigators and their laboratories that study epigenetics, cancer and neurodegenerative diseases such as Parkinson’s, and are dedicated to translating those findings into effective therapies.
As VAI has increased collaboration with other research institutions and large-scale bioinformatics projects around the world, scientific investigations have become much more complex. In recent years, this has led to the formation of research groups that need to work on terabyte- and petabyte-scale data projects. In addition to having the storage and CPU capability to process and analyze scientific big data, we also wanted a creative way to future-proof against our inevitable need for more computational resources.
In 2014, we realized that the science at the Institute was driving the need for exponentially higher-volume, higher-speed computational resources. We knew that cloud-based, high-performance computing would soon be the new standard. And although the big players in cloud computing had lowered their prices in recent years, we needed to have a computing solution in-house that gave our scientists direct access to higher speeds.
The continual onboarding of big data-dependent scientists with very diverse system requirements and aggressive timelines meant that we had to explore alternative ways to deliver computing resources to our users.
VAI’s relatively small size and the fact that there would be no legacy equipment to work around made us agile enough to consider a hybrid system with the flexibility to work locally and virtually. In early 2015, our team began implementing an HPC hybrid system that would include three key components: Bright Computing Cluster Manager with OpenStack software; 43 compute nodes, representing 1,100 CPU cores, provided by Silicon Mechanics; and parallel (GPFS) storage supplied by Data Direct Networks.
The new system needed to be implemented within a few months—a timeline that would be unreasonable for most large universities or big businesses looking to accomplish the same type of transition.
We are talking about the difference between turning a cruise ship and a speed boat. Although larger organizations would benefit from new approaches, they often are slowed down by established processes and existing equipment. The Institute is very nimble—our structure allows us to transition quickly with no major engineering changes.
A Smooth Implementation
The hybrid HPC cluster and private cloud went live in September 2015, with very few changes from the initial plan to the final implementation. A near-flawless execution was important because even a small issue could delay important research.
Because VAI scientists expect future research to be even more data intensive, the system was designed and built with the flexibility to easily bolt on additional resources.
Cloud-based users and cluster-based users at the Institute are now operating simultaneously in a hardware environment that allows for fast access to very large data sets. Administrators also have clear visibility of the Institute’s local cloud and can easily fine-tune the user mix as needed.
We made strategic decisions when executing this HPC hybrid system in order to keep computational accessibility at the forefront. Although others use public cloud providers, a local solution was the best choice for us because of our high-volume instruments and terabyte-scale inter-node traffic. With our current hybrid approach, we enjoy the benefits of local infrastructure while still having the flexibility and ease of use that cloud computing provides.
Big Data Making a Big Impact
This HPC solution is accelerating research at the Institute. VAI scientists are able to analyze data in new ways and expedite the process of transforming hypotheses into advances in medicine that can ultimately save lives.
The system also allows research teams to do more calculated work by giving them the time and ability to cross-validate data. Thoroughness and precision in completing data analysis, in turn, facilitate more accurate laboratory testing.
Because we’re not paying per hour or based on frequency of access for computing power, our scientists have more freedom to test and explore biological systems more thoroughly and investigate hypotheses in more efficient ways. It is thrilling to watch highly efficient computing accelerate scientists’ ability to determine errors in cellular processes that lead to diseases.