Lambda Architecture: Ideal for Healthcare Big Data Use Cases
By Shahid Shah, Co-Founder and CEO, Netspective Communications
The Affordable Care Act (ACA), Medicare Access & CHIP Reauthorization Act of 2015 (MACRA), Merit-Based Incentive Payment System (MIPS), Alternative Payment Models (APMs), Precision Medicine Initiative (PMI), and Patient-Centered Outcomes Research through PCORI are all taking us towards a more value-driven payment system for the U.S. healthcare system. For decades, physicians and hospitals have been paid fees for the services they perform on patients: the higher their volume, the more money they make, regardless of outcomes. Given the unsustainable growth rates in national healthcare spending, all health insurers and the federal government are working to figure out how to pay providers and health systems for the value they deliver to patients and the public health system.
ACA, MACRA, MIPS, APMs, PMI, PCORI, and the many other initiatives the healthcare industry has embarked upon all have an insatiable appetite for data. Unfortunately, existing data architectures built on analytical data marts and data warehouses are starting to prove insufficient when asked to handle complex next-generation value-based business models that require more collaborative and flexible data processing.
Why Data Marts and Warehouses Are Insufficient
Today’s data infrastructure was built for a world in which health providers get paid for almost any service applied to almost any patient without regard to outcomes. Data marts and warehouses (DMWs) are great for pre-structured and pre-processed data coming from a small number of transactional systems that are relatively fixed or slowly changing. The assumption that we know our sources of data, their formats, and their business rules in advance is often called early binding. Early-bound data is quite useful for use cases where the data sources, structures, and associated business rules don’t change very often.
Since DMWs have pre-defined data structures, their database schemas are set before data is written into the database. This is often referred to as schema on write because the type, format, and rules for the data are known in advance, at the time the data is stored in the database. Once a data warehouse is created and data is written into it, its ability to change formats, types, or most other attributes is limited at best.
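To make schema on write concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for a warehouse. The table name, column names, and sample records are hypothetical and chosen only for illustration; a real DMW would be far larger, but the principle is the same: the structure is fixed first, and data that does not fit is rejected when it is written.

```python
import sqlite3

# Hypothetical warehouse table; the schema is declared before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE encounters (
        patient_id TEXT NOT NULL,
        visit_date TEXT NOT NULL,
        charge_usd REAL NOT NULL CHECK (charge_usd >= 0)
    )
    """
)


def write_encounter(row):
    """Insert a row; return True if the fixed schema accepted it, else False."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO encounters (patient_id, visit_date, charge_usd) "
                "VALUES (:patient_id, :visit_date, :charge_usd)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A record that matches the pre-defined structure is stored.
accepted = write_encounter(
    {"patient_id": "p1", "visit_date": "2016-03-01", "charge_usd": 120.0}
)
# A record with a missing required value is rejected at write time.
rejected = write_encounter(
    {"patient_id": "p2", "visit_date": "2016-03-02", "charge_usd": None}
)
```

Any new field or changed unit would require altering the table and the load process before the data could be stored at all, which is the rigidity described above.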
CIOs should consider morphing data warehouses into data lakes using Lambda Architectures
Much of the time spent managing DMWs goes into the extract, transform, and load (ETL) process. Once that process is in place, the analysts using the data in the warehouses are stuck with the dimensions it created. Any reporting or investigation that changes the format, style, units of measure, etc. of the data requires going through the ETL process again. This is why DMWs are considered enterprise-grade and usually have a high cost to set up and a high cost to maintain.
Since the technologies, architectures, approaches, and designs for DMWs are well understood, they are good for use by talented and experienced business analysts looking to perform well-defined, but seldom changing, business processes that produce retrospective analytics and reports. However, traditional DMWs cause long-term maintenance and user challenges when data scientists need to do ad hoc or exploratory data discovery, which is a necessity for true value-based payment models. DMWs are great when you know what questions to ask of your data; they are not so good when you want questions to emerge from the data itself, which is what machine learning and artificial intelligence initiatives require.
Why We Need Lambda Architecture, Data Streams, and Data Lakes
Healthcare data, like data in so many other industries, is now more unstructured and varied than ever. From socio-economic data to genomics to imaging and telemedicine, traditional data warehouses cannot handle the variety, volume, or velocity of data coming into many business units and departments within a single institution, let alone between multiple institutions that might be partners or competitors.
DMW initiatives need to morph by learning how Lambda Architectures work. Next-generation data streams (different sources of data) pour into data lakes (similar to a warehouse but not as structured or pre-processed). Instead of being pre-structured or pre-defined, data is usually kept in its original format and can then be converted on the fly when being read from the database. This is called late-binding data. Late-bound data is typically formatted, cleansed, or otherwise processed when it is read, which is sometimes referred to as “schema on read”. While this seems like it would be time-consuming or slow, modern big data systems can process and analyze data at high speeds.
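As a rough illustration of schema on read, the Python sketch below keeps raw records exactly as they arrived and applies structure only when a question is asked. The record layout, the field names (weight_lb, weight_kg), and the helper read_weight_kg are assumptions made up for this example; they do not come from any particular product.

```python
import json

# Raw records kept in their original form, as a data lake might store them.
# Sources disagree on field names and units, and some fields are missing.
RAW_LAKE = [
    '{"patient_id": "p1", "weight_lb": 176}',
    '{"patient_id": "p2", "weight_kg": 70.5, "source": "telemedicine"}',
    '{"patient_id": "p3"}',
]

LB_PER_KG = 2.20462


def read_weight_kg(raw_line):
    """Apply a schema at read time: return (patient_id, weight in kg or None)."""
    record = json.loads(raw_line)
    if "weight_kg" in record:
        weight = float(record["weight_kg"])
    elif "weight_lb" in record:
        # Unit conversion happens on read, not when the data was stored.
        weight = round(float(record["weight_lb"]) / LB_PER_KG, 1)
    else:
        weight = None
    return record["patient_id"], weight


weights = [read_weight_kg(line) for line in RAW_LAKE]
```

Nothing was rejected or reshaped when the records were stored. If analysts later decide they need the "source" field or a different unit, only the read function changes.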
Instead of the typical DMW-driven ETL (extract-transform-load) process, which is time-consuming and expensive, Lambda Architectures employ the ELT (extract-load-transform) approach. Because data is extracted from source locations and then immediately loaded into highly efficient and flexible staging and pre-processing areas, transformations can occur when necessary and can change more easily when analysts and data scientists discover new requirements. The cost of storage is the same but the agility in building applications, doing data science, or even traditional reporting is often significantly improved.
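The ordering difference between ETL and ELT can be sketched in a few lines of Python. Everything here is hypothetical (the sample records, the in-memory list standing in for a staging area, and the function names); the point is only that loading happens before any transformation, so new transforms can be added later without re-extracting or reloading.

```python
import json


def extract():
    """Stand-in for pulling records from source systems (hypothetical data)."""
    return [
        {"patient_id": "p1", "event": "admit", "cost": "1200.50"},
        {"patient_id": "p1", "event": "discharge", "cost": "300"},
        {"patient_id": "p2", "event": "admit", "cost": "950.25"},
    ]


def load(records, staging):
    """Load records unchanged; no transformation happens at this step."""
    staging.extend(json.dumps(r) for r in records)


def transform_total_cost(staging):
    """One transform, applied on demand: total cost per patient."""
    totals = {}
    for line in staging:
        rec = json.loads(line)
        pid = rec["patient_id"]
        totals[pid] = totals.get(pid, 0.0) + float(rec["cost"])
    return totals


def transform_event_counts(staging):
    """A requirement discovered later: count events by type, with no reload."""
    counts = {}
    for line in staging:
        rec = json.loads(line)
        counts[rec["event"]] = counts.get(rec["event"], 0) + 1
    return counts


staging_area = []
load(extract(), staging_area)  # E, then L
totals = transform_total_cost(staging_area)  # T, when needed
counts = transform_event_counts(staging_area)  # another T, same raw data
```

In an ETL pipeline, the second question would typically mean revisiting the transform step that sits in front of the warehouse; here it is just another function over the same staged data.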
How CIOs Should Incorporate Lambda Architectures into Their Data Integration Roadmaps
Replacing data marts and warehouses with Lambda Architectures immediately is neither simple nor fast. However, it’s not too hard to build a go-forward strategy that accommodates current data infrastructures while migrating to a more flexible one. CIOs shouldn’t focus on replacing their existing architecture; instead, the Lambda Architecture can help form a “meta architecture” that initially incorporates their marts and warehouses, and those DMWs can then be phased out over time as more lakes are created. As shown in Figure 4, there’s a place for existing transactional systems, existing warehouses, and existing approaches to be used while a transition is taking place.
Netspective has a healthcare informatics reference platform, known as Medigy, which can be used to help pave the way. If you’re looking for a way to use open source technologies and embrace Lambda Architecture in an evolutionary approach which won’t break your roadmap or your budget, check out Medigy.