Considerations before Introducing Big-Data Solutions in Your SAP Landscape
By Ankit Batra, Advisory Enterprise Solutions Director, KPMG
This situation is no different for organizations running SAP as their Enterprise Resource Planning (ERP) platform and system of record for business transactions. The constant pressure to drive higher levels of efficiency is forcing businesses to seek ways to optimize existing business and operational processes, as well as to extend data-driven decision making into every facet of the organization.
The analytics roles within an organization are also changing to keep pace with big-data technological advancements. The role of the data scientist, who uses statistics and machine learning to identify patterns and extract meaningful insights, is being introduced at organizations that are starting to leverage big-data processing for historical and forecasting analysis. An IT landscape that houses an ERP solution like SAP ECC alongside big-data solutions like Apache Hadoop, HP Vertica, Greenplum, Google BigQuery, and others demands separate roles to manage and run each of the underlying applications.
For example, while a business analyst would create reports leveraging structured transactional data generated in SAP, a data scientist would create and visualize data-mining scenarios leveraging data of all formats (including semi-structured and unstructured) stored in the data lake or big-data application.
Enterprises that rely on SAP for business process execution and wish to introduce a big-data ecosystem into their landscape should consider the following aspects at the outset of their journey.
Platform and Infrastructure
As a highly robust and scalable platform is central to any big-data project, multiple vendors offer highly distributed architectures and processing solutions to meet these growing needs. The big-data platform space is filled with both open-source and closed-source solutions running on either cloud or on-premise infrastructure. One of the solutions leading the big-data revolution is Apache Hadoop, an open-source data-processing platform first used at scale by Internet giants such as Yahoo and Facebook.
The fundamental principle behind Hadoop is to slice and distribute the data across multiple server nodes and then process the slices concurrently, as opposed to processing one massive block of data in one go. Applications using Hadoop continue to run even if individual cluster nodes fail, owing to the platform's fault-tolerant design and underlying data replication.
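The slice-distribute-process principle can be sketched without a Hadoop cluster at all. The following minimal Python example (the record layout and chunk size are illustrative, not taken from any SAP or Hadoop API) splits a data set into slices, processes each slice in parallel, and merges the partial results, mirroring the map and reduce phases Hadoop runs across nodes:

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    """Stand-in for a map task: count records per customer in one slice."""
    counts = {}
    for record in chunk:
        counts[record["customer"]] = counts.get(record["customer"], 0) + 1
    return counts

def merge_counts(partials):
    """Stand-in for the reduce step: combine the per-slice results."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total

if __name__ == "__main__":
    records = [{"customer": c} for c in "AABBBC"]
    # Slice the data into fixed-size chunks, as Hadoop splits files into blocks.
    chunks = [records[i:i + 2] for i in range(0, len(records), 2)]
    # Each chunk is processed concurrently, not as one massive block.
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(process_chunk, chunks))
    print(merge_counts(partials))  # {'A': 2, 'B': 3, 'C': 1}
```

In a real cluster the slices live on different machines and a failed node's slices are re-run elsewhere from replicated copies; the divide-and-merge structure is the same.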
A multitude of tools and libraries are available on the platform that offer functionality such as in-memory processing, interactive queries, data lineage, and SQL-like querying. Data integration between big-data platforms and SAP solutions, whether running on SAP HANA or any other database, can be achieved via a wide variety of tools and integration platforms. Examples include merging sales data from SAP with sentiment data captured from external Internet sources to design a marketing approach, or scheduling a preventive maintenance order in SAP based on sensor and IoT data collected from production-floor devices.
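The sales-plus-sentiment example above boils down to a join on a shared business key. A hedged sketch in pandas, where the table contents, column names, and thresholds are all hypothetical placeholders for data that would really arrive via an extraction tool or integration platform:

```python
import pandas as pd

# Hypothetical extract of SAP sales figures (e.g. pulled via an ETL tool).
sales = pd.DataFrame({
    "material": ["M-100", "M-200", "M-300"],
    "revenue":  [120000, 85000, 43000],
})

# Hypothetical sentiment scores aggregated from external Internet sources
# and stored on the big-data platform.
sentiment = pd.DataFrame({
    "material":  ["M-100", "M-200", "M-300"],
    "sentiment": [0.82, -0.15, 0.40],
})

# Join the two worlds on the shared key, then pick high-revenue materials
# with strongly positive sentiment as marketing-campaign candidates.
merged = sales.merge(sentiment, on="material")
priority = merged[(merged["revenue"] > 50000) & (merged["sentiment"] > 0.5)]
print(priority["material"].tolist())  # ['M-100']
```

The same join-then-filter pattern applies to the maintenance scenario, with sensor readings in place of sentiment scores and an order-creation call in place of the campaign list.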
Governance and Security
The governance and security stakes are extremely high when considering a big-data landscape for your business, simply because of the sheer volume and variety of data being stored and analyzed. It is also crucial to understand that despite the huge amount of data being pooled, not all of it (structured or unstructured) will hold the same significance when considering access rights and security.
Designing an efficient and effective data management framework is imperative: one that regulates and monitors access to data, controls the quality of the data being stored, masks and protects crucial information, substantially reduces the cost of data storage, and manages the rapidly growing volume of unstructured data.
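The masking requirement in particular lends itself to a small illustration. The sketch below shows one common approach, assuming a hypothetical field-level policy and role names that are not part of any SAP product: sensitive fields are replaced by a one-way hash for ordinary analyst roles, which keeps records joinable for analysis while hiding the raw values:

```python
import hashlib

# Hypothetical policy: which fields count as sensitive in this data set.
SENSITIVE_FIELDS = {"customer_name", "iban", "email"}

def mask_record(record, role):
    """Return a copy of the record with sensitive fields masked,
    unless the caller holds a privileged role (illustrative rule)."""
    if role == "data_steward":
        return dict(record)
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            # A one-way hash keeps the value joinable but unreadable.
            masked[field] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            masked[field] = value
    return masked

row = {"customer_name": "ACME GmbH", "revenue": 120000}
print(mask_record(row, role="analyst")["revenue"])  # 120000
```

A production framework would attach such policies centrally (in the data catalog or access layer) rather than in application code, but the principle of role-dependent, field-level masking is the same.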
Existing SAP clients need to define guidelines for data ownership and privacy before pooling and analyzing data generated within or outside the organization. Security policies that promote sharing and self-service, while limiting access to restricted and sensitive data, facilitate the use of data for decision making. An organization's decision to move to cloud-based hosting and processing further compounds these security complexities, which extend down to data-level authorizations and need to be tackled at the outset so that open vulnerabilities are closed in time.
Usage and Presentation
It's one thing to analyze big data; it's quite another to present the analysis in a way that makes sense to non-statistical users. Because the statistical modeling approaches used in big-data processing can be rather complex for business users, data visualization plays a critical part in communicating the findings and outcomes of the analysis. When deploying a big-data environment, what matters is not only the type and size of the varied data being analyzed, but also how quickly the data needs to be analyzed and how well the end result is presented to business decision makers.
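Translating statistical output into business language can be as simple as a small mapping layer in the presentation tier. A minimal sketch, with illustrative thresholds that any real deployment would tune to its own audience:

```python
def describe_correlation(r):
    """Translate a correlation coefficient into plain language for
    non-statistical readers (strength thresholds are illustrative)."""
    direction = "rise together" if r > 0 else "move in opposite directions"
    if abs(r) >= 0.7:
        strength = "strongly"
    elif abs(r) >= 0.3:
        strength = "moderately"
    else:
        strength = "weakly"
    return f"The two measures {strength} {direction} (r = {r:.2f})."

print(describe_correlation(0.82))
# The two measures strongly rise together (r = 0.82).
```

Dashboards built on the big-data platform can pair such plain-language summaries with the underlying charts, so decision makers get both the number and its meaning.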
The unprecedented rate of data growth demands that existing and future SAP implementations consider the effective use of data for decision making and for differentiation from the competition. The journey toward big-data processing poses many known and unknown risks, given how rapidly the space is changing and evolving. For most organizations, understanding this landscape is the difference between using data analysis to generate meaningful business insights and just taking a guess.