Accelerating Product Development with Big Data
Kyle Cline and Andrei Khurshudov, Caterpillar Inc.
With onboard computers, sensors and cameras, more than one million assets are transmitting data to Caterpillar, which can enable advanced IoT analytics at scale. The data is created on customer jobsites around the world and can include time-series data, machine health alerts, fuel usage, GPS and operator-specific usage. This “Big Data” comes with high volume, velocity and variety. Turning it into the next generation of products and services requires Big Data Analytics infrastructure as well as specialized skills from both data scientists/analysts and product engineers.
Caterpillar is not alone in this space. More industrial devices and machines are connected every day, the volume of transmitted Big Data keeps growing, and ever more data analysis is required to improve operational, business and product development decisions.
How is this data being processed, analyzed, reported and used? Is there a better way?
Generally, two approaches are used:
1. Raw data for highly skilled users. This approach offers nearly unlimited flexibility in data analysis and modeling but requires specially skilled users: trained data scientists (with skills in R, Python, etc.) or data analysts (e.g., SQL) who can turn raw data into information for others to consume.
2. Status dashboards for the broader community. This approach offers good customization of descriptive views but little analytical depth and very little modeling support. It usually answers the most frequently asked basic questions using statistics and trends (e.g., how many machines are online, the trend over time, histograms of temperatures, etc.).
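To make the contrast concrete, the kind of descriptive question an Approach 2 dashboard answers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the MachineStatus record and its fields are hypothetical, not Caterpillar's data model, and the 10-degree histogram bins are an arbitrary choice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MachineStatus:
    # Hypothetical snapshot record; field names are illustrative only.
    machine_id: str
    online: bool
    coolant_temp_c: float


def count_online(snapshots):
    """Number of machines currently reporting as online."""
    return sum(1 for s in snapshots if s.online)


def temperature_histogram(snapshots, bin_width=10.0):
    """Count readings per temperature bin, keyed by each bin's lower edge."""
    counts = Counter((s.coolant_temp_c // bin_width) * bin_width for s in snapshots)
    return dict(sorted(counts.items()))


snapshots = [
    MachineStatus("A1", True, 82.5),
    MachineStatus("A2", False, 75.0),
    MachineStatus("A3", True, 88.1),
]
print(count_online(snapshots))           # 2
print(temperature_histogram(snapshots))  # {70.0: 1, 80.0: 2}
```

Answers like these are quick to compute and display, but going beyond them (for example, modeling why some machines run hotter) is where dashboards run out of depth.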
Approach 1 offers advanced capabilities and Approach 2 offers simplicity, but both fall short when offered to the broader community of thousands of engineers.
The first approach requires that all engineers be trained in raw data access, manipulation and analysis, and many are not. Successful analysis also benefits from years of experience in handling modeling subtleties. Even when engineers have such skills, they typically spend up to 80 percent of their time with raw data understanding, manipulating and cleaning it, and only 20 percent analyzing and modeling it. The second approach, in contrast, may be too simplistic to be useful unless an engineer is looking for one of the frequently required answers for which a dashboard has already been built.
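The cleaning work that consumes much of that 80 percent often resembles the following sketch. It assumes a hypothetical raw feed of (machine ID, timestamp, fuel rate) tuples and illustrative validity bounds; real telematics pipelines apply many more rules than these three.

```python
from datetime import datetime

# Hypothetical raw readings: (machine_id, ISO timestamp, fuel_rate_lph).
# None marks a missing value; a negative rate stands in for a sensor glitch.
RAW = [
    ("M7", "2021-03-01T10:00:00", 12.4),
    ("M7", "2021-03-01T09:59:00", 11.9),
    ("M7", "2021-03-01T10:00:00", 12.4),  # duplicate transmission
    ("M7", "2021-03-01T10:01:00", None),  # dropped value
    ("M7", "2021-03-01T10:02:00", -5.0),  # physically impossible
]


def clean(rows, min_valid=0.0, max_valid=500.0):
    """Drop missing/out-of-range values, deduplicate, and sort by time.

    The bounds are illustrative assumptions, not real sensor limits.
    """
    seen = set()
    cleaned = []
    for machine_id, ts, value in rows:
        # Validity first, so a bad reading never blocks a good one.
        if value is None or not (min_valid <= value <= max_valid):
            continue
        key = (machine_id, ts)
        if key in seen:  # keep the first valid reading per machine/timestamp
            continue
        seen.add(key)
        cleaned.append((machine_id, datetime.fromisoformat(ts), value))
    cleaned.sort(key=lambda r: (r[0], r[1]))
    return cleaned


for row in clean(RAW):
    print(row)
```

Every engineer working from raw data ends up rewriting some version of this logic, which is part of why the analysis share of the effort stays small.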
Many engineers want a tool that gives them depth and flexibility while also simplifying and accelerating the analysis and modeling that answers their questions. Fortunately, there is another way, which we call the “Library of Solutions” (Approach 3), shown below.
All approaches can share a common IoT backbone. The connected asset sends IoT data to the IoT Platform, where it is stored and made available for data processing. Traditionally, this data is fed into a dynamic dashboard (Approach 2) or made available in raw form to data scientists and analysts (Approach 1).
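A minimal sketch of the shared backbone shows how one stored stream can feed both raw access (Approach 1) and a simple dashboard summary (Approach 2). It assumes a hypothetical in-memory IoTPlatform class and hypothetical message fields; a real IoT platform would use durable, distributed storage.

```python
from collections import defaultdict


class IoTPlatform:
    """Hypothetical in-memory stand-in for an IoT platform."""

    def __init__(self):
        self._store = defaultdict(list)  # asset_id -> list of messages

    def ingest(self, asset_id, message):
        # Store a copy of each incoming message so raw data stays available.
        self._store[asset_id].append(dict(message))

    def raw(self, asset_id):
        # Approach 1: hand the full raw history to skilled users.
        return list(self._store.get(asset_id, []))

    def latest(self, asset_id, field):
        # Approach 2: a dashboard tile often shows a simple summary,
        # such as the most recent value of one field.
        for msg in reversed(self._store.get(asset_id, [])):
            if field in msg:
                return msg[field]
        return None


platform = IoTPlatform()
platform.ingest("D6-001", {"ts": 1, "fuel_l": 10.0})
platform.ingest("D6-001", {"ts": 2, "alert": "low coolant"})
print(platform.latest("D6-001", "fuel_l"))
print(len(platform.raw("D6-001")))
```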
In Approach 3, the engineering community identifies ‘common denominator’ analytics requests, tasks and needs. A set of ‘common denominator’ analytics tools is then developed, connected to clean data and made available to the community via an online application (e.g., a web app). With good engineering engagement, these tools can address 80 percent or more of the immediate engineering needs for analytics and modeling. These modularized solutions form a library of reusable elements that can be incorporated into other, more complex models and analytics.
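One way to picture the library of reusable elements is a registry of small, validated analytics functions that larger analyses can compose. The sketch assumes a hypothetical LIBRARY dictionary, a hypothetical register decorator, and hypothetical utilization and fuel-rate modules; it illustrates the idea, not Cat Digital's implementation.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical registry standing in for the "Library of Solutions".
LIBRARY: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that adds a reusable analytics module to the library."""
    def wrap(fn: Callable) -> Callable:
        LIBRARY[name] = fn
        return fn
    return wrap


@register("utilization")
def utilization(engine_hours: List[float], calendar_hours: float) -> float:
    """Fraction of available time the engine was running."""
    return sum(engine_hours) / calendar_hours


@register("avg_fuel_rate")
def avg_fuel_rate(fuel_rates_lph: List[float]) -> float:
    """Mean fuel rate (liters per engine hour) while running."""
    return mean(fuel_rates_lph)


@register("fuel_per_calendar_hour")
def fuel_per_calendar_hour(engine_hours, calendar_hours, fuel_rates_lph):
    """Composite module built from two existing library elements."""
    return (LIBRARY["utilization"](engine_hours, calendar_hours)
            * LIBRARY["avg_fuel_rate"](fuel_rates_lph))


print(sorted(LIBRARY))
print(fuel_per_calendar_hour([5.0, 7.0], 24.0, [10.0, 14.0]))  # 6.0
```

Because the composite module looks up library entries by name at call time, an improvement registered for a shared element such as utilization is picked up by every analysis built on it.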
Access to raw data is also maintained so that advanced users can perform their own analyses for less common needs. Approach 3 serves most of the engineering community and offers the following advantages:
• Independent of their data analytics skills, engineers gain easy access to analytics that address their needs.
• Reusable elements increase the consistency, accuracy and overall quality of analytics answers, and the same elements can help answer many different questions. The library provides data-connected, validated modules that accelerate custom analyses.
• ‘Common denominator’ analytics tools can be deployed in cloud-based environments to provide scalable compute power to fit the speed and budget needs of each engineer.
• Results are analyzed, curated and ready for easy consumption.
• Decisions become more data-driven, aligning products and services with how customers actually use their machines in the field.
• The overall quality of the R&D process improves: field performance is verified against design requirements, issue investigations are faster and more thorough, time to market shortens, warranty costs drop and, ultimately, product and investment costs are reduced.
In conclusion, an up-to-date library of modularized ‘common denominator’ analytics solutions democratizes data analytics for the engineering community, measurably accelerates the R&D process, reduces investment costs, increases efficiency and, ultimately, helps engineers align products and services with what customers need.
Kyle Cline is a Product Development expert with a keen focus on Telematics Big Data Analytics and the Industrial Internet of Things. Currently, Kyle serves as an IoT Analytics Manager at Cat Digital, the digital technology arm of Caterpillar, Inc. Kyle manages a team of data scientists developing machine analytics and predictive modeling that study how Caterpillar’s customers utilize over a million connected machines. Kyle has a degree in Electrical Engineering and spent the first 12 years of his Caterpillar career in Product Development, where he led the design, build, and testing of new motor grader products. He also spent 3 years on assignment in Piracicaba, Brazil, as the technical liaison to Caterpillar’s largest manufacturing facility in South America.
In Cat Digital, Kyle combines his Caterpillar engineering experience and his expertise in advanced analytics to deliver better products and services that enhance the value of Caterpillar’s cutting-edge equipment around the world.
Dr. Andrei Khurshudov specializes in Big Data Analytics, the Industrial Internet of Things, cloud storage and computing, in-memory computing, and data storage technology. Andrei is a Director of IoT Analytics at Cat Digital, the digital technology arm of Caterpillar. Andrei’s organization focuses on data analysis and predictive modeling for a million connected machines and devices. Andrei spent over ten years at Seagate Technology, serving as a Chief Technologist and managing various R&D organizations in areas such as big data analytics, cloud storage, cloud computing, and quality and reliability. More recently, Andrei served as Chief Data Officer at Formulus Black, a New Jersey startup developing software for in-memory computing, and as CTO and Chief Data Officer at Alchemy IoT, a Boulder-area startup creating cloud-based analytics solutions for the Internet of Things. Andrei holds a Ph.D. in Engineering and has worked at IBM, Hitachi Global Storage, and Samsung. He has authored numerous publications, patents, conference presentations, and a book.