Andrei Khurshudov, Ph.D., Director, IoT Analytics, CAT Digital
Caterpillar is synonymous with heavy machinery and, for more than 90 years, our products have helped our customers build a better world. We've also been driving industry-leading connectivity solutions for over 20 years, having introduced our first telematics device in 1999. Today, Caterpillar’s growing digital team supports the company’s digital strategy through advanced data science and IoT solutions.
With onboard computers, sensors, and cameras, approximately one million assets are transmitting data to Caterpillar to enable advanced IoT analytics at scale. This data can include time-series data, machine health alerts, fuel usage, GPS, and operator-specific usage.
Powered by data, Caterpillar’s IoT analytics can deliver value to customers through a lower cost of ownership, increased productivity, improved safety, and reduced maintenance costs. We use analytics to tell when a machine or part of a machine needs to be serviced or replaced, how to operate more effectively to increase production, how to reduce operational costs, how to increase service life, and more.
So, what is the future of this fast-growing field - IoT analytics?
To answer this question, let us review and discuss the main trends and challenges in this field.
Trend: The increasing amount of data.
Challenge: Quality of available data.
The Industrial Internet of Things (IIoT) is a data-generating engine for IoT analytics. For example, a large modern truck can have over 100 IoT sensors, each producing telematics data at a frequency of 1 Hz or higher. An analyst working with such a large data set can face quality problems. Missing batches or messages, missing channels, malfunctioning sensors, buggy extract, transform, load (ETL) processes, and other factors degrade the efficacy of IoT analytics. Channel naming irregularities can be another issue, leading data scientists to spend more of their time on quality control than on analytics.
Action: Invest in data quality monitoring and improvements. This will pay off in the long term. As the amount of data keeps increasing, tackling this issue later will become increasingly difficult.
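As an illustration of what basic data quality monitoring might look like, here is a minimal sketch in Python. It checks a batch of readings for three of the problems named above: missing messages, missing channels, and channel naming irregularities. The `Reading` record, the channel names, the integer-second timestamps, and the 1 Hz sampling assumption are all hypothetical and chosen only for the example; they do not describe any actual Caterpillar data format.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: int  # whole seconds; assumes one sample per second (1 Hz)
    channel: str    # hypothetical channel name, e.g. "engine_temp"
    value: float


def normalize_channel(name: str) -> str:
    """Collapse naming irregularities such as 'Engine-Temp' vs 'engine_temp'."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def quality_report(readings, expected_channels, start, end):
    """Summarize data gaps for the time window [start, end) at 1 Hz."""
    expected = {normalize_channel(c) for c in expected_channels}
    seen = {c: set() for c in expected}  # channel -> timestamps received
    unknown = set()                      # channels nobody expected

    for r in readings:
        name = normalize_channel(r.channel)
        if name not in seen:
            unknown.add(name)
        elif start <= r.timestamp < end:
            seen[name].add(r.timestamp)

    n_expected = end - start  # one sample per second per channel
    return {
        # Fraction of expected samples that never arrived, per channel.
        "missing_fraction": {
            c: 1 - len(ts) / n_expected for c, ts in seen.items()
        },
        # Channels that delivered no data at all in the window.
        "missing_channels": sorted(c for c, ts in seen.items() if not ts),
        # Channels present in the data but absent from the expected list.
        "unknown_channels": sorted(unknown),
    }
```

A report like this could be computed per machine and per day and tracked over time, so that a growing share of missing samples or a newly appearing unknown channel is noticed before it reaches the analytics stage.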
Trend: Preference for "supervised" analytical models.
Challenge: The lack of quality "ground truth."
Most "supervised" machine learning models rely on "ground truth" to separate data into two or more classes.
For instance, such classes can be "healthy" or "unhealthy", which is the type of prediction one would want to make for an IoT device. Supervised models are more directly usable, typically come with an estimate of their accuracy, and are thus preferred for many applications. However, until issues with data quality - including the ground truth data quality - are resolved, achieving high accuracy is hardly possible for supervised models.
Action: Focus on "unsupervised" modeling today, but build an infrastructure that is compatible with "supervised" models, and continue improving the quality of data. Transition to supervised models can and will happen gradually as infrastructure is put in place to automatically label data.
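To make the idea of "unsupervised" modeling concrete, here is a minimal sketch of one simple technique: flagging outliers with a robust z-score based on the median and the median absolute deviation (MAD). It needs no "healthy"/"unhealthy" labels. This is an illustrative choice and not a method described in this article; the function names are hypothetical, and the 0.6745 scaling constant and 3.5 cutoff are the conventional values for the modified z-score rather than tuned recommendations.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores using the median and median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or at least half of them) are identical;
        # no meaningful spread to scale by.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the indices of readings whose robust z-score exceeds the cutoff."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]
```

For example, given a series of temperature readings that hover near 90 with a single spike above 120, `flag_anomalies` would return only the index of the spike. If flagged readings are later confirmed or rejected by service records, those outcomes can become labels, which is one way the gradual transition to supervised models could be supported.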
Trend: Automated analytics.
Challenge: Lack of algorithm transparency limits scalability due to human workload.
The ultimate goal for IoT analytics is to fully automate tasks that can be automated, freeing human experts to focus on the most complex problems. This is the most efficient and cost-effective approach. However, most machine learning (ML) algorithms are so-called "black box" algorithms, and the decisions they make are often hard to interpret and trust.
Action: Invest in training and education, and in building trust in the models used. Also, invest in solutions that can interpret models’ answers for human users. The more transparent models’ responses are, the faster analytics automation becomes a reality.
Trend: Migration to the cloud.
Challenge: Data connectivity issues, analytics decision latency, and infrastructure cost.
Migrating data, IoT analytics, and other services to one cloud makes perfect sense in many cases. After all, the cloud offers scalability, elasticity, and a pay-as-you-go approach, allowing decreased capital expenditure. On the other hand, relying on the cloud for near-real-time IoT analytics might be impossible in some cases due to data latency issues.
Action: Plan for a flexible, integrated, and reliable end-to-end analytics solution, from onboard/edge analytics to powerful cloud-based analytics. When such a solution is available, use all or parts of it to address connectivity, latency, and cost.
Trend: A shift from data batching to streaming and real-time analytics.
Challenge: Complexity of changing the existing infrastructure.
The general trend in analytics is to shift from batch data processing to real-time processing. The cost of such a shift could be high, especially when installed infrastructure is extensive.
Action: Plan for the future - this transition will take place regardless of immediate needs. Customer requirements, technological advances, and competitive pressure will eventually support more streaming applications.
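The difference between batch and streaming processing can be shown with a small sketch. A batch job needs the full data set before it can compute a statistic, while a streaming job updates its answer as each reading arrives. The example below uses Welford's online algorithm for the running mean and variance; the class name and the sample values are hypothetical, and this is only one of many possible streaming computations.

```python
import statistics


class RunningStats:
    """Mean and sample variance updated one reading at a time (Welford)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two readings, so return 0.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical sensor readings arriving one at a time.
samples = [12.1, 11.8, 12.4, 12.0, 13.2, 11.9]

stream = RunningStats()
for reading in samples:
    stream.update(reading)  # an up-to-date answer exists after every reading

# The batch computation needs the complete list up front.
batch_mean = statistics.mean(samples)
batch_variance = statistics.variance(samples)
```

Both approaches agree on the final result up to floating-point rounding, but the streaming version keeps only three numbers in memory and never has to wait for a batch to close.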
Trend: Ubiquitous access to IoT data.
Challenge: Data IP, ownership, security, and governance.
Having all IoT data in one place, and giving all teams access to all the data, could be an excellent way to accelerate product and technology development, reduce cost, innovate, and improve collaboration across the company. However, questions arise: who owns which data, who can access which data, do we have the rights to use the data as proposed, and how should we handle highly confidential and harmless data in the same cloud? There are evolving regulations, industry standards, best practices, and user expectations that need to be considered together to support the use of IoT data.
Action: Invest in robust data governance and security processes. Leverage services available from the major cloud service providers where possible, recognizing that they may only offer a floor on which an organization can build.