Data science and machine learning in an industrial context
Entire new economies in consumer markets have been built around data-driven modelling, so we expect that these techniques will have a growing impact on industries in the years ahead.
Great promise
Data-driven modelling will complement, and in some cases replace, physical and engineering-based models through access to vast amounts of data combined with increased processing power and new modelling techniques. This will enable us to derive models from patterns and signals in the data itself, rather than relying solely on assumptions about how assets perform in the real world. The main outcomes will be the ability to automate a whole range of processes, to detect anomalies at a much earlier stage, to simulate the impact of operational scenarios and to predict future states and events. It will surely contribute significantly to making industry more efficient and much safer, and to reducing its environmental impact.
There are already emerging success stories in which data-driven modelling has contributed to significant efficiency gains, but for applications where the consequences of wrong decisions are large, many industrial data science or machine learning (ML) projects still fall short of expectations.
Jeff Immelt, former CEO of GE, predicted in 2015 a performance increase of as much as 20% across all industries from smarter maintenance based on insights from data-driven models. Large industrial actors, OEMs and startups have invested substantially in pursuit of this goal, but the success stories are still few and far between.
The barriers remain
In our experience, there are still significant barriers to realizing the full potential of data-driven applications, but these can be overcome if all parties in industry, including asset owners, OEMs, stakeholders, third parties, startups, established consultancies and academia, contribute to the effort.
Typical barriers are:
- Data is often simply not fit for purpose. It is typically generated as a by-product of control systems, monitoring systems and transactional processes, and is not collected with the goal of enabling data-driven insights.
- There are too few events to train data-driven models to detect anomalies and predict future events. Data-driven models require a significant number of events to identify and interpret signals in the data leading up to the events. In industry, there are generally large safety margins and relatively few incidents and failures.
- Frameworks for trusting the outputs of models and algorithms, and for managing the new risks associated with their use, are not yet in place at scale. For example, in the case of smarter maintenance, a wrong decision to postpone maintenance can have fatal consequences.
- Data-driven modelling, represented in particular by machine learning (ML) and artificial intelligence (AI), is at the top of the Gartner hype cycle. Everybody wants to be part of it, and it is easy to underestimate the skills and care that are required to successfully develop and deploy data-driven solutions.
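The barrier of too few events can be illustrated with a small calculation. The following is a minimal sketch with hypothetical numbers (10,000 observations and 5 failures, chosen only for illustration and not taken from any real asset). It shows that a trivial "model" which never predicts a failure still scores a very high accuracy while detecting none of the events that matter, which is one reason why rare events make data-driven models hard to train and to evaluate.

```python
# Hypothetical figures, chosen only for illustration (not from any real asset).
n_observations = 10_000   # e.g. daily records collected across a fleet
n_failures = 5            # failure events observed in those records

# Ground-truth labels: 1 = failure, 0 = normal operation.
labels = [1] * n_failures + [0] * (n_observations - n_failures)

# A trivial "model" that never predicts a failure.
predictions = [0] * n_observations

# Count correct predictions overall, and failures that were actually detected.
correct = sum(p == y for p, y in zip(predictions, labels))
detected = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

accuracy = correct / n_observations   # share of all records predicted correctly
recall = detected / n_failures        # share of real failures that were detected

print(f"accuracy = {accuracy:.4f}")
print(f"recall   = {recall:.4f}")
```

With so few events, accuracy alone says little about whether a model can be trusted; measures that focus on the rare events themselves, such as recall, are more informative.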
Overcoming the barriers
DNV decided early to engage with customers to learn about the opportunities and barriers. We have been part of more than 40 pilots and projects, working on hard problems with real industrial data in the areas of data management and data-driven modelling.
From this experience, we have developed a framework for assessing and managing organizational data management maturity and data quality to ensure accountability and control throughout the data value chain. We are in the process of establishing a framework for assessing and managing algorithms and models, and the processes by which they are developed and managed. We are also exploring ways of providing transparency into how algorithms and models work. These are some of the key elements that must be implemented to ultimately enable asset owners and stakeholders to trust the outcomes of models and algorithms.
Key steps to overcome the barriers are:
- Engage the whole organization, from where the data is generated to where the decisions are made. Understand the balance between potential value creation and the costs associated with establishing the data value chains and data-driven solutions.
- Learn to treat data as an asset in itself, by investing in the people, processes and technologies needed to establish and manage the right data at the right quality.
- Invest in automated data value chains across the silos in which data is generated or resides (control systems, monitoring systems, software applications, transactional systems, external data sources etc.) while ensuring governance, standardization and quality.
- The application of data science to solve hard industry problems is currently far from a drag-and-drop exercise, no matter which of the myriad of tools one chooses. Take care to invest enough time and the right resources in the work needed, and in the right technology to enable cost-effective development and operation.
- Take part in data-sharing initiatives. Where events for training data-driven models are few and far between, cross-industry data-sharing initiatives may help to establish a critical mass of data for the benefit of entire industry segments.
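The value of data sharing can be sketched with simple arithmetic. The example below is illustrative only: the operator names, fleet sizes, observation period and annual failure rate are all hypothetical assumptions, and the expected number of events is approximated as units × years × annual failure rate. It shows how operators that each see only a handful of events may reach a more useful number by pooling their data.

```python
def expected_events(units: int, years: float, annual_failure_rate: float) -> float:
    """Rough expected number of failure events for a fleet (simple rate model)."""
    return units * years * annual_failure_rate


# Hypothetical operators and fleet sizes (number of comparable units each).
fleet_sizes = {"operator_a": 40, "operator_b": 25, "operator_c": 60}

years_observed = 5          # assumed observation period
annual_failure_rate = 0.01  # assumed: 1 failure per 100 unit-years

# Expected events each operator would see in its own data.
per_operator = {
    name: expected_events(units, years_observed, annual_failure_rate)
    for name, units in fleet_sizes.items()
}

# Expected events available if all operators pool their data.
pooled = sum(per_operator.values())

for name, events in per_operator.items():
    print(f"{name}: about {events:.2f} expected events")
print(f"pooled: about {pooled:.2f} expected events")
```

Under these assumptions no single operator sees more than a few events, while the pooled data set contains roughly the sum of all of them. Whether that is enough to train a useful model depends on the problem and on how comparable the assets and their data really are.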
9/21/2018 12:58:05 PM