What’s new in a-Gnostics 2.0: an industrial AI service focused on anomaly detection and equipment failure prediction
We are very happy to announce the release of a-Gnostics 2.0, a service for the rapid development of predictive analytics models, and would like to share the technical details of the new release.
a-Gnostics implements an Industrial AI service focused on anomaly detection and equipment failure prediction. The service is tailored to multivariable processes and timeseries data retrieved from industrial equipment, and automatically and correctly indicates normal, prefailure, and failure status. The main aim is to use machine learning and artificial intelligence to predict failures before they happen.
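As a toy illustration of the underlying idea (not the a-Gnostics models themselves), a rolling z-score check is one of the simplest ways to flag abnormal sensor readings in a timeseries:

```python
from statistics import mean, stdev

def flag_anomalies(series, window=5, threshold=3.0):
    """Flag points that deviate from the trailing-window mean
    by more than `threshold` standard deviations."""
    flags = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flags.append(i)
    return flags

readings = [10.0, 10.1, 9.9, 10.2, 10.0, 10.1, 25.0, 10.0]
print(flag_anomalies(readings))  # the spike at index 6 is flagged: [6]
```

Real prefailure detection uses far richer models, but the shape of the task is the same: map raw sensor streams to a normal/abnormal label early enough to act on it.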
Several services are offered to customers in the a-Gnostics set of services, or A-SETS.
A detailed description of the current services is a subject for a separate article. The short list is as follows:
- Forecasting electricity consumption by regions and counties, with 96–99% accuracy.
- Forecasting energy resource (electricity and natural gas) consumption by large factories, with ~95% accuracy.
- Forecasting generation of solar (PV) stations, with accuracy up to 90%.
- Failure prediction and anomaly detection for industrial equipment, e.g., predictive analytics for boilers at thermal power plants.
The next a-Gnostics service will be a solution for predicting electrical motor failures, helping manufacturers reduce repair and maintenance costs with the help of industrial AI and machine learning.
The main challenges of working with industrial data are the variety of data and its sources, as well as the need for a flexible infrastructure that can scale linearly to hundreds and thousands of models. High security requirements and compliance with data governance and model governance add further complexity. All of these challenges form the basic technical specification for the design and implementation of a-Gnostics technologies.
The diagram above shows the top-level architecture. Let’s take a closer look at it, going from left to right through each part.
Rather typical parts are shown in gray. Tasks that seem to us the most interesting and difficult to implement are marked in orange. The blue part is the UI, which is outside the scope of the data science and predictive analytics topics of this article.
The a-Gnostics system can be divided into four main components:
One of the key principles is that the deployment and launch of services are infrastructure independent, since enterprise applications can be deployed both in the cloud and on corporate infrastructure. The system also provides the ability to connect to a-Gnostics models via SaaS, following the modern Model-as-a-Service approach.
The video gives an overview of the tasks that need to be solved to develop an automated data pipeline for analyzing industrial data with machine learning methods:
Modern industry is a rather heterogeneous and very complex system. To get a high-quality training dataset, it is important to have a set of standard connectors, as well as data processing modules. Even with a ready-made corporate data store, be prepared for the fact that it will not contain all the necessary information; to assemble the complete dataset, a new data pipeline for processing new data must be created.
There is a trend toward granting and improving access to data from industrial equipment: many companies have begun integrating or migrating their local infrastructure to cloud providers, mainly Azure and AWS. In particular, the availability of services such as AWS IoT Greengrass, AWS IoT SiteWise, and Azure IoT Edge is a clear signal that the cost and time of integration and data collection will gradually decrease, making it possible to focus more on building models and services.
Data processed by the system can be divided into the following four types:
- historical data — mostly CSV, Excel, and TXT files; in addition, there may be files in a domain-specific format, for which separate processing modules must be developed;
- real-time data — usually values from equipment sensors. There is no single standard here; in most cases one deals with an intermediate bridge in the cloud (AWS S3, AWS Kinesis, Azure Blob, Azure FileShare, etc.) or a standalone server (for example, FTP) where the data is located. The presence of such a link is dictated by security measures that protect against external interference. This is a definite plus, since there is no need to spend time developing custom connectors for specific protocols (for example, Modbus), although there are exceptions to the rule. The idea that an external user can connect over, say, MQTT and start reading values is more an exception than the reality;
- open data — any useful information from open sources that improves the training set or complements the visualization, for example, weather facts and forecasts;
- proprietary data — information that is not available in open sources or is provided by the customer. In some tasks, open weather forecasts need to be refined (cloudiness) or supplemented with new variables (solar radiation).
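As a minimal sketch of what a historical-data connector might do (the column names and timestamp format are assumptions; real plant exports vary per site), a CSV file of sensor readings can be parsed into timestamped values:

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV export of hourly sensor readings.
RAW = """timestamp,value
2021-05-01 00:00,10.5
2021-05-01 01:00,11.2
"""

def read_history(text):
    """Parse a CSV export into (datetime, float) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append((datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M"),
                     float(row["value"])))
    return rows

print(len(read_history(RAW)))  # 2
```

Domain-specific formats would need their own parsing module with the same output shape, which is what keeps downstream processing uniform.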
Most of the data a-Gnostics works with is time-series data.
High-quality and complete datasets are a prerequisite for building accurate models, especially in an area such as industry, as is the ability to visualize the data well (which is sometimes just as important as the model itself). An automated and standardized data pipeline is required to scale to thousands of different models and tasks. This is where the DataOps practice comes in.
To simplify, these are automated pipelines for data processing, following either the ETL (Extract, Transform, Load) approach or the more modern and sometimes more flexible ELT (Extract, Load, Transform) approach.
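A minimal ETL sketch, with invented stand-ins for each stage (a list of raw strings for the source and a dict for the target store), shows the three steps in miniature:

```python
# Minimal ETL sketch: extract raw sensor rows, transform (clean and
# type-cast them), load into a target store. All names are illustrative.
def extract():
    # Raw strings as they might arrive from a CSV export or sensor feed.
    return ["12.5", "bad", "13.1", ""]

def transform(rows):
    out = []
    for r in rows:
        try:
            out.append(float(r))  # drop values that fail to parse
        except ValueError:
            pass
    return out

def load(values, store):
    store["sensor_a"] = values
    return store

store = load(transform(extract()), {})
print(store)  # {'sensor_a': [12.5, 13.1]}
```

In ELT the same `transform` step would run after loading, inside the target store, which is often more flexible when the raw data must be kept anyway.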
In a-Gnostics, there are ready-made data pipelines for processing and building datasets for the energy industry, as well as general-purpose datasets, such as weather, which are used in many other tasks. Raw industrial data needs to be verified, so specific modules for validating, cleaning, and transforming this data have been developed and can be integrated into the pipelines.
Note: we do not build components from scratch; we use existing frameworks, libraries, etc., in order to focus only on unsolved problems.
For example, if a complex chain spanning several datasets needs to be built, it is efficient to use Apache Airflow. The same applies to other tasks: if ready-made components exist, they are integrated into the system.
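The essence of such chaining is dependency ordering between dataset-building steps. As a toy illustration (task names are hypothetical; in production Apache Airflow expresses these dependencies as a DAG and handles scheduling and retries), the standard library can resolve the execution order:

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (its predecessors).
tasks = {
    "fetch_weather": set(),
    "fetch_consumption": set(),
    "build_weather_dataset": {"fetch_weather"},
    "build_training_set": {"build_weather_dataset", "fetch_consumption"},
}

# static_order yields tasks so that every dependency runs first.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```

Airflow adds scheduling, retries, and monitoring on top of exactly this kind of dependency graph, which is why it is worth reusing rather than reimplementing.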
From a data governance point of view, the database stores meta-information about data versions: by whom, when, from what sources, and for what task a particular dataset was made.
The following entities need to be stored:
- raw data;
- processed data;
- datasets ready for machine learning;
- model results — forecasts, failure predictions, etc.
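A hedged sketch of the kind of lineage record this implies (field names and values are assumptions for illustration, not the actual a-Gnostics schema):

```python
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class DatasetRecord:
    """Meta-information tracked per dataset version:
    who built it, when, from what source, and for what task."""
    name: str
    version: int
    source: str
    created_by: str
    task: str
    created_at: str

# Hypothetical example record.
rec = DatasetRecord(
    name="electricity_daily",
    version=3,
    source="plant-a-export",
    created_by="etl-pipeline",
    task="consumption_forecast",
    created_at=datetime(2021, 5, 1).isoformat(),
)
print(asdict(rec)["version"])  # 3
```

In a-Gnostics such records live in PostgreSQL, so that any model result can be traced back to the exact dataset version it was trained on.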
PostgreSQL is used as the data store for meta-information, model results, and some processed datasets. In other cases, the data is represented by files of different formats stored in the file system or in services such as AWS S3, Azure Blob, Azure FileShare, etc. There are exceptions when it is more optimal to put real-time data into specialized time-series databases.
Developing machine learning models for Industry 4.0 is a challenging task. The variety of tasks, data sources, etc. carries the risk of slipping into custom development for each model. Therefore, it is important to focus on groups of tasks that have as much in common as possible and good potential for reuse in related areas. Even with a focus on a few areas, deep knowledge of different branches of machine learning is required. For example, forecasting the consumption (as well as generation) of various energy commodities, such as electricity or natural gas, or predicting equipment behavior and breakdowns, uses supervised, unsupervised, and semi-supervised learning. In addition, a broad outlook and knowledge in a variety of areas are very useful: sometimes, to add a new synthetic feature to a model, knowledge from a school course in mathematics, physics, or chemistry helps.
To scale models to hundreds of objects and train thousands of models, it is very important to automate all stages of model building: data preparation, model training, storage, and use in production. The ModelOps approach is used to manage the lifecycle of machine learning models. At the same time, ready-made frameworks are used as much as possible, but for the model registry, a proprietary service was developed that interacts internally with MLflow.
a-Gnostics contains ready-made pipelines for training and deploying models out of the box. Ready-made methods (for example, LSTM, XGBoost, Random Forest), unique models, or complex chains similar to stacking can be used. To automate the selection of the most accurate model in regression problems, a special walk-forward algorithm was developed that accepts a dataset and a set of models as input and returns the most accurate model together with detailed metrics for each model.
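A hedged sketch of the walk-forward idea (the fit/predict interface, the toy models, and the mean-absolute-error metric are assumptions for illustration, not the a-Gnostics internals): each candidate is repeatedly fitted on an expanding window and scored on the next point, and the candidate with the lowest average error wins.

```python
def walk_forward_select(series, models, min_train=3):
    """Score each model by one-step-ahead mean absolute error
    over an expanding training window; return the best one."""
    scores = {}
    for name, model in models.items():
        errors = []
        for split in range(min_train, len(series)):
            model.fit(series[:split])
            errors.append(abs(model.predict() - series[split]))
        scores[name] = sum(errors) / len(errors)
    best = min(scores, key=scores.get)
    return best, scores

# Two deliberately simple candidate models.
class LastValue:
    def fit(self, history): self.last = history[-1]
    def predict(self): return self.last

class WindowMean:
    def fit(self, history):
        tail = history[-3:]
        self.mean = sum(tail) / len(tail)
    def predict(self): return self.mean

series = [10, 11, 12, 13, 14, 15, 16]
best, scores = walk_forward_select(
    series, {"last": LastValue(), "mean": WindowMean()})
print(best)  # "last" wins on this trending series: error 1 vs 2 per step
```

The production algorithm additionally returns detailed per-model metrics, so the choice can be audited rather than trusted blindly.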
To simplify work with common problems, wrapper classes were created around the models that make it easier to handle regression tasks on time-series data.
The task of forecasting electricity consumption gave extra impetus to automating model training and deployment. Mathematically, this is a regression problem on time-series data. In theory, a time-series model should be retrained as often as possible so that it remains relevant to the input data stream. New data (electricity consumption facts) arrive every day. After a series of experiments, we determined that, in the long run, it is more profitable to retrain the model every day (while occasionally changing the length of the training dataset). As a result, with N objects to forecast and M different models, N*M models must be trained every day. Even for small values of M and N, this task cannot be performed manually.
The use of models in production is based on the Model-as-a-Service (MaaS) principle. For example, to get an electricity forecast, you call the corresponding module (ElectricityService) via the REST API. Modules can also be used separately from the API if needed.
In production, a simple call to the model is not enough; a software interface is needed that allows the user to interact with the model. For example, with a ready-made forecasting model, the user needs to load manufacturing plans for tomorrow, which are required to produce the model’s output, the forecast.
For this, a-Gnostics.API was developed, containing different sets of calls for different tasks; for example, api/v1/forecast gives access to forecast calls, and api/v1/equipment opens the way to industrial equipment data.
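As a hypothetical sketch of how such path-based routing might be organized (the handlers and their responses are invented for illustration; only the two path prefixes come from the description above):

```python
# Map API paths to handler callables. In the real service these would
# dispatch to modules such as ElectricityService; here they return stubs.
ROUTES = {
    "api/v1/forecast": lambda params: {"forecast_kwh": 123.4},
    "api/v1/equipment": lambda params: {"status": "normal"},
}

def handle(path, params=None):
    """Dispatch a request path to its handler, REST-style."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"error": "not found"}, 404
    return handler(params or {}), 200

body, code = handle("api/v1/forecast")
print(code)  # 200
```

A real deployment would sit behind a web framework with authentication, but the contract is the same: one stable URL per task, independent of which model version answers it.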
For interactive work with the API, as well as data visualization, a user interface named DataDome was developed.
It is also worth mentioning security. Deployment in an enterprise environment is accompanied by a very scrupulous check by the information security department. To be ready in advance, and to minimize the number of iterations during deployment, it is advisable to scan the source code, as well as the libraries used, for vulnerabilities during development. In our practice, we faced a situation where we had to urgently update the NumPy version because a customer’s security check identified vulnerabilities in the version we used (which, it is worth noting, was not very outdated).
If the services are launched in the cloud, it is better to avoid running on virtual machines and to use the appropriate managed services (AWS ECS, AWS EKS, Azure AKS, Azure App Service, etc.).
The main development language is Python; the system consists of modules in a microservices architecture. Each service is a Docker container.
As of May 2021, more than 100,000 trained models had been used in production, and more than 1,000,000 forecasts had been executed.
Don’t hesitate to contact us with your questions and comments.