Bowler, Alexander
(2023)
Ultrasonic measurements and machine learning methods to monitor industrial processes.
PhD thesis, University of Nottingham.
Abstract
The process manufacturing sector is increasingly using the collection and analysis of data to improve productivity, sustainability, and product quality. The endpoint of this transformation is processes that automatically adapt to demands in real time. In-line and on-line sensors underpin this transition by automatically collecting the real-time data required to inform decision-making. Each sensing technique possesses its own advantages and disadvantages, making it suitable for specific applications. Therefore, a wide range of sensing solutions must be developed to monitor the diverse and often highly variable operations in process manufacturing. Ultrasonic (US) sensors measure the interaction of mechanical waves with materials. They have the benefits of being in-line, real-time, non-destructive, low cost, small, able to monitor opaque materials, and applicable non-invasively.
Machine Learning (ML) is the use of computer algorithms to learn patterns in data in order to perform a task such as making predictions or decisions. The correlations that ML models learn during training have not been explicitly programmed by human operators; ML is therefore used to automatically learn from and analyse data. There are four main types of ML: supervised, unsupervised, semi-supervised, and reinforcement learning. Supervised and unsupervised ML are both used in this thesis. Supervised ML maps inputs to outputs during training, with the aim of creating a model that accurately predicts the outputs of data not previously seen during training. In contrast, unsupervised learning uses only input data, in which patterns are discovered. Supervised ML is increasingly being combined with sensor measurements as it offers several distinct advantages over conventional calibration methods. These include: reduced development time, potential for more accurate fitting, methods to encourage generalisation across parameter ranges, direct correlation to important process information rather than material properties, and the ability for continuous retraining as more data becomes available.
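As a minimal illustration of the two paradigms used in this thesis, the following Python sketch contrasts a supervised regression model, which maps ultrasonic waveform features to a labelled process variable, with an unsupervised decomposition of the same inputs. The feature and label names are placeholders for illustration, not data or models from the thesis.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
us_features = rng.normal(size=(200, 8))        # placeholder waveform features (e.g. energy, time of flight)
alcohol_abv = us_features @ rng.normal(size=8) + rng.normal(0, 0.1, 200)  # placeholder labels

# Supervised: learns a mapping from inputs to labelled outputs.
supervised = LinearRegression().fit(us_features, alcohol_abv)
# Unsupervised: learns structure in the inputs alone, with no labels.
unsupervised = PCA(n_components=2).fit(us_features)

print(supervised.score(us_features, alcohol_abv))
print(unsupervised.explained_variance_ratio_)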
The aim of this thesis was to develop ML methods to facilitate the optimal deployment of US sensors for process monitoring applications in industrial environments. To achieve this, the thesis evaluates US sensing techniques and ML methods across three types of process manufacturing operations: material mixing, cleaning of pipe fouling, and alcoholic fermentation of beer. Two US sensing techniques were investigated: a non-invasive, reflection-mode technique, and a transmission-based method using an invasive US probe with a reflector plate. The non-invasive, reflection-mode technique is more amenable to industrial implementation than the invasive probe given that it can be externally retrofitted to existing vessels. Different feature extraction and feature selection methods, algorithms, and hyperparameter ranges were explored to determine the optimal ML pipeline for process monitoring using US sensors. This reduces the development time of US sensor and ML combinations when deployed in industrial settings by recommending a pipeline that has been trialled over a range of process monitoring applications. Furthermore, methods to leverage previously collected datasets were developed to negate or reduce the burden of collecting labelled data (the outputs required during ML model training, often acquired using reference measurements) for every new process monitoring application. These included unlabelled and labelled domain adaptation approaches.
Both US sensing techniques investigated were found to be similarly accurate for process monitoring. To monitor the development of homogeneity during the blending of honey and water, the non-invasive, reflection-mode technique achieved up to 100 % accuracy in classifying whether the materials were mixed or non-mixed and an R2 of 0.977 in predicting the time remaining until (or time elapsed since) complete mixing. To monitor the structural changes during the mixing of flour and water, the same sensing method achieved an accuracy of 92.5 % and an R2 of 0.968 for the same classification and regression tasks. Similarly, it achieved an accuracy of up to 98.2 % when classifying whether fouling had been removed from pipe sections and R2 values of up to 0.947 when predicting the time remaining until cleaning was complete. The non-invasive, reflection-mode method also achieved R2 values of 0.948, Mean Squared Error (MSE) values of 0.283, and Mean Absolute Error (MAE) values of 0.146 when predicting the alcohol by volume percentage during beer fermentation. In comparison, the transmission-based sensing method achieved R2 values of 0.952, MSE values of 0.265, and MAE values of 0.136 for the same task. Furthermore, the transmission-based method achieved accuracies of up to 99.8 % and 99.9 % in classifying whether ethanol production had started and whether it had finished during an industrial beer fermentation process.
The material properties that affect US wave propagation are strongly temperature dependent. However, ML models that omitted the process temperature were comparable in accuracy to those that included it as an input. For example, when monitoring laboratory-scale fermentation processes, the highest performing models using the process temperature as a feature achieved R2 values of 0.952, MSE values of 0.265, and MAE values of 0.136 when predicting the current alcohol concentration, compared with R2 values of 0.948, MSE values of 0.283, and MAE values of 0.146 when omitting the temperature. Similarly, when transferring models between mixing processes, accuracies of 92.2 % and R2 values of 0.947 were achieved when utilising the process temperature, compared with 92.1 % and 0.942 when omitting it. When transferring models between cleaning processes, inclusion of the process temperature as a feature degraded model accuracy during classification tasks, as omitting the temperature produced the highest accuracies for 6 out of 8 tasks. Mixed results were obtained for regression tasks, where including the process temperature increased model accuracy for 3 out of 8 tasks. Overall, these results indicate that, for some applications, US sensing is able to achieve comparable accuracy when the process temperature is not available. The choice of whether to include the temperature as a feature should be made during the model validation stage to determine whether it improves prediction accuracy.
The optimal feature extraction, feature selection, and ML algorithm permutation was determined as follows: features were extracted by Convolutional Neural Networks (CNNs), reduced by Principal Component Analysis (PCA), and input into deep neural networks with Long Short-Term Memory (LSTM) layers. The CNN was pre-trained on an auxiliary task using previously collected US datasets to learn features of the waveforms. The auxiliary task was to classify the dataset from which each US waveform originated. PCA was applied to reduce the dimensionality of the input data and enable the use of additional features, such as the US time of flight or measures of variation between consecutively acquired waveforms. This CNN and PCA feature extraction method was shown to produce more informative features from the US waveform than a traditional, coarse feature extraction approach, achieving higher accuracy on 65 % of the tasks evaluated. The coarse feature method used parameters commonly extracted from US waveforms such as the energy, standard deviation, and skewness. LSTM units were used to learn the trajectory of the process features and so enable information from previous timesteps to inform model predictions. Using LSTM units was shown to outperform neural networks with feature gradients used as inputs to incorporate information from previous timesteps for all process monitoring applications. Multi-task learning also showed improvements in learning feature trajectories and model accuracy (improving regression accuracy for 8 out of 18 tasks), however at the expense of a greater number of hyperparameters to optimise. The choice to use multi-task learning should be evaluated during the validation stage of model development.
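A minimal Python (PyTorch and scikit-learn) sketch of this pipeline is given below: a 1-D CNN extracts features from each US waveform, PCA reduces their dimensionality, and an LSTM network predicts the process state from the resulting feature trajectory. The layer sizes, waveform length, and regression target are illustrative assumptions, not the exact architecture used in the thesis.

import torch
import torch.nn as nn
from sklearn.decomposition import PCA

class WaveformCNN(nn.Module):
    """1-D CNN used as a feature extractor for individual US waveforms."""
    def __init__(self, n_features=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(32, n_features)

    def forward(self, x):                  # x: (batch, 1, n_samples)
        return self.fc(self.conv(x).squeeze(-1))

class LSTMRegressor(nn.Module):
    """LSTM over the sequence of reduced features, predicting e.g. time remaining."""
    def __init__(self, n_inputs, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):                # seq: (batch, timesteps, n_inputs)
        out, _ = self.lstm(seq)
        return self.head(out)              # one prediction per timestep

# Usage outline: extract CNN features for every waveform, fit PCA on the training
# features (optionally appending extras such as the time of flight), then train
# the LSTM on the resulting feature sequences.
cnn = WaveformCNN()
waveforms = torch.randn(500, 1, 2048)              # placeholder US waveforms
with torch.no_grad():
    feats = cnn(waveforms).numpy()
pca = PCA(n_components=10).fit(feats)
reduced = torch.tensor(pca.transform(feats), dtype=torch.float32)
model = LSTMRegressor(n_inputs=10)
prediction = model(reduced.reshape(1, 500, 10))    # treat the run as one sequence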
Unlabelled and labelled domain adaptation were investigated to transfer ML knowledge between similar processes. Unlabelled domain adaptation was used to transfer trained ML models between similar mixing and similar cleaning processes to negate the need to collect labelled data for a new task. Transfer Component Analysis was compared with a Single Feature transfer method. Transferring a single feature was found to be optimal, achieving classification accuracies of up to 96.0 % and 98.4 % in predicting whether the mixing or cleaning processes were complete, and R2 values of up to 0.947 and 0.999 in predicting the time remaining for each process, respectively. The Single Feature method was most accurate as it was most representative of the changing material properties at the sensor measurement area. Training ML models across a greater process parameter range (a greater range of temperatures: 19.3 to 22.1 °C compared with 19.8 to 21.2 °C) or across multiple datasets improved transfer learning to further datasets by enabling the models to adapt to a wider range of feature distributions. Labelled domain adaptation increased model accuracy on an industrial fermentation dataset by transferring ML knowledge from a laboratory fermentation dataset. Federated learning was investigated to maintain dataset privacy when applying transfer learning between datasets. The federated learning methodology performed better than the other methods tested, achieving higher accuracy for 14 out of 16 machine learning tasks compared with the base case model, which was trained using data solely from the industrial fermentation. This was attributed to federated learning improving the gradient descent operation during network optimisation. During the federated learning training strategy, the local models were trained for a full epoch on each dataset before the network weights were sent to the global model. In contrast, during the non-federated learning strategy, batches from each dataset were interspersed. Therefore, it is recommended that the order in which the data is passed to the model during training should be evaluated during the validation stage.
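A minimal sketch of a federated-averaging style training round of the kind described above is given below, assuming two clients (for example, a laboratory and an industrial fermentation dataset), a full local epoch on each dataset, and simple weight averaging into the global model. The optimiser, helper names, and two-client set-up are illustrative assumptions rather than the exact procedure used in the thesis.

import copy
import torch

def train_one_epoch(model, loader, loss_fn, lr=1e-3):
    """Train a local copy of the model for one full epoch on its private dataset."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model.state_dict()

def federated_round(global_model, client_loaders, loss_fn):
    """One communication round: local epochs on each dataset, then weight averaging."""
    client_states = []
    for loader in client_loaders:                   # e.g. [lab_loader, industrial_loader]
        local = copy.deepcopy(global_model)         # start from the current global weights
        client_states.append(train_one_epoch(local, loader, loss_fn))
    averaged = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
                for k in client_states[0]}
    global_model.load_state_dict(averaged)          # redistribute the averaged weights
    return global_model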
Overall, there are two main contributions from this thesis: the development of an ML pipeline for process monitoring using US sensors, and the development of unlabelled and labelled domain adaptation methods for process monitoring using US sensors. The ML pipeline reduces the time needed to deploy US sensor and ML combinations in industrial settings by recommending a method that has been trialled over a range of process monitoring applications. The unlabelled and labelled domain adaptation methods were developed to leverage previously collected datasets, negating or reducing the burden of collecting labelled data in industrial environments. Furthermore, the pipeline and domain adaptation methodologies are evaluated using a non-invasive, reflection-mode US sensing technique. This technique is industrially relevant as it can be externally retrofitted onto existing process equipment.
The novelty contained within this thesis can be summarised as follows:
• The use of CNNs and LSTM layers for process monitoring using US sensor data: CNNs were used to extract shift-invariant features from US sensor data, overcoming the problem of features shifting in the time domain due to changes in temperature or sound velocity. LSTM units were used for their ability to analyse sequences and learn temporal dependencies, which is critical for monitoring processes that develop over time. Feature extraction using CNNs was shown to produce more informative features from the US waveform than a traditional, coarse feature extraction approach, achieving higher accuracy on 65 % of the tasks evaluated. LSTM units were shown to outperform neural networks with feature gradients used as inputs to incorporate information from previous timesteps for all process monitoring applications.
• Evaluating the omission of the process temperature as a feature for process monitoring using US sensor data: This indicates whether US sensor and ML combinations could be used in industrial applications where measurement of the process temperature is not available. Overall, it was found that ML models which omitted the process temperature were comparable in accuracy to those which included it as an input (for example, R2 values of 0.952, MSE values of 0.265, and MAE values of 0.136 when including the temperature, compared with R2 values of 0.948, MSE values of 0.283, and MAE values of 0.146 when omitting it, to predict the current alcohol concentration during laboratory-scale fermentation processes).
• The use of labelled and unlabelled domain adaptation for US data for process monitoring: Unlabelled domain adaptation was used to transfer trained ML models between similar mixing and similar cleaning processes to negate the need to collect labelled data for a new task. Labelled domain adaptation increased model accuracy on an industrial fermentation dataset by transferring ML knowledge from a laboratory fermentation dataset.
• The use of labelled and unlabelled domain adaptation on features extracted from US waveforms: This allows the domain adaptation methods to be applied to diverse US waveforms because, rather than aligning the raw US sensor data, the methods operate on waveform features that describe the monitored process as it develops over time.
• The use of federated learning and multi-task learning with US data: Federated learning was investigated to maintain dataset privacy when applying transfer learning between datasets. Multi-task learning was investigated to aid LSTM unit learning of the process trajectory. The federated learning methodology performed better than the other methods tested, achieving higher accuracy for 14 out of 16 ML tasks compared with the base case model. Multi-task learning also showed improvements in learning feature trajectories and model accuracy (improving regression accuracy for 8 out of 18 tasks evaluated), however, at the expense of a greater number of hyperparameters to optimise.
• The use of data augmentation for US data for process monitoring applications: Data augmentation was a component of the convolutional feature extraction method developed in this thesis (see the sketch following this list). It artificially increased the dataset size used to train the convolutional feature extractor while ensuring that features specific to each waveform, rather than the position or magnitude of those features, were learned. This improved performance on the auxiliary feature-learning task the CNN was trained to perform: classifying the dataset from which each previously collected US waveform originated.