Machine learning (ML) is now applied to threat detection and classification problems in industrial control, an important field. This paper surveys ML studies that apply supervised learning, unsupervised learning, deep learning, and ensemble learning to industrial control systems. Researchers and practitioners in cybersecurity and infrastructure security will find the paper useful, and ML researchers can use it as a starting point for their own work in the field.
Table 2 (of the paper) gives a quick summary of the main findings about the algorithms, including their effectiveness against different attacks such as false data injection (FDI), denial-of-service (DoS), reconnaissance (Recon), and spoofing (Spo). Table 3 (of the paper) summarizes the usefulness of publicly available datasets.
The paper rightly points out that supervised learning is limited by the cost of labeling data. Unsupervised learning can use clustering and other techniques that do not require labeled data. However, contrary to what the paper claims, deep learning (neural-network-based) does not always need labeled data. Deep-learning-based feature selection and self-supervised learning methods, for example, bootstrap your own latent (BYOL) [1] and self-distillation with no labels (DINO) [2], do not need labeled data.
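To make the point about label-free detection concrete, here is a minimal sketch of clustering-based anomaly scoring. It is not taken from the paper under review. The two "operating modes," the sensor values, the injected reading, and the thresholding rule are all assumptions chosen for illustration, and the k-means routine is a plain textbook version written with the Python standard library only.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples; no labels are used."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Distance from a reading to its nearest cluster centre."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical unlabeled readings from two normal operating modes
# (for example, a pressure/flow pair); the values are synthetic.
rng = random.Random(42)
normal = [(rng.gauss(10, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
normal += [(rng.gauss(20, 0.5), rng.gauss(8, 0.5)) for _ in range(100)]

centroids = kmeans(normal, k=2)

# Assumed rule: flag anything farther from a centre than the worst
# reading seen during fitting.
threshold = max(anomaly_score(p, centroids) for p in normal)

injected = (15.0, 20.0)  # hypothetical falsified reading
is_flagged = anomaly_score(injected, centroids) > threshold
print("flagged:", is_flagged)
```

No attack labels are needed at any step. A real deployment would need a more careful threshold and a held-out set of normal data, which this sketch omits.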
This paper is a good starting point for learning about ML algorithms and datasets for the security of industrial control systems. However, performance data for the algorithms would be a good addition. How long does it take to train each algorithm on each dataset? How long does inference take? How large are the models? Can they be deployed easily? It would be nice to get answers to these questions. More information about self-supervised learning techniques used in industrial control, as well as about the creation and use of synthetic data, would make this useful paper even more meaningful.
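The performance questions raised above (training time, inference time, model size) could be reported with a small harness like the sketch below. It is only an illustration. The `NearestCentroid` class is a toy stand-in for a real detector, `benchmark` is a hypothetical helper, and the data are synthetic. Model size is approximated here by the byte length of the JSON-serialized parameters, which is an assumption and not a standard metric.

```python
import json
import random
import time


class NearestCentroid:
    """Toy classifier standing in for a real detection model."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # One mean vector per class.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, row):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


def benchmark(model, X, y):
    """Return wall-clock training time, mean per-sample inference time,
    and the size in bytes of the serialized parameters."""
    start = time.perf_counter()
    model.fit(X, y)
    train_s = time.perf_counter() - start

    start = time.perf_counter()
    for row in X:
        model.predict(row)
    infer_s = (time.perf_counter() - start) / len(X)

    model_bytes = len(json.dumps(model.centroids).encode("utf-8"))
    return {
        "train_s": train_s,
        "infer_s_per_sample": infer_s,
        "model_bytes": model_bytes,
    }


# Synthetic four-feature data: "normal" centred at 0, "attack" centred at 3.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
X += [[rng.gauss(3, 1) for _ in range(4)] for _ in range(200)]
y = ["normal"] * 200 + ["attack"] * 200

model = NearestCentroid()
report = benchmark(model, X, y)
print(report)
```

Timings from a harness like this depend on hardware and should be reported alongside it. Repeating each measurement several times and reporting the median would give more stable numbers than the single run shown here.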