Artificial intelligence approach to intrusion detection in industrial control systems with real world dataset generation and model evaluation
Unlock Unprecedented AI Security for Industrial Control Systems
This study addresses escalating cybersecurity challenges within Industrial Control Systems (ICS), focusing on Programmable Logic Controllers (PLCs), through the application of artificial intelligence for intrusion detection. A physical ICS testbed was constructed using PLCs and industrial-grade SCALANCE switches to simulate real-world environments more faithfully than previous virtual setups. Seven types of cyberattacks, including Denial-of-Service (DoS), Man-in-the-Middle (MITM), ARP Spoofing, Data Injection, and Reconnaissance, were executed alongside legitimate traffic flows. PLC communication was managed using Node-RED, attacks were performed via Kali Linux, and traffic was captured using Wireshark and Python scripts so that both benign and malicious activity was recorded.
The result is a new labeled dataset, ICSCASD-MPLC, consisting of 2.6 million entries across 57 features, available in CSV format for direct use with machine learning systems. Two machine learning algorithms, Decision Tree (DT) and eXtreme Gradient Boosting (XGBoost), were trained and evaluated. DT achieved 97% binary and 98.6% multi-class accuracy, while XGBoost achieved 99% and 97.3%, respectively. The system's ability to identify distinct traffic patterns associated with different types of attacks was also validated, improving interpretability and detection granularity.
This work contributes a high-fidelity, public dataset and a reproducible methodology for training and evaluating AI-based security solutions within ICS contexts. It responds to industry and academic needs by delivering a practical, data-driven approach to safeguarding critical infrastructure.
Quantifiable Impact of AI in ICS Cybersecurity
Decision Tree and XGBoost models, validated on traffic from a physical industrial testbed, reached 97% to 99% testing accuracy.
Deep Analysis & Enterprise Applications
The primary goal of this research is to strengthen cybersecurity in Industrial Control Systems (ICS) by creating a realistic, high-fidelity dataset (ICSCASD-MPLC) using Mitsubishi R04ENCPU PLCs and genuine network traffic. This dataset is then used to train and evaluate machine learning models, specifically Decision Trees (DT) and XGBoost, for detecting anomalies in ICS network traffic. The study aims to validate the effectiveness of these AI techniques in identifying cyber threats, addressing the urgent need for advanced security strategies to protect industrial infrastructure against sophisticated and persistent threats.
- Realistic Dataset Creation: Developed ICSCASD-MPLC from a physical testbed simulating actual industrial conditions.
- ML Model Application: Trained and evaluated DT and XGBoost for anomaly detection, achieving high accuracy (e.g., 99% for XGBoost in binary classification).
- Reproducible Methodology: Provided a clear procedure for testbed setup and attack simulations.
A physical ICS testbed was configured to simulate and monitor cyber-attacks targeting Mitsubishi R04ENCPU PLCs. The network includes Siemens SCALANCE XC208 and XB005 Ethernet switches, with PC1 and PC2 running Node-RED for data exchange, and PC3 (Kali Linux VM) launching attacks while Wireshark monitors traffic. Seven distinct attack scenarios were executed against PLC1, including DoS, MITM (ARP Spoofing), Data Injection, Port Scanning, and TCP SYN Scan, alongside normal traffic to generate a comprehensive, labeled dataset. This setup ensures that the captured traffic accurately reflects real-world communication dynamics under both benign and malicious conditions.
The seven simulated attacks fall into three categories:
- Availability Attacks: ICMP Flood, DoS Attack.
- Confidentiality Attacks: Man-in-the-Middle (MITM) / ARP Spoofing, Replay Attack, Port Scanning, TCP SYN Scan.
- Integrity Attacks: Injection Attack.
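The grouping above can be written down as a small lookup table, which is handy when rolling multi-class labels up into the three security properties. This is only a sketch: the label strings are hypothetical and may not match the class names used in the dataset's CSV files.

```python
# Hypothetical label strings; the dataset's actual class names may differ.
ATTACK_CATEGORY = {
    "ICMP Flood": "Availability",
    "DoS": "Availability",
    "MITM/ARP Spoofing": "Confidentiality",
    "Replay": "Confidentiality",
    "Port Scanning": "Confidentiality",
    "TCP SYN Scan": "Confidentiality",
    "Injection": "Integrity",
}


def categorize(label):
    """Map a traffic label to the security property it targets."""
    if label == "Normal":
        return "Benign"
    # Unknown labels raise KeyError so that typos are not silently ignored.
    return ATTACK_CATEGORY[label]
```

For example, `categorize("Replay")` returns `"Confidentiality"`.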
The ICSCASD-MPLC dataset, comprising 2.6 million records and 59 features, underwent rigorous preprocessing. This involved data cleaning (handling missing values in SourcePort, DestinationPort, Protocol), encoding categorical variables, Min-Max normalization, and addressing class imbalance through under-sampling to create balanced datasets for both binary and multi-class classification. Feature selection was performed using the MRMR (Minimum Redundancy Maximum Relevance) algorithm, reducing 53 initial features to 15 key features. Decision Trees (DT) and eXtreme Gradient Boosting (XGBoost) algorithms were then trained and evaluated on this refined dataset, demonstrating strong performance in intrusion detection tasks.
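Two of the preprocessing steps named above, Min-Max normalization and under-sampling, can be sketched in plain Python as follows. The source does not say which library or which under-sampling strategy was used, so the random strategy and the function names `min_max_scale` and `undersample` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def undersample(rows, labels, seed=0):
    """Randomly drop rows so each class keeps as many rows as the rarest class.

    Assumption: plain random under-sampling; the study may have used
    a different strategy.
    """
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n = min(len(members) for members in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

In practice the scaling bounds should be computed on the training split only and then reused on the test split, so that no information leaks from test data into training.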
Key ML Algorithms:
- Decision Trees (DT): Offer clear model interpretation by recursively splitting data based on decision criteria.
- eXtreme Gradient Boosting (XGBoost): Advanced, high-performance gradient-boosted decision trees, known for robustness and accuracy on large datasets.
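To make the "recursive splitting" idea concrete, the sketch below finds the best threshold for a single numeric feature using Gini impurity. A decision tree repeats this search on each resulting subset, and XGBoost builds many such trees in sequence. Gini impurity is an assumption here, since the source does not state which split criterion was used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Returns (None, impurity_of_unsplit_node) if no split reduces impurity.
    """
    best = (None, gini(labels))
    n = len(values)
    # Skip the largest value: it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

With `values = [1, 2, 8, 9]` and `labels = [0, 0, 1, 1]`, the threshold 2 separates the two classes perfectly.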
The machine learning models (DT and XGBoost) were evaluated using fivefold cross-validation repeated over 5 epochs. Key performance indicators included Accuracy (ACC), Precision (P), Recall (R), and F1-Score. Both classifiers showed excellent results:
| Model & Classification Type | Avg Training Accuracy | Avg Testing Accuracy | F1 Score | Precision | Recall |
|---|---|---|---|---|---|
| MRMR_DT (Binary Classification) | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| MRMR_DT (Multi-class Classification) | 0.986 | 0.986 | 0.986 | 0.987 | 0.986 |
| MRMR_XGBoost (Binary Classification) | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| MRMR_XGBoost (Multi-class Classification) | 0.973 | 0.973 | 0.972 | 0.975 | 0.973 |
These results confirm the dataset's suitability for intrusion detection research and underscore the potential of machine learning for enhancing ICS security.
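The four indicators in the table follow directly from the counts of true and false positives and negatives. The sketch below computes them for the binary case and shows one simple way to generate five-fold train/test index splits. How the study averaged metrics for the multi-class case is not stated, so that part is not reproduced, and the function names are illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Shuffle the data before splitting, since traffic captures are usually ordered by time and contiguous folds could otherwise contain a single attack type.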
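| Feature | Existing Datasets (General) | ICSCASD-MPLC (This Study) |
|---|---|---|
| Realism of Environment | Often virtual/simulated, limited hardware | Physical testbed with Mitsubishi R04ENCPU PLCs and Siemens SCALANCE switches |
| Dataset Availability | Many not publicly available or limited | Publicly available in CSV format |
| Attack Coverage | Varied, sometimes limited types | Seven attack types captured alongside normal traffic |
| Machine Learning Readiness | May require extensive preprocessing | Labeled records in CSV format for direct use in ML pipelines |
| Protocol & Vendor Specificity | Diverse, but often limited realism per protocol | Genuine network traffic from Mitsubishi R04ENCPU PLCs |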
Safeguarding Critical Infrastructure with ICSCASD-MPLC
The ICSCASD-MPLC dataset is instrumental in advancing cybersecurity for Industrial Control Systems. By providing a high-fidelity, real-world dataset, it enables researchers and practitioners to develop and test more robust intrusion detection systems. For instance, security teams can use this dataset to train AI models that precisely identify sophisticated attacks like Man-in-the-Middle or Data Injection within a Mitsubishi PLC environment. This leads to reduced false positives and improved real-time threat response, ensuring operational continuity and protecting vital industrial processes from disruption. The dataset's detailed traffic patterns for both normal and malicious activities allow for nuanced model training, making AI-powered IDS solutions more interpretable and effective in critical infrastructure settings.
Your AI Security Implementation Roadmap
A structured approach to integrating AI-powered intrusion detection into your ICS environment, leveraging insights from the ICSCASD-MPLC dataset.
Phase 1: Initial Assessment & Testbed Setup
Review the current ICS infrastructure, identify critical assets, and establish a secure, isolated testbed that mirrors the production environment for initial data collection and baseline establishment.
Phase 2: Data Collection & Custom Dataset Generation
Leverage tools like Wireshark and custom Python scripts to collect both normal and simulated attack traffic, creating a labeled, high-fidelity dataset akin to ICSCASD-MPLC.
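One common way to label captured traffic is to record when each simulated attack ran and tag every packet whose timestamp falls inside an attack window. The source does not describe how ICSCASD-MPLC was labeled, so the sketch below is an assumption, and the schedule values and label strings are hypothetical.

```python
from bisect import bisect_right

# Hypothetical attack schedule: (start_s, end_s, label).
# Windows must be non-overlapping and sorted by start time.
ATTACK_WINDOWS = [
    (100.0, 160.0, "DoS"),
    (300.0, 360.0, "Port Scanning"),
]


def label_packet(timestamp, windows=ATTACK_WINDOWS):
    """Return the attack label active at `timestamp`, or 'Normal'."""
    starts = [w[0] for w in windows]
    # Index of the last window that starts at or before the timestamp.
    i = bisect_right(starts, timestamp) - 1
    if i >= 0 and timestamp <= windows[i][1]:
        return windows[i][2]
    return "Normal"
```

Time-window labeling also tags benign packets sent during an attack as malicious, so filtering by attacker source address as well may give cleaner labels.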
Phase 3: AI Model Training & Optimization
Utilize the custom dataset for training and fine-tuning machine learning models (e.g., Decision Trees, XGBoost) to detect specific ICS attack patterns, incorporating feature selection for efficiency.
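For the feature selection step, a greedy Minimum Redundancy Maximum Relevance (mRMR) pass can be sketched as below. This is a simplified variant: it uses absolute Pearson correlation for both relevance and redundancy, whereas mRMR is often formulated with mutual information. The scoring choice and the function names are assumptions for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def mrmr_select(features, target, k):
    """Greedily pick k feature names: high relevance, low mean redundancy.

    `features` maps a feature name to its list of values.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A feature that duplicates an already selected one scores poorly in later rounds even if it is highly relevant, which is how the method avoids keeping redundant columns.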
Phase 4: Validation & Integration
Rigorously validate the trained models against diverse attack scenarios. Integrate the AI-powered IDS into a non-production ICS environment for real-time monitoring and alert generation.
Phase 5: Deployment & Continuous Improvement
Deploy the AI-driven IDS in a monitored production environment. Establish continuous feedback loops for model retraining, adapting to new threats and evolving ICS landscapes.