Artificial intelligence approach to intrusion detection in industrial control systems with real world dataset generation and model evaluation

Unlock Unprecedented AI Security for Industrial Control Systems

This study addresses escalating cybersecurity challenges in Industrial Control Systems (ICS), focusing on Programmable Logic Controllers (PLCs), by applying artificial intelligence to intrusion detection. A physical ICS testbed was built from PLCs and industrial-grade SCALANCE switches to reproduce real-world environments more faithfully than previous virtual setups. Seven types of cyberattacks, including Denial-of-Service (DoS), Man-in-the-Middle (MITM), ARP Spoofing, Data Injection, and Reconnaissance, were executed alongside legitimate traffic flows. PLC communication was managed with Node-RED, attacks were launched from Kali Linux, and traffic was captured with Wireshark and Python scripts so that both benign and malicious activity was recorded. The result is a new labeled dataset, ICSCASD-MPLC, consisting of 2.6 million entries across 57 features and available in CSV format for direct use with machine learning systems. Two machine learning algorithms, Decision Tree (DT) and eXtreme Gradient Boosting (XGBoost), were trained and evaluated. DT achieved 97% binary and 98.6% multi-class accuracy, while XGBoost achieved 99% and 97.3%, respectively. The system's ability to identify distinct traffic patterns associated with different attack types was also validated, improving interpretability and detection granularity. This work contributes a high-fidelity public dataset and a reproducible methodology for training and evaluating AI-based security solutions in ICS contexts, giving industry and academia a practical, data-driven basis for safeguarding critical infrastructure.

Quantifiable Impact of AI in ICS Cybersecurity

Advanced AI models, validated on realistic industrial data, deliver superior threat detection capabilities.

99% XGBoost Binary Accuracy

98.6% DT Multi-class Accuracy

2.6M Dataset Entries

Deep Analysis & Enterprise Applications

The primary goal of this research is to strengthen cybersecurity in Industrial Control Systems (ICS) by creating a realistic, high-fidelity dataset (ICSCASD-MPLC) using Mitsubishi R04ENCPU PLCs and genuine network traffic. This dataset is then used to train and evaluate machine learning models, specifically Decision Trees (DT) and XGBoost, for detecting anomalies in ICS network traffic. The study aims to validate the effectiveness of these AI techniques in identifying cyber threats, addressing the urgent need for advanced security strategies to protect industrial infrastructure against sophisticated and persistent threats.

Realistic Dataset Creation: Developed ICSCASD-MPLC from a physical testbed simulating actual industrial conditions.
ML Model Application: Trained and evaluated DT and XGBoost for anomaly detection, achieving high accuracy (e.g., 99% for XGBoost binary).
Reproducible Methodology: Provided a clear procedure for testbed setup and attack simulations.

A physical ICS testbed was configured to simulate and monitor cyber-attacks targeting Mitsubishi R04ENCPU PLCs. The network includes Siemens SCALANCE XC208 and XB005 Ethernet switches, with PC1 and PC2 running Node-RED for data exchange, and PC3 (Kali Linux VM) launching attacks while Wireshark monitors traffic. Seven distinct attack scenarios were executed against PLC1, including DoS, MITM (ARP Spoofing), Data Injection, Port Scanning, and TCP SYN Scan, alongside normal traffic to generate a comprehensive, labeled dataset. This setup ensures that the captured traffic accurately reflects real-world communication dynamics under both benign and malicious conditions.

The seven simulated attacks fall into three categories:

Availability Attacks: ICMP Flood, DoS Attack.
Confidentiality Attacks: Man-in-the-Middle (MITM) / ARP Spoofing, Replay Attack, Port Scanning, TCP SYN Scan.
Integrity Attacks: Injection Attack.
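
The grouping above can be written down as a small lookup table, which also shows how the multi-class labels collapse into the binary (benign vs. attack) task. This is a minimal sketch: the label strings, including "Normal", are hypothetical and may not match the strings used in the published CSV.

```python
# Hypothetical label strings; the dataset's actual label values may differ.
ATTACK_CATEGORY = {
    "ICMP Flood": "availability",
    "DoS": "availability",
    "MITM/ARP Spoofing": "confidentiality",
    "Replay": "confidentiality",
    "Port Scanning": "confidentiality",
    "TCP SYN Scan": "confidentiality",
    "Injection": "integrity",
}


def to_binary(label: str) -> int:
    """Return 0 for normal traffic and 1 for any of the seven attacks."""
    if label == "Normal":
        return 0
    if label in ATTACK_CATEGORY:
        return 1
    raise ValueError(f"unknown label: {label!r}")


def to_category(label: str) -> str:
    """Map a multi-class label to its security-property category."""
    return "benign" if label == "Normal" else ATTACK_CATEGORY[label]
```

Raising on unknown labels, rather than silently treating them as benign, keeps a mislabeled row from leaking into the normal class.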

The ICSCASD-MPLC dataset, comprising 2.6 million records and 57 features, underwent rigorous preprocessing. This involved data cleaning (handling missing values in SourcePort, DestinationPort, Protocol), encoding categorical variables, Min-Max normalization, and addressing class imbalance through under-sampling to create balanced datasets for both binary and multi-class classification. Feature selection was performed using the MRMR (Minimum Redundancy Maximum Relevance) algorithm, reducing 53 initial features to 15 key features. Decision Trees (DT) and eXtreme Gradient Boosting (XGBoost) algorithms were then trained and evaluated on this refined dataset, demonstrating strong performance in intrusion detection tasks.
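
To illustrate the idea behind MRMR feature selection, the sketch below greedily picks the feature with the highest relevance to the label minus its average redundancy with the features already chosen. It is not the study's implementation: MRMR is usually defined with mutual information, whereas this sketch substitutes absolute Pearson correlation to stay dependency-free, and the function and column names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mrmr_select(columns, target, k):
    """Greedy mRMR (difference form) using |Pearson r| as a stand-in score.

    columns: dict mapping feature name -> list of numeric values
    target:  list of numeric labels, same length as each column
    k:       number of features to keep
    """
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    remaining = list(columns)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A feature that duplicates an already-selected one scores poorly on the redundancy term, so a weaker but independent feature can be preferred over it.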

Key ML Algorithms:

Decision Trees (DT): Offers clear model interpretation by recursively splitting data based on decision criteria.
eXtreme Gradient Boosting (XGBoost): Advanced, high-performance gradient-boosted decision trees, known for robustness and accuracy on large datasets.
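
The "recursive splitting" that a decision tree performs can be illustrated with a single split step. The sketch below searches one numeric feature for the threshold that minimizes weighted Gini impurity; a full tree would repeat this on each side of the split. Gini is one common criterion and is an assumption here, since the source does not say which criterion the study used. The example feature (packets per second) is hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (None, gini(labels)) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature: packets per second, labeled 0 = normal, 1 = flood.
rates = [10, 12, 11, 900, 950, 1000]
is_attack = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(rates, is_attack)
```

The learned threshold is what makes a tree easy to read: each internal node is a rule of the form "feature <= value".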

The machine learning models (DT and XGBoost) were evaluated using fivefold cross-validation repeated over 5 epochs. Key performance indicators included Accuracy (ACC), Precision (P), Recall (R), and F1-Score. Both classifiers showed excellent results:

Model & Classification Type	Avg Training Accuracy	Avg Testing Accuracy	F1 Score	Precision	Recall
MRMR_DT (Binary Classification)	0.97	0.97	0.97	0.97	0.97
MRMR_DT (Multi-class Classification)	0.986	0.986	0.986	0.987	0.986
MRMR_XGBoost (Binary Classification)	0.99	0.99	0.99	0.99	0.99
MRMR_XGBoost (Multi-class Classification)	0.973	0.973	0.972	0.975	0.973

These results confirm the dataset's suitability for intrusion detection research and underscore the potential of machine learning for enhancing ICS security.
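
For readers who want to see how the reported metrics are defined, the sketch below computes accuracy, precision, recall, and F1 for the binary case and generates fivefold train/test index splits. It is an illustration under assumptions: the source does not state how folds were drawn (shuffled, stratified, or contiguous) or how multi-class scores were averaged, and the function names are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 as the attack class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def kfold_indices(n, k=5):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

In practice the rows would be shuffled (or stratified by class) before splitting, and the per-fold scores averaged to give figures like those in the table.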

99% Peak Binary Classification Accuracy (XGBoost)

Enterprise Process Flow

Data Cleaning → Encoding → Normalization → Imbalance Handling → Feature Selection → Model Training → Evaluation → Detection Results
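
Two stages of this flow, normalization and imbalance handling, are simple enough to sketch directly. The code below applies Min-Max scaling to one numeric column and balances classes by random under-sampling. Random selection and the fixed seed are assumptions; the source says only that under-sampling was used, not which variant. Function names are hypothetical.

```python
import random


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

When evaluating a model, the scaling bounds should be taken from the training split only and then reused on the test split, so that no test-set information leaks into training.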

ICSCASD-MPLC vs. Existing Datasets

Feature	Existing Datasets (General)	ICSCASD-MPLC (This Study)
Realism of Environment	Often virtual/simulated, limited hardware	Physical testbed with real Mitsubishi PLCs & industrial switches
Dataset Availability	Many not publicly available or limited	Publicly available (2.6M records, 57 features)
Attack Coverage	Varied, sometimes limited types	Seven well-defined cyberattack scenarios (DoS, MITM, Injection, Scanning, Replay)
Machine Learning Readiness	May require extensive preprocessing	Labeled, CSV format, designed for ML integration
Protocol & Vendor Specificity	Diverse, but often limited realism per protocol	Focus on Mitsubishi PLCs, MC protocol for high fidelity

Safeguarding Critical Infrastructure with ICSCASD-MPLC

The ICSCASD-MPLC dataset supports the development of stronger cybersecurity for Industrial Control Systems. By providing a high-fidelity dataset captured from real hardware, it enables researchers and practitioners to develop and test more robust intrusion detection systems. For instance, security teams can use the dataset to train AI models that identify attacks such as Man-in-the-Middle or Data Injection within a Mitsubishi PLC environment. Models trained this way can help reduce false positives and improve real-time threat response, supporting operational continuity and protecting vital industrial processes from disruption. The dataset's detailed traffic patterns for both normal and malicious activity allow for nuanced model training, making AI-powered IDS solutions more interpretable and effective in critical infrastructure settings.

Your AI Security Implementation Roadmap

A structured approach to integrating AI-powered intrusion detection into your ICS environment, leveraging insights from the ICSCASD-MPLC dataset.

Phase 1: Initial Assessment & Testbed Setup

Review current ICS infrastructure, identify critical assets, and establish a secure, isolated testbed mirroring production environment for initial data collection and baseline establishment.

Phase 2: Data Collection & Custom Dataset Generation

Leverage tools like Wireshark and custom Python scripts to collect both normal and simulated attack traffic, creating a labeled, high-fidelity dataset akin to ICSCASD-MPLC.
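
One simple way to label captured traffic, sketched below, is to record when each simulated attack was running and tag every packet by timestamp. This is an assumption for illustration only: the source does not describe how ICSCASD-MPLC was labeled, and the field names (`time`, `label`) and window format are hypothetical.

```python
def label_packets(packets, attack_windows):
    """Tag each packet with the attack active at its timestamp, else "Normal".

    packets:        list of dicts, each with a numeric "time" field (seconds)
    attack_windows: list of (start, end, attack_name); start inclusive, end exclusive
    """
    labeled = []
    for packet in packets:
        label = "Normal"
        for start, end, name in attack_windows:
            if start <= packet["time"] < end:
                label = name
                break
        labeled.append({**packet, "label": label})
    return labeled


# Hypothetical capture: a DoS run between t=10s and t=20s.
capture = [{"time": 5.0, "length": 60}, {"time": 12.5, "length": 1500}]
labeled = label_packets(capture, [(10.0, 20.0, "DoS")])
```

Time-window labeling also tags benign packets that happen to arrive during an attack, so filtering by attacker address or protocol may be needed on top of it.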

Phase 3: AI Model Training & Optimization

Utilize the custom dataset for training and fine-tuning machine learning models (e.g., Decision Trees, XGBoost) to detect specific ICS attack patterns, incorporating feature selection for efficiency.

Phase 4: Validation & Integration

Rigorously validate the trained models against diverse attack scenarios. Integrate the AI-powered IDS into a non-production ICS environment for real-time monitoring and alert generation.

Phase 5: Deployment & Continuous Improvement

Deploy the AI-driven IDS in a monitored production environment. Establish continuous feedback loops for model retraining, adapting to new threats and evolving ICS landscapes.
