Enterprise AI Analysis: Artificial intelligence approach to intrusion detection in industrial control systems with real world dataset generation and model evaluation

Unlock Unprecedented AI Security for Industrial Control Systems

This study addresses escalating cybersecurity challenges within Industrial Control Systems (ICS), focusing on Programmable Logic Controllers (PLCs), by applying artificial intelligence to intrusion detection. A physical ICS testbed was constructed using PLCs and industrial-grade SCALANCE switches to simulate real-world environments more faithfully than previous virtual setups. Seven types of cyberattacks, including Denial-of-Service (DoS), Man-in-the-Middle (MITM), ARP Spoofing, Data Injection, and Reconnaissance, were executed alongside legitimate traffic flows. PLC communication was managed using Node-RED, attacks were performed via Kali Linux, and traffic was captured using Wireshark and Python scripts so that both benign and malicious activity was recorded.

The result is a new labeled dataset, ICSCASD-MPLC, consisting of 2.6 million entries across 57 features, available in CSV format for direct use with machine learning tools. Two machine learning algorithms, Decision Tree (DT) and eXtreme Gradient Boosting (XGBoost), were trained and evaluated. DT achieved 97% binary and 98.6% multi-class accuracy, while XGBoost achieved 99% and 97.3%, respectively. The system's ability to identify distinct traffic patterns associated with different types of attacks was also validated, improving interpretability and detection granularity.

This work contributes a high-fidelity, public dataset and a reproducible methodology for training and evaluating AI-based security solutions within ICS contexts. It responds to industry and academic needs by offering a practical, data-driven approach to safeguarding critical infrastructure.

Quantifiable Impact of AI in ICS Cybersecurity

AI models validated on traffic from a physical industrial testbed reach up to 99% intrusion detection accuracy.

99% XGBoost Binary Accuracy
98.6% DT Multi-class Accuracy
2.6M Dataset Entries

Deep Analysis & Enterprise Applications

The primary goal of this research is to strengthen cybersecurity in Industrial Control Systems (ICS) by creating a realistic, high-fidelity dataset (ICSCASD-MPLC) using Mitsubishi R04ENCPU PLCs and genuine network traffic. This dataset is then used to train and evaluate machine learning models, specifically Decision Trees (DT) and XGBoost, for detecting anomalies in ICS network traffic. The study aims to validate the effectiveness of these AI techniques in identifying cyber threats, addressing the urgent need for advanced security strategies to protect industrial infrastructure against sophisticated and persistent threats.

  • Realistic Dataset Creation: Developed ICSCASD-MPLC from a physical testbed simulating actual industrial conditions.
  • ML Model Application: Trained and evaluated DT and XGBoost for anomaly detection, achieving high accuracy (e.g., 99% for XGBoost binary).
  • Reproducible Methodology: Provided a clear procedure for testbed setup and attack simulations.

A physical ICS testbed was configured to simulate and monitor cyber-attacks targeting Mitsubishi R04ENCPU PLCs. The network includes Siemens SCALANCE XC208 and XB005 Ethernet switches, with PC1 and PC2 running Node-RED for data exchange, and PC3 (Kali Linux VM) launching attacks while Wireshark monitors traffic. Seven distinct attack scenarios were executed against PLC1, including DoS, MITM (ARP Spoofing), Data Injection, Port Scanning, and TCP SYN Scan, alongside normal traffic to generate a comprehensive, labeled dataset. This setup ensures that the captured traffic accurately reflects real-world communication dynamics under both benign and malicious conditions.

The seven simulated attacks fall into three categories:

  • Availability Attacks: ICMP Flood, DoS Attack.
  • Confidentiality Attacks: Man-in-the-Middle (MITM) / ARP Spoofing, Replay Attack, Port Scanning, TCP SYN Scan.
  • Integrity Attacks: Injection Attack.
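The grouping above can be written down as a small lookup that turns a multi-class attack label into its category and into a binary label. This is only a sketch: the label strings (including "Normal") are assumed for illustration, and the actual values in the ICSCASD-MPLC CSV may differ.

```python
# Hypothetical label strings; the real dataset may name its classes differently.
ATTACK_CATEGORY = {
    "ICMP Flood": "Availability",
    "DoS Attack": "Availability",
    "MITM / ARP Spoofing": "Confidentiality",
    "Replay Attack": "Confidentiality",
    "Port Scanning": "Confidentiality",
    "TCP SYN Scan": "Confidentiality",
    "Injection Attack": "Integrity",
}


def category_of(label):
    """Return the security property targeted by an attack label, or 'None' for benign traffic."""
    return ATTACK_CATEGORY.get(label, "None")


def binary_label(label):
    """Collapse a multi-class label to 0 (benign) or 1 (any attack)."""
    return 1 if label in ATTACK_CATEGORY else 0
```

Keeping the mapping in one place means the binary and multi-class tasks are derived from the same source labels.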

The ICSCASD-MPLC dataset, comprising 2.6 million records and 57 features, underwent rigorous preprocessing. This involved data cleaning (handling missing values in SourcePort, DestinationPort, Protocol), encoding categorical variables, Min-Max normalization, and addressing class imbalance through under-sampling to create balanced datasets for both binary and multi-class classification. Feature selection was performed using the MRMR (Minimum Redundancy Maximum Relevance) algorithm, reducing 53 initial features to 15 key features. Decision Trees (DT) and eXtreme Gradient Boosting (XGBoost) algorithms were then trained and evaluated on this refined dataset, demonstrating strong performance in intrusion detection tasks.
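To make the MRMR idea concrete, here is a minimal greedy sketch using only the Python standard library. It assumes features have already been discretized, scores relevance and redundancy with mutual information, and uses the "difference" form (relevance minus mean redundancy). The study does not state which MRMR variant or implementation it used, and the feature names below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature names, maximizing relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: three classes, one informative feature, its exact duplicate,
# and a second feature that is informative in a different way.
labels = [0, 0, 1, 1, 2, 2]
features = {
    "pkt_len_bin": [0, 0, 1, 1, 1, 1],
    "pkt_len_bin_dup": [0, 0, 1, 1, 1, 1],
    "tcp_flag_bin": [0, 0, 0, 0, 1, 1],
}
selected = mrmr_select(features, labels, k=2)
```

On this toy data the duplicate column is not selected: once `pkt_len_bin` is chosen, its copy adds relevance but an equal amount of redundancy, so the complementary `tcp_flag_bin` scores higher.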

Key ML Algorithms:

  • Decision Trees (DT): Offers clear model interpretation by recursively splitting data based on decision criteria.
  • eXtreme Gradient Boosting (XGBoost): Advanced, high-performance gradient-boosted decision trees, known for robustness and accuracy on large datasets.
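To illustrate the "recursive splitting" that a Decision Tree relies on, the sketch below finds the single best threshold on one numeric feature using Gini impurity. It is an assumption-laden toy: the study does not specify the split criterion it used, and XGBoost fits its trees with a different, gradient-based objective. A full tree would apply `best_split` again to each resulting subset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    The threshold is None if no split improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical normalized packet rates: low for normal traffic, high during a DoS.
rates = [0.1, 0.2, 0.3, 0.8, 0.9]
classes = ["normal", "normal", "normal", "dos", "dos"]
threshold, impurity = best_split(rates, classes)
```

Here the split at 0.3 separates the two classes completely, so both child nodes are pure.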

The machine learning models (DT and XGBoost) were evaluated using fivefold cross-validation repeated over 5 epochs. Key performance indicators included Accuracy (ACC), Precision (P), Recall (R), and F1-Score. Both classifiers showed excellent results:

| Model & Classification Type | Avg Training Accuracy | Avg Testing Accuracy | F1 Score | Precision | Recall |
| --- | --- | --- | --- | --- | --- |
| MRMR_DT (Binary Classification) | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 |
| MRMR_DT (Multi-class Classification) | 0.986 | 0.986 | 0.986 | 0.987 | 0.986 |
| MRMR_XGBoost (Binary Classification) | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| MRMR_XGBoost (Multi-class Classification) | 0.973 | 0.973 | 0.972 | 0.975 | 0.973 |

These results confirm the dataset's suitability for intrusion detection research and underscore the potential of machine learning for enhancing ICS security.
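For readers who want to reproduce this kind of evaluation, the sketch below shows one way to compute accuracy, precision, recall, and F1, along with a simple fivefold index split. It assumes macro averaging (an unweighted mean over classes) and an unshuffled, interleaved fold assignment. The study does not state its averaging scheme or how folds were drawn, so treat both as illustrative choices.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Tiny example: one benign sample (0) is misclassified as an attack (1).
report = evaluate([0, 0, 1, 1], [0, 1, 1, 1])
```

In practice, the per-fold reports would be averaged to obtain figures comparable to the table's "Avg" columns.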

99% Peak Binary Classification Accuracy (XGBoost)

Enterprise Process Flow

Data Cleaning
Encoding
Normalization
Imbalance Handling
Feature Selection
Model Training
Evaluation
Detection Results
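The first four steps of this flow can be sketched with plain Python helpers. This is a minimal illustration under stated assumptions: missing values are filled with a sentinel (the study does not say how they were handled), categories are encoded as integers in sorted order, and under-sampling trims every class to the size of the smallest one. The `Label` and `Length` column names are hypothetical.

```python
import random
from collections import Counter


def fill_missing(rows, column, default):
    """Data cleaning: replace None or empty values in one column with a default."""
    return [
        {**r, column: default if r.get(column) in (None, "") else r[column]}
        for r in rows
    ]


def label_encode(values):
    """Encoding: map each distinct category to an integer code."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def min_max(values):
    """Normalization: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def under_sample(rows, label_key, seed=0):
    """Imbalance handling: randomly trim every class to the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    n_min = min(len(group) for group in by_class.values())
    balanced = [r for group in by_class.values() for r in rng.sample(group, n_min)]
    rng.shuffle(balanced)
    return balanced


# Toy rows: four benign packets and two attack packets.
rows = [{"Protocol": "TCP", "Length": 60, "Label": "Normal"}] * 3
rows += [{"Protocol": "", "Length": 90, "Label": "Normal"}]
rows += [{"Protocol": "ICMP", "Length": 120, "Label": "DoS"}] * 2

rows = fill_missing(rows, "Protocol", "UNKNOWN")
codes, mapping = label_encode([r["Protocol"] for r in rows])
scaled = min_max([r["Length"] for r in rows])
balanced = under_sample(rows, "Label")
class_counts = Counter(r["Label"] for r in balanced)
```

Under-sampling discards data from the majority class, which is usually acceptable on a dataset of this size but is worth revisiting if some attack classes are very small.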

ICSCASD-MPLC vs. Existing Datasets

| Feature | Existing Datasets (General) | ICSCASD-MPLC (This Study) |
| --- | --- | --- |
| Realism of Environment | Often virtual/simulated, limited hardware | Physical testbed with real Mitsubishi PLCs & industrial switches |
| Dataset Availability | Many not publicly available or limited | Publicly available (2.6M records, 57 features) |
| Attack Coverage | Varied, sometimes limited types | Seven well-defined cyberattack scenarios (DoS, MITM, Injection, Scanning, Replay) |
| Machine Learning Readiness | May require extensive preprocessing | Labeled, CSV format, designed for ML integration |
| Protocol & Vendor Specificity | Diverse, but often limited realism per protocol | Focus on Mitsubishi PLCs, MC protocol for high fidelity |

Safeguarding Critical Infrastructure with ICSCASD-MPLC

The ICSCASD-MPLC dataset is intended to advance cybersecurity for Industrial Control Systems. By providing a high-fidelity dataset captured from real hardware, it enables researchers and practitioners to develop and test more robust intrusion detection systems. For instance, security teams can use this dataset to train AI models that identify attacks such as Man-in-the-Middle or Data Injection within a Mitsubishi PLC environment. This can help reduce false positives and improve real-time threat response, supporting operational continuity and protecting vital industrial processes from disruption. The dataset's detailed traffic patterns for both normal and malicious activities allow for nuanced model training, which can make AI-powered IDS solutions more interpretable and effective in critical infrastructure settings.

Your AI Security Implementation Roadmap

A structured approach to integrating AI-powered intrusion detection into your ICS environment, leveraging insights from the ICSCASD-MPLC dataset.

Phase 1: Initial Assessment & Testbed Setup

Review the current ICS infrastructure, identify critical assets, and establish a secure, isolated testbed that mirrors the production environment for initial data collection and baseline establishment.

Phase 2: Data Collection & Custom Dataset Generation

Leverage tools like Wireshark and custom Python scripts to collect both normal and simulated attack traffic, creating a labeled, high-fidelity dataset akin to ICSCASD-MPLC.
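One simple way to label captured traffic is to record when each simulated attack was running and tag packets by timestamp. The sketch below assumes a hand-written schedule of non-overlapping attack windows, sorted by start time. The window times and label names are hypothetical, and the study does not describe its labeling procedure in this summary.

```python
from bisect import bisect_right

# Hypothetical schedule: (start_seconds, end_seconds, label), sorted and non-overlapping.
ATTACK_WINDOWS = [
    (100.0, 160.0, "DoS"),
    (300.0, 360.0, "MITM"),
]


def label_packet(timestamp, windows=ATTACK_WINDOWS, default="Normal"):
    """Return the attack label whose window contains the timestamp, else the default."""
    starts = [w[0] for w in windows]
    i = bisect_right(starts, timestamp) - 1  # last window starting at or before timestamp
    if i >= 0 and timestamp <= windows[i][1]:
        return windows[i][2]
    return default


labels = [label_packet(t) for t in (50.0, 100.0, 200.0, 330.0)]
```

Because legitimate traffic usually continues during an attack window, a real labeling script would likely also check the source address of each packet rather than rely on time alone.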

Phase 3: AI Model Training & Optimization

Utilize the custom dataset for training and fine-tuning machine learning models (e.g., Decision Trees, XGBoost) to detect specific ICS attack patterns, incorporating feature selection for efficiency.

Phase 4: Validation & Integration

Rigorously validate the trained models against diverse attack scenarios. Integrate the AI-powered IDS into a non-production ICS environment for real-time monitoring and alert generation.

Phase 5: Deployment & Continuous Improvement

Deploy the AI-driven IDS in a monitored production environment. Establish continuous feedback loops for model retraining, adapting to new threats and evolving ICS landscapes.
