Enterprise AI Analysis: A Multimodal Phishing Website Detection System Using Explainable Artificial Intelligence Technologies

The purpose of the present study is to improve the effectiveness of phishing website detection through multimodal analysis and explainable artificial intelligence (XAI) methods. We propose a late fusion architecture in which independent specialized models process four modalities and their outputs are combined by weighted voting. The first branch uses CatBoost on URL features and metadata; the second uses a one-dimensional convolutional network (CNN1D) on a character-level representation of the URL; the third uses a Transformer based on pretrained CodeBERT for the homepage HTML code; and the fourth uses EfficientNet-B7 to analyze page screenshots. SHAP, Grad-CAM, and attention matrices are used to interpret decisions, and a local LLM generates a consolidated textual explanation. A prototype system based on a microservice architecture and integrated with a Security Operations Center (SOC) has been developed; this integration enables streaming processing and reproducible validation. Computational experiments on our own updated dataset and the public MTLP dataset show high performance, with F1-scores of up to 0.989 on our own dataset and 0.953 on MTLP, and multimodal fusion consistently outperforms single-modality baseline models. We demonstrate the practical value of this approach for zero-day detection and for reducing false positives through cross-modal feature alignment and explainability. The limitations and operational aspects of the prototype (data drift, adversarial robustness, LLM latency) are discussed, and directions for further research are outlined.

Executive Impact: Quantifiable Results

This research reports strong results in phishing detection, combining high accuracy, throughput suited to streaming processing, and interpretable decisions for enterprise security operations.

0.989 F1-Score (Proprietary Dataset)
0.953 F1-Score (MTLP Public Dataset)
Peak Processing Throughput
92% Zero-Day Phishing Detection (92 of 100 previously unknown links flagged)

Deep Analysis & Enterprise Applications

The findings are grouped into four topics: system architecture, modality fusion, the XAI pipeline, and performance and robustness.

Enterprise Process Flow

Data Preprocessing & Context Enrichment
Data Management for ML Training
ML Model Creation & Management
ML Model Lifecycle Management
Classification & Decision Making
Explanation Generation
SOC/TI/SIEM Integration
Decision Validation & Operator Interaction

Main ML Models Used

Multimodal Component | Model Designation | Model Type
URL + Metadata | M1 | CatBoost (Optuna-tuned)
URL (Character-Level) | M2 | CNN1D
HTML Code | M3 | CodeBERT Transformer (Fine-tuned)
Image (Screenshot) | M4 | EfficientNet-B7 (Fine-tuned)
Late Fusion | M5 | Trainable Meta-Classifier (Weighted Voting)
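
M5 is described only as a trainable meta-classifier that performs weighted voting over M1 to M4. As a minimal sketch under that reading, and assuming (this is not confirmed by the source) that the meta-classifier is a logistic regression fitted on the four base models' phishing probabilities, the code below learns one weight per modality plus a bias. The function names, learning rate, epoch count, and toy scores are all hypothetical.

```python
import math
from typing import Sequence

# Hypothetical sketch of M5: a logistic-regression stacker whose learned
# coefficients play the role of per-modality voting weights.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(
    scores: Sequence[Sequence[float]],  # one row per page: [p_M1, p_M2, p_M3, p_M4]
    labels: Sequence[int],              # 1 = phishing, 0 = legitimate
    lr: float = 0.5,
    epochs: int = 2000,
) -> tuple[list[float], float]:
    """Fit weights and bias by full-batch gradient descent on log loss."""
    n_models = len(scores[0])
    weights = [1.0 / n_models] * n_models  # start from equal-weight soft voting
    bias = 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for x, y in zip(scores, labels):
            # Gradient of log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_models):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(x: Sequence[float], weights: Sequence[float], bias: float,
         threshold: float = 0.5) -> tuple[float, bool]:
    """Return the fused phishing probability and the binary verdict."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p, p >= threshold


if __name__ == "__main__":
    # Toy, made-up base-model outputs (not data from the paper).
    train_scores = [
        [0.95, 0.90, 0.85, 0.80], [0.90, 0.70, 0.95, 0.60], [0.85, 0.60, 0.90, 0.90],
        [0.10, 0.20, 0.05, 0.30], [0.20, 0.40, 0.10, 0.15], [0.05, 0.30, 0.20, 0.10],
    ]
    train_labels = [1, 1, 1, 0, 0, 0]
    w, b = train_meta_classifier(train_scores, train_labels)
    print("learned modality weights:", [round(v, 2) for v in w])
    print("fused verdict:", fuse([0.92, 0.40, 0.88, 0.85], w, b))
```

In a real stacking setup the meta-classifier should be fitted on out-of-fold predictions of the base models; otherwise it learns to trust whichever model has memorised the training pages.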

Modality Fusion Strategies

Early Fusion (Concatenation, Unified Model)
Intermediate Fusion (Encoders, Fusion Layer)
Late Fusion (Independent Models, Aggregation)

Comparative Analysis of Fusion Strategies

Early Fusion
  • Advantages: captures low-level correlations; simple architecture
  • Disadvantages: loss of modality specificity; complex training
Intermediate Fusion
  • Advantages: balances modality specificity and cross-modal interaction; supports attention mechanisms
  • Disadvantages: moderate training complexity
Late Fusion
  • Advantages: model specialization; modularity and parallelization; best suited to heterogeneous data
  • Disadvantages: loss of cross-modal interactions
Hybrid Fusion
  • Advantages: maximum performance and adaptability
  • Disadvantages: high model complexity; risk of overfitting
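
To make the early/late distinction concrete, the sketch below contrasts the two, with hypothetical logistic scorers standing in for trained models. Early fusion concatenates every modality's features into one vector for a single joint model; late fusion scores each modality separately and combines only the outputs, here by a plain average as a simplification of weighted voting. All names and numbers are illustrative assumptions.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def logistic_scorer(weights: Sequence[float], bias: float) -> Model:
    """Hypothetical stand-in for a trained model that returns P(phishing)."""
    def score(x: Vector) -> float:
        if len(x) != len(weights):
            raise ValueError(f"expected {len(weights)} features, got {len(x)}")
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return score


def early_fusion(modalities: Sequence[Vector], joint_model: Model) -> float:
    """Concatenate all modality features and score them with one joint model."""
    fused = [v for vec in modalities for v in vec]
    return joint_model(fused)


def late_fusion(modalities: Sequence[Vector], models: Sequence[Model]) -> float:
    """Score each modality with its own model, then average the output scores."""
    if len(modalities) != len(models):
        raise ValueError("need exactly one model per modality")
    scores = [model(x) for model, x in zip(models, modalities)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    url_features = [0.8, 0.1]        # made-up URL features
    html_features = [0.3, 0.9, 0.5]  # made-up HTML features
    joint = logistic_scorer([1.0] * 5, -2.5)
    per_modality = [logistic_scorer([2.0, 2.0], -1.0),
                    logistic_scorer([1.0, 1.0, 1.0], -1.0)]
    print("early:", round(early_fusion([url_features, html_features], joint), 3))
    print("late: ", round(late_fusion([url_features, html_features], per_modality), 3))
```

Because `late_fusion` needs only each model's output score, any one branch can be retrained or replaced without touching the others, which is the modularity advantage listed above. The cost is that the joint model in `early_fusion` can learn interactions between URL and HTML features that averaged late scores cannot.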

End-to-End XAI Explanation Flow

Metadata, URL, HTML, Image Inputs
ML Models (CatBoost, CNN1D, CodeBERT, EfficientNet-B7)
XAI Methods (SHAP, Grad-CAM, Attention Matrix)
Local LLM Explainer (QwQ-32B)
Natural Language Report
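
The last two steps of this flow, passing per-model XAI output to the local LLM, can be sketched as prompt assembly. The data structure, prompt wording, and `top_k` cut-off below are assumptions for illustration only. The source does not specify how QwQ-32B is prompted, and the sketch stops short of calling any model.

```python
from dataclasses import dataclass, field


@dataclass
class ModalityExplanation:
    """Hypothetical container for one model's XAI output."""
    modality: str           # e.g. "URL + metadata"
    model: str              # e.g. "CatBoost (M1)"
    method: str             # e.g. "SHAP"
    phishing_score: float   # base model's P(phishing)
    evidence: list[tuple[str, float]] = field(default_factory=list)  # (feature/region, attribution)


def build_explainer_prompt(url: str, fused_score: float,
                           parts: list[ModalityExplanation], top_k: int = 3) -> str:
    """Assemble a plain-text prompt that grounds the LLM in the XAI evidence."""
    lines = [
        "You are assisting a SOC analyst. Explain in plain language why this page",
        "received the verdict below, using only the evidence listed.",
        f"URL: {url}",
        f"Fused phishing probability: {fused_score:.2f}",
        "Evidence by modality:",
    ]
    for part in parts:
        # Keep the strongest attributions regardless of sign.
        top = sorted(part.evidence, key=lambda item: abs(item[1]), reverse=True)[:top_k]
        ev = "; ".join(f"{name} ({weight:+.2f})" for name, weight in top)
        lines.append(f"- {part.modality} [{part.model}, {part.method}] "
                     f"score {part.phishing_score:.2f}: {ev}")
    return "\n".join(lines)
```

Keeping only the top-ranked attributions per modality keeps the prompt short, which matters given the LLM latency the source lists as an operational concern.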

Overview of XAI Methods Integrated

Multimodal Component | Model Designation | Model | XAI Method
URL + Metadata | M1 | CatBoost | SHAP values
URL (Character-Level) | M2 | CNN1D | Grad-CAM, Integrated Gradients
HTML Code | M3 | CodeBERT | Attention matrix
Image (Screenshot) | M4 | EfficientNet-B7 | Grad-CAM
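
SHAP values for M1 are Shapley values of the model's prediction. The brute-force computation below shows what they mean on a hypothetical three-feature scoring function. It replaces "absent" features with a baseline value, which is one common convention. Because it enumerates every feature coalition, the cost is exponential in the number of features. Tree-specific algorithms such as TreeSHAP compute the same quantities efficiently for gradient-boosted trees like CatBoost.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[list[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; 'absent' features take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: list[float]) -> float:
    # Hypothetical features: an IP-address host adds risk on its own, while a
    # brand name in the path only matters on a newly registered domain.
    return 2.0 * z[0] + z[1] * z[2]


if __name__ == "__main__":
    names = ["ip_host", "brand_in_path", "new_domain"]
    for name, v in zip(names, exact_shapley(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])):
        print(f"{name}: {v:+.2f}")
```

For the toy function, `ip_host` receives 2.0, and the `brand_in_path` × `new_domain` interaction is split evenly (0.5 each). The attributions sum to f(x) - f(baseline). This efficiency property is what makes SHAP explanations additive.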

Enhanced Explainability Utility for SOC Analysts (with RAG)

4.20 Average Expert Utility Score (out of 5)
0.96 Inter-Rater Consistency (Cronbach's α)
0.84 Expert Agreement (Fleiss's κ)
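
The two agreement statistics above can be computed as follows. This is a sketch of the textbook formulas, not the authors' evaluation code. The number of raters and explanations in the study is not given here. Two further choices are assumptions: treating the 1 to 5 utility scores as Fleiss categories, and treating raters as Cronbach "items".

```python
from statistics import variance
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of raters who put item i into category j.
    Every item must be rated by the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    # Observed agreement per item, averaged over items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the marginal category proportions.
    n_categories = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """ratings[i][k] = score given to item i by rater k (raters act as scale items)."""
    k = len(ratings[0])
    rater_vars = [variance([row[r] for row in ratings]) for r in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)
```

Fleiss' κ is undefined when every rating falls into a single category (the chance agreement `p_e` equals 1), so a real evaluation script should guard that case.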

Multimodal Performance Summary (F1-Scores)

Experiment | Models | Dataset | F1-Score
I | CatBoost (M1) + CodeBERT (M3) + Voting Classifier | D1 (own dataset) | 0.972
II | CNN1D (M2) + EfficientNet-B7 (M4) + Voting Classifier | D2 (MTLP) | 0.944
III | All Models (M1+M2+M3+M4) + Voting Classifier | D1 (own dataset) | 0.989
III | All Models (M1+M2+M3+M4) + Voting Classifier | D2 (MTLP) | 0.953

Robustness to Obfuscation Techniques (Decrease in Performance)

Attack Subclass | Individual Models | Multimodal Model
URL obfuscation | 2% | 1.5%
HTML obfuscation | 4% | 2%
Visual camouflage | 5% | 3%
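
URL obfuscation covers several widely used tricks that hide the real destination from a human reader. Three common ones are userinfo placed before an "@", a dotless decimal IPv4 host, and percent-encoded path characters. The sketch below undoes these three using only the Python standard library. The source does not say whether the obfuscated test set uses these particular techniques, and `normalise_url` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import unquote, urlsplit


def normalise_url(url: str) -> dict[str, object]:
    """Undo a few common URL obfuscation tricks before feature extraction."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.isdigit():
        # A dotless decimal host such as 3232235777 is an IPv4 address in disguise.
        host = str(ipaddress.IPv4Address(int(host)))
    return {
        "host": host,
        # Anything before '@' in the authority is userinfo, not the destination host.
        "userinfo": parts.username,
        # Percent-encoded path characters decoded for readability.
        "path": unquote(parts.path),
    }


if __name__ == "__main__":
    print(normalise_url("https://login.bank.example@attacker.example/%6c%6f%67%69%6e"))
    print(normalise_url("http://3232235777/"))
```

In the first example the page appears to belong to `login.bank.example`, but the browser actually connects to `attacker.example`. Per-modality models that see only surface strings can be misled by this kind of mismatch.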

Real-World Efficacy: Zero-Day Detection and Human-in-the-Loop Validation

The system showed strong capability in detecting previously unknown (zero-day) phishing links. In a test on 100 such links, it flagged 92 as suspicious, including some that publicly available services such as VirusTotal did not flag. This suggests that analyzing features across several modalities helps the system recognize novel attack patterns.

In addition, SOC analysts rated the explanations produced by the integrated explainable AI (XAI) subsystem as useful for decision-making. In expert reviews, explanations generated with RAG (retrieval-augmented generation) received an average utility score of 4.20 out of 5, with high inter-rater agreement (Fleiss's κ = 0.84) and internal consistency (Cronbach's α = 0.96). This human-in-the-loop validation underscores the practical value of explainability in critical cybersecurity operations. Analysts can quickly understand why a detection was made and use that feedback to refine the system's training data.
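
The RAG step retrieves prior context, for example past analyst verdicts or threat-intelligence notes, to ground the LLM's explanation. The source does not describe the retriever. The sketch below therefore substitutes the simplest possible one: bag-of-words cosine similarity over a small in-memory knowledge base. A production system would more likely use dense embeddings. All names and example documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorise(text: str) -> Counter:
    """Lower-cased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k knowledge-base entries most similar to the query."""
    q = vectorise(query)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(q, vectorise(doc)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    kb = [
        "Credential form posts to a newly registered domain; analyst confirmed phishing.",
        "Marketing newsletter with tracking pixels; benign.",
        "Brand logo on login page hosted on IP address; confirmed phishing kit.",
    ]
    for case in retrieve("login page with bank logo served from raw IP address", kb, k=1):
        print("context for the LLM:", case)
```

The retrieved cases would be added to the explainer prompt alongside the per-model XAI evidence, so the LLM can point to similar, already-verified incidents.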

Your AI Implementation Roadmap

A phased approach helps ensure successful integration and maximum impact for your enterprise.

Phase 01: Discovery & Strategy

Comprehensive assessment of your current infrastructure, data, and business objectives. Define clear AI integration strategies and success metrics.

Phase 02: Prototype Development

Rapid prototyping of core AI modules, utilizing selected models and data modalities relevant to your specific use cases. Initial validation of performance.

Phase 03: System Integration

Integration of the AI system into your existing SOC/TI/SIEM frameworks, including development of API connections and microservice deployment.

Phase 04: Validation & Optimization

Thorough testing against real-world and zero-day threats. Fine-tuning of models, meta-classifier, and XAI components for optimal accuracy and interpretability.

Phase 05: Continuous Improvement

Ongoing monitoring for data drift, regular retraining, and updates to ensure sustained high performance and adaptability to evolving threat landscapes.
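
Phase 05 calls for monitoring data drift. One common, simple check is the population stability index (PSI), swapped in here because the source does not name a drift metric. It compares a reference sample, such as fused phishing scores from the training period, with a recent live sample. The bin edges and example data are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def population_stability_index(reference: Sequence[float], live: Sequence[float],
                               edges: Sequence[float], eps: float = 1e-6) -> float:
    """PSI = sum over bins of (live% - ref%) * ln(live% / ref%)."""
    n_bins = len(edges) - 1

    def proportions(sample: Sequence[float]) -> list[float]:
        counts = [0] * n_bins
        for v in sample:
            if edges[0] <= v <= edges[-1]:  # values outside the edges are ignored
                # The last bin is right-closed so that v == edges[-1] is counted.
                counts[min(bisect_right(edges, v) - 1, n_bins - 1)] += 1
        total = sum(counts)
        if total == 0:
            raise ValueError("no values fall inside the bin edges")
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    ref, cur = proportions(reference), proportions(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    edges = [0.0, 0.25, 0.5, 0.75, 1.0]
    training_scores = [0.1, 0.3, 0.6, 0.9]   # made-up reference scores
    recent_scores = [0.8, 0.85, 0.9, 0.95]   # made-up live scores
    print(round(population_stability_index(training_scores, recent_scores, edges), 2))
```

Values above roughly 0.1 to 0.25 are often quoted as signs of moderate to significant drift. Such thresholds are rules of thumb and should be calibrated for each deployment before they trigger retraining.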

Ready to Transform Your Cybersecurity with AI?

Book a personalized consultation with our AI specialists to discuss how these advanced multimodal and explainable AI technologies can be tailored to your enterprise needs.
