Enterprise AI Analysis
A Multimodal Phishing Website Detection System Using Explainable Artificial Intelligence Technologies
This study aims to improve the effectiveness of phishing website detection through multimodal analysis combined with explainable artificial intelligence (XAI) methods. We propose a late-fusion architecture in which four independent, specialized models each process one modality and are combined by weighted voting. The first branch uses CatBoost on URL features and metadata; the second uses a 1D CNN on a character-level URL representation; the third uses a Transformer based on pretrained CodeBERT for the homepage HTML code; and the fourth uses EfficientNet-B7 for page-screenshot analysis. SHAP, Grad-CAM, and attention matrices are used to interpret decisions, and a local LLM generates a consolidated textual explanation. A prototype system based on a microservice architecture and integrated with the SOC has been developed; this integration enables streaming processing and reproducible validation. Computational experiments on our own updated dataset and the public MTLP dataset show high performance: F1-scores of up to 0.989 on our dataset and 0.953 on MTLP, with multimodal fusion consistently outperforming single-modal baselines. We demonstrate the practical value of the approach for zero-day detection and false-positive reduction through cross-modal feature alignment and explainability, discuss the prototype's limitations and operational aspects (data drift, adversarial robustness, LLM latency), and outline directions for further research.
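To make the fusion step concrete, the minimal sketch below scores one page by combining four branch probabilities with a weighted vote. The branch names, probabilities, and weights are illustrative assumptions, not values from the paper.

```python
# Minimal late-fusion sketch: each modality branch emits a phishing probability,
# and a weighted vote produces the final verdict. All numbers are illustrative.
from typing import Dict, Tuple

def weighted_vote(branch_probs: Dict[str, float],
                  weights: Dict[str, float],
                  threshold: float = 0.5) -> Tuple[str, float]:
    """Fuse per-modality probabilities with normalized weights."""
    total_w = sum(weights[name] for name in branch_probs)
    fused = sum(weights[name] * p for name, p in branch_probs.items()) / total_w
    return ("phishing" if fused >= threshold else "benign"), fused

# Hypothetical branch outputs for one analyzed page (M1..M4).
probs = {"M1_url_meta": 0.91, "M2_url_chars": 0.84,
         "M3_html": 0.96, "M4_screenshot": 0.72}
# Assumed weights; in the described system they are learned by the M5 meta-classifier.
weights = {"M1_url_meta": 1.0, "M2_url_chars": 0.8,
           "M3_html": 1.2, "M4_screenshot": 0.9}

label, score = weighted_vote(probs, weights)
print(label, round(score, 3))
```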
Executive Impact: Quantifiable Results
This research demonstrates significant advancements in phishing detection, offering superior accuracy, speed, and interpretability for enterprise security operations.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Enterprise Process Flow
| Multimodal Component | Model Designation | Model Type |
|---|---|---|
| URL + Metadata | M1 | CatBoost (Optuna-tuned) |
| URL (Character-Level) | M2 | CNN1D |
| HTML Code | M3 | CodeBERT Transformer (Fine-tuned) |
| Image (Screenshot) | M4 | EfficientNet-B7 (Fine-tuned) |
| Late Fusion | M5 | Trainable Meta-Classifier (Weighted Voting) |
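The trainable meta-classifier M5 can be realized as a stacking model fitted on held-out branch probabilities, so the learned coefficients play the role of the voting weights. The sketch below uses synthetic probabilities and a logistic-regression meta-learner as an assumed concrete choice; it is not the authors' implementation.

```python
# Sketch of training the M5 meta-classifier by stacking: logistic regression
# over held-out branch probabilities, whose coefficients act as vote weights.
# The branch probabilities here are synthetic, purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)                          # 0 = benign, 1 = phishing
noise = rng.normal(0, 0.15, size=(2000, 4))
X_branch = np.clip(0.1 + 0.8 * y[:, None] + noise, 0, 1)   # fake M1..M4 probabilities

X_tr, X_te, y_tr, y_te = train_test_split(X_branch, y, test_size=0.25, random_state=0)

meta = LogisticRegression().fit(X_tr, y_tr)

print("learned per-branch weights:", np.round(meta.coef_[0], 2))
print("meta-classifier F1:", round(f1_score(y_te, meta.predict(X_te)), 3))
```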
Modality Fusion Strategies
| Fusion Strategy | Advantages | Disadvantages |
|---|---|---|
| Early Fusion | Single model captures low-level cross-modal interactions | Requires aligned feature spaces; sensitive to missing or noisy modalities |
| Intermediate Fusion | Learns joint representations from modality embeddings | Tighter coupling between branches; harder to train and tune |
| Late Fusion | Modular, independently trained branches; robust to a failing modality; easy to extend | Cannot exploit fine-grained cross-modal interactions |
| Hybrid Fusion | Combines the strengths of the other strategies | Highest architectural and computational complexity |
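The practical difference between the two extremes in the table above is easiest to see in code: early fusion feeds concatenated features to a single classifier, whereas late fusion trains one classifier per modality and combines their output probabilities. The sketch below uses synthetic URL and HTML feature matrices and a gradient-boosting classifier purely for illustration.

```python
# Early vs. late fusion on synthetic data: one classifier over concatenated
# features versus independent per-modality classifiers with probability averaging.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
url_feats = rng.normal(y[:, None], 1.0, size=(1000, 8))    # synthetic URL features
html_feats = rng.normal(y[:, None], 1.5, size=(1000, 16))  # synthetic HTML features

idx_tr, idx_te = train_test_split(np.arange(1000), test_size=0.3, random_state=0)

# Early fusion: one classifier over the concatenated feature space.
X_all = np.hstack([url_feats, html_feats])
early = GradientBoostingClassifier().fit(X_all[idx_tr], y[idx_tr])
early_prob = early.predict_proba(X_all[idx_te])[:, 1]

# Late fusion: independent per-modality classifiers, probabilities averaged.
m_url = GradientBoostingClassifier().fit(url_feats[idx_tr], y[idx_tr])
m_html = GradientBoostingClassifier().fit(html_feats[idx_tr], y[idx_tr])
late_prob = (m_url.predict_proba(url_feats[idx_te])[:, 1] +
             m_html.predict_proba(html_feats[idx_te])[:, 1]) / 2

print("early fusion AUC:", round(roc_auc_score(y[idx_te], early_prob), 3))
print("late fusion  AUC:", round(roc_auc_score(y[idx_te], late_prob), 3))
```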
End-to-End XAI Explanation Flow
| Multimodal Component | Model Designation | Model | XAI Explanation Artifacts |
|---|---|---|---|
| URL + Metadata | M1 | CatBoost | SHAP Values |
| URL (Character-Level) | M2 | CNN1D | Grad-CAM, Integrated Gradients |
| HTML Code | M3 | CodeBERT | Attention Matrix |
| Image (Screenshot) | M4 | EfficientNet-B7 | Grad-CAM |
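As an illustration of how branch-level artifacts can feed the consolidated LLM explanation, the sketch below extracts SHAP values from a CatBoost URL/metadata model and packages the top contributions into a prompt. The feature names, synthetic data, and prompt template are assumptions, not the study's pipeline.

```python
# Sketch: extract SHAP values from the CatBoost branch (M1) and package the top
# contributions into a prompt for a local LLM. Feature names, data, and the
# prompt template are illustrative assumptions.
import numpy as np
from catboost import CatBoostClassifier, Pool

feature_names = ["url_length", "num_subdomains", "has_ip_host",
                 "domain_age_days", "tls_valid"]                 # assumed features
rng = np.random.default_rng(7)
X = rng.random((500, len(feature_names)))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)                        # synthetic labels

m1 = CatBoostClassifier(iterations=100, verbose=False).fit(X, y)

# Per-sample SHAP values; the last column holds the expected (base) value.
shap_vals = m1.get_feature_importance(Pool(X[:1]), type="ShapValues")
contrib = shap_vals[0, :-1]
top = sorted(zip(feature_names, contrib.round(3)), key=lambda t: -abs(t[1]))[:3]

prompt = (
    "Summarize for a SOC analyst why this page was flagged as phishing. "
    f"Top URL/metadata SHAP contributions: {top}. "
    "Combine with the HTML attention highlights and screenshot Grad-CAM regions."
)
print(prompt)   # this text would be passed to the local LLM
```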
Enhanced Explainability Utility for SOC Analysts (with RAG)
Average Expert Utility Score: 4.20/5 · Inter-Rater Consistency (Cronbach's α): 0.96 · Expert Agreement (Fleiss's κ): 0.84

Fusion Experiment Results

| Experiment | Models | Dataset | F1-Score |
|---|---|---|---|
| I | CatBoost (M1) + CodeBERT (M3) + Voting Classifier | D1 (own dataset) | 0.972 |
| II | CNN1D (M2) + EfficientNet-B7 (M4) + Voting Classifier | D2 (MTLP) | 0.944 |
| III | All Models (M1+M2+M3+M4) + Voting Classifier | D1 (own dataset) | 0.989 |
| III | All Models (M1+M2+M3+M4) + Voting Classifier | D2 (MTLP) | 0.953 |
Robustness to Adversarial Attack Subclasses

| Attack Subclass | Individual Models | Multimodal Model |
|---|---|---|
| URL obfuscation | 2% | 1.5% |
| HTML obfuscation | 4% | 2% |
| Visual camouflage | 5% | 3% |
Real-World Efficacy: Zero-Day Detection and Human-in-the-Loop Validation
The system demonstrated strong capabilities in detecting previously unknown, zero-day phishing links. In a test involving 100 such links, the system identified 92 as suspicious, including links not flagged by publicly available services such as VirusTotal. This highlights the system's ability to recognize novel attack patterns through comprehensive feature analysis across modalities.
Furthermore, the integrated Explainable AI (XAI) subsystem significantly improved SOC analyst decision-making. Expert reviews showed that explanations generated with RAG (Retrieval-Augmented Generation) achieved an average utility score of 4.20/5, with strong expert agreement (Fleiss's κ = 0.84) and high inter-rater consistency (Cronbach's α = 0.96). This human-in-the-loop validation underscores the practical value of explainability in critical cybersecurity operations, enabling analysts to quickly understand the rationale behind detections and to refine the system's training data.
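For reference, both agreement statistics quoted above can be computed directly from a ratings matrix; the sketch below applies the standard Cronbach's α and Fleiss's κ formulas to synthetic 1-5 expert ratings rather than the study's data.

```python
# Sketch of the two agreement statistics reported above, computed with their
# standard formulas on synthetic 1-5 utility ratings (20 explanations x 5 experts).
import numpy as np

rng = np.random.default_rng(42)
latent = rng.integers(3, 6, size=20)                 # latent quality per explanation
ratings = np.clip(latent[:, None] + rng.integers(-1, 2, size=(20, 5)), 1, 5)

def cronbach_alpha(R: np.ndarray) -> float:
    """R: explanations x raters; treats each rater as one 'item'."""
    k = R.shape[1]
    rater_vars = R.var(axis=0, ddof=1).sum()         # variance of each rater's scores
    total_var = R.sum(axis=1).var(ddof=1)            # variance of the summed scores
    return k / (k - 1) * (1 - rater_vars / total_var)

def fleiss_kappa(R: np.ndarray, categories=(1, 2, 3, 4, 5)) -> float:
    """R: explanations x raters with categorical ratings."""
    counts = np.stack([(R == c).sum(axis=1) for c in categories], axis=1)
    n, r = R.shape
    p_j = counts.sum(axis=0) / (n * r)                        # category proportions
    P_i = ((counts ** 2).sum(axis=1) - r) / (r * (r - 1))     # per-item agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)

print("Cronbach's alpha:", round(float(cronbach_alpha(ratings)), 2))
print("Fleiss's kappa:  ", round(float(fleiss_kappa(ratings)), 2))
```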
Calculate Your Potential AI ROI
Estimate the efficiency gains and cost savings your enterprise could achieve with advanced AI solutions.
Your AI Implementation Roadmap
A strategic phased approach ensures successful integration and maximum impact for your enterprise.
Phase 01: Discovery & Strategy
Comprehensive assessment of your current infrastructure, data, and business objectives. Define clear AI integration strategies and success metrics.
Phase 02: Prototype Development
Rapid prototyping of core AI modules, utilizing selected models and data modalities relevant to your specific use cases. Initial validation of performance.
Phase 03: System Integration
Seamless integration of the AI system within your existing SOC/TI/SIEM frameworks. Development of API connections and microservice deployment.
Phase 04: Validation & Optimization
Thorough testing against real-world and zero-day threats. Fine-tuning of models, meta-classifier, and XAI components for optimal accuracy and interpretability.
Phase 05: Continuous Improvement
Ongoing monitoring for data drift, regular retraining, and updates to ensure sustained high performance and adaptability to evolving threat landscapes.
Ready to Transform Your Cybersecurity with AI?
Book a personalized consultation with our AI specialists to discuss how these advanced multimodal and explainable AI technologies can be tailored to your enterprise needs.