Enterprise AI Analysis

Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings

This research introduces RealMat-BaG, a new benchmark for assessing the reliability of machine learning models in predicting semiconductor bandgaps under realistic experimental conditions. It curates an open-access dataset of experimental bandgaps with aligned crystal structures, enabling direct comparison of graph neural networks (GNNs) and classical machine learning (ML) baselines. The framework evaluates performance across diverse data splits, analyzes transferability from density functional theory (DFT) calculations to experimental data, and examines model interpretability at elemental and structural levels. The findings highlight limitations of current models and establish a benchmark for more reliable materials discovery.

Executive Impact & Key Findings

The study reveals critical insights into the real-world applicability and limitations of current AI models in materials science. Closing the data fidelity gap and improving out-of-distribution (OOD) generalization are paramount for industry adoption.

Key metrics highlighted by the study:

- Best random-split MRAE (mean relative absolute error)
- Minimum experimental data required for pretraining to yield a benefit
- Increase in OOD error relative to random splits
- DFT underprediction relative to experiment
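MRAE is the headline error metric throughout. A minimal sketch of its computation, assuming the standard definition of mean relative absolute error, |ŷ − y| / |y| averaged over samples (the paper's exact normalization may differ):

```python
import numpy as np

def mrae(y_true, y_pred) -> float:
    """Mean relative absolute error: mean(|y_pred - y_true| / |y_true|).

    Assumes y_true contains no zeros (metallic, zero-gap entries would
    need special handling, e.g., a small epsilon or exclusion).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# Example: experimental vs. predicted bandgaps in eV.
print(mrae([1.1, 2.3, 3.0], [1.0, 2.6, 2.7]))  # ~0.107
```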

Deep Analysis & Enterprise Applications

Each of the following modules explores a specific set of findings from the research, reframed for enterprise application.

Bridging the Data Fidelity Gap

Computational data, primarily from DFT, often serves as the training bedrock for ML models predicting material properties. However, a significant fidelity gap exists when these models are applied to real-world experimental measurements. DFT systematically underpredicts bandgaps compared to experimental values, leading to models that struggle with generalization to higher-fidelity experimental data. This gap is further widened by differences in dataset distributions: computational datasets typically span a broader chemical and structural space, while experimental data is often concentrated in areas of high research interest, such as chalcogenides. This section examines how this discrepancy impacts model performance and the role of experimental data in improving robustness.

Experimental bandgap measurements cluster in the 2.0–2.5 eV range, illustrating how experimental data concentrate in regions of high research interest.

Addressing Data Fidelity

1. Curate experimental data.
2. Align crystal structures.
3. Pretrain on computational (DFT) data.
4. Fine-tune with experimental data (a minimal sketch of steps 3–4 follows below).
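A minimal sketch of the pretrain-then-fine-tune workflow, assuming a generic PyTorch regressor; the model, toy datasets, and learning rates are illustrative assumptions, not the paper's implementation:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, lr, epochs):
    """One training loop reused for both pretraining and fine-tuning."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # MAE is a common choice for bandgap regression
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y)
            loss.backward()
            opt.step()
    return model

# Toy stand-ins: a large low-fidelity DFT set, a small experimental set.
dft = TensorDataset(torch.randn(1000, 32), torch.rand(1000) * 5)
expt = TensorDataset(torch.randn(50, 32), torch.rand(50) * 5)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

# Step 3: pretrain on abundant computational bandgaps.
model = train(model, DataLoader(dft, batch_size=64, shuffle=True), lr=1e-3, epochs=5)

# Step 4: fine-tune on scarce experimental bandgaps with a smaller
# learning rate, adapting pretrained weights rather than overwriting them.
model = train(model, DataLoader(expt, batch_size=16, shuffle=True), lr=1e-4, epochs=10)
```

The reduced fine-tuning learning rate is a common heuristic for adapting from low-fidelity DFT labels to scarce experimental labels while retaining what pretraining learned.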

OOD Generalization Across Material Domains

Machine learning models often perform well on random train-test splits but struggle when applied to materials outside their training distribution (Out-of-Distribution, OOD). This study introduces domain-based splits defined by chemical composition, material categories, or crystal structure to simulate real-world discovery scenarios more accurately. The performance under these OOD conditions reveals the robustness of models to different types of distribution shifts, highlighting where current models fall short in generalizing to unseen material spaces. Pretraining on large computational datasets can improve robustness but may also propagate systematic biases if not carefully managed.
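A minimal sketch of a domain-based split in the spirit of the paper's leave-one-material-category-out (LOMO) evaluation, using scikit-learn's `LeaveOneGroupOut`; the category labels and data here are illustrative:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical dataset: features, experimental bandgaps, and a
# material-category label per sample (e.g., chalcogenide, silicide).
X = np.random.rand(8, 4)
y = np.random.rand(8) * 5
groups = np.array(["chalcogenide", "chalcogenide", "silicide", "silicide",
                   "antimonide", "antimonide", "oxide", "oxide"])

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups):
    held_out = groups[test_idx][0]
    # Train on all other categories; evaluate OOD on the held-out one.
    print(f"held out: {held_out:12s} train={len(train_idx)} test={len(test_idx)}")
```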

OOD Generalization Performance Comparison

| Split Type | Difficulty | ML Performance | Pretraining Effect |
|---|---|---|---|
| Random Split | Low | Highest; interpolation within the training distribution. | Significant benefit with limited experimental data. |
| Chemical System | Medium-Low | Moderate degradation, but better than other OOD splits. | Moderate benefit; can propagate DFT biases. |
| Periodic Group | High | Significant degradation; challenging due to broader chemical shifts. | Limited or negative benefit; high bias risk. |
| Material Category (LOMO) | Highest | Most challenging; silicides and antimonides are particularly difficult. | Variable; can sometimes increase error due to bias. |

The highest average MRAE, 0.71, occurs on the LOMO split for pretrained CGCNN.

Model Interpretability: GNN Saliency & SHAP Values

Beyond predictive accuracy, understanding *why* a model makes certain predictions is crucial for building trust and guiding materials discovery. This section explores model interpretability at two levels: structural interpretations for Graph Neural Networks (GNNs) using gradient saliency maps, and elemental property attributions for classical machine learning models using SHAP (SHapley Additive exPlanations). By revealing which atoms, bonds, or elemental properties drive predictions, we can assess whether models align with established physical intuition and identify potential spurious correlations.
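A minimal sketch of node-level gradient saliency, assuming a PyTorch model; the tiny message-passing network and random graph below are illustrative stand-ins, not the GNN architectures evaluated in the paper:

```python
import torch
from torch import nn

# Illustrative stand-in GNN: one message-passing step via an adjacency
# matrix, then mean pooling to a scalar bandgap prediction.
class TinyGNN(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, 1)

    def forward(self, x, adj):
        h = torch.relu(adj @ self.msg(x))   # aggregate neighbor messages
        return self.out(h.mean(dim=0))      # graph-level readout

model = TinyGNN()
x = torch.randn(5, 8, requires_grad=True)   # 5 atoms, 8 features each
adj = (torch.rand(5, 5) > 0.5).float()      # random bonding pattern

pred = model(x, adj).squeeze()
pred.backward()                             # d(bandgap)/d(node features)

# Per-atom saliency: gradient magnitude aggregated over feature dims.
saliency = x.grad.abs().sum(dim=1)
print(saliency)  # higher values = atoms driving the prediction
```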

Case Study: Interpretability for SnGeS₃

For SnGeS₃ (mp-5045), GNN saliency maps consistently highlight chalcogen-metal bonds (S-Sn and S-Ge) and their associated atomic sites. This aligns with known electronic structure principles where band-edge states are often dominated by chalcogen p orbitals hybridized with metal s/p states. This indicates the model is capturing physically meaningful interactions.

In contrast, atoms that primarily serve structural or charge-balancing roles exhibit lower saliency scores, further evidence that the model's predictive mechanism is physically grounded.


Interpreting SVR Elemental Features via SHAP

SHAP analysis of Support Vector Regression (SVR) models, which use aggregated elemental properties as features, reveals consistent patterns. For instance, a higher proportion of d-block elements is associated with reduced bandgaps, and d-block elements show the strongest negative SHAP impact, reflecting the direct involvement of d-orbitals in band-edge states. Similarly, a larger proportion of higher-period (heavier) elements correlates with smaller bandgaps due to more diffuse orbitals. While some patterns may reflect dataset composition biases (e.g., specific first-ionization-energy (FIE) or electronegativity ranges linking to chalcogenides), the overall consistency with chemical intuition supports the reliability of these models.
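A minimal sketch of SHAP attribution for an SVR trained on aggregated elemental features, assuming the `shap` package's model-agnostic KernelExplainer rather than whatever explainer configuration the paper used; the feature names and data are synthetic:

```python
import numpy as np
import shap
from sklearn.svm import SVR

# Hypothetical aggregated elemental descriptors per compound.
feature_names = ["frac_d_block", "mean_period", "mean_electronegativity"]
rng = np.random.default_rng(0)
X = rng.random((200, 3))
# Toy target: bandgap shrinks with d-block fraction and heavier elements.
y = 3.0 - 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.1, 200)

model = SVR(kernel="rbf").fit(X, y)

# KernelExplainer is model-agnostic; a small background set keeps it fast.
explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:10])

# Mean signed SHAP value per feature: negative = pushes bandgap down.
for name, val in zip(feature_names, shap_values.mean(axis=0)):
    print(f"{name:24s} {val:+.3f}")
```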

Advanced ROI Calculator

Estimate the potential return on investment for integrating advanced AI into your materials discovery pipeline.


Your AI Implementation Roadmap

A structured approach to integrating AI for reliable bandgap prediction and materials discovery.

Phase 1: Data Curation & Benchmark Alignment

Initial assessment of existing computational and experimental datasets. Curate high-quality experimental bandgap data with aligned crystal structures and integrate into the RealMat-BaG framework. Establish baseline model performance.

Phase 2: Model Evaluation & Selection

Evaluate GNNs and classical ML models across various splits (random, feature-based OOD, chemistry-based OOD, structure-based OOD). Analyze transferability from DFT pretraining to experimental data. Identify top-performing models for specific material domains.

Phase 3: Interpretability & Validation

Conduct node/edge-level gradient saliency for GNNs and SHAP for classical models to ensure alignment with physical intuition. Validate model decisions against known materials science principles, building trust and transparency.

Phase 4: Deployment & Continuous Improvement

Deploy the validated AI models into discovery workflows. Establish continuous learning loops to incorporate new experimental data, monitor model generalization, and refine strategies for long-term reliable materials discovery.

Ready to Revolutionize Materials Discovery?

Let's discuss how RealMat-BaG insights and tailored AI solutions can accelerate your R&D and bring new materials to market faster.

Ready to Get Started? Book Your Free Consultation.