Scientific Reports Article in Press
Advancing target discovery through disease-specific integration of multi-modal target identification models and comprehensive benchmarking system
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.
By Howell Leung, Chengchen Duan, Wenhao Gou, Jianjiu Chen, Ying Xin, Zetian Zheng, Vladimir Naumov, David Gennert, Man Zhang, Alex Aliper, Feng Ren, Evgeny Izumchenko, Frank W. Pun & Alex Zhavoronkov
DOI: 10.1038/s41598-026-47765-3
Executive Impact: AI in Drug Discovery
Leveraging advanced AI models like TargetPro and comprehensive benchmarking systems is critical for revolutionizing drug discovery, significantly improving efficiency and success rates.
Deep Analysis & Enterprise Applications
Challenges in Drug Discovery
Drug development faces significant hurdles: up to 90% of candidates fail in clinical trials. This attrition is costly, and developing a single new drug can cost several billion US dollars. A primary cause of failure is the early selection of biological targets that later prove ineffective or toxic, which underscores the need for more robust target identification and de-risking.
Integrated AI Framework for Target Discovery
This study introduces a novel framework comprising two main components: Target Identification Pro (TargetPro) and Target Identification Benchmark (TargetBench 1.0). TargetPro is a disease-specific machine learning model trained across 38 diseases, leveraging multi-modal omics and text data to predict targets with high clinical advancement probability. TargetBench 1.0 provides a systematic evaluation system for target discovery models, including LLMs, assessing their ability to recover established targets and identify high-quality novel candidates.
Revolutionizing Drug Development Efficiency
The integrated TargetPro and TargetBench framework offers a streamlined approach to evaluate target discovery models, significantly enhancing drug development efficiency. By prioritizing targets with high translational potential and rigorously benchmarking predictive models, it aims to reduce failure rates, accelerate timelines, and build confidence in AI-driven therapeutic strategies. This leads to better resource allocation and faster delivery of novel therapies to patients.
AI-Driven Multi-Omics Integration
The framework uses machine learning models, including XGBoost, CatBoost, Elastic Net, LightGBM, and Random Forest, to integrate multi-omics data (genomics, proteomics) and text data (scientific literature, grants). It also evaluates large language models (LLMs) such as BioGPT, ChatGPT, Grok, Claude, and DeepSeek. Cost-sensitive learning is used for robust training, and the SHAP framework is used for interpretability.
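Cost-sensitive learning can be illustrated with a small sketch. The summary does not describe how the authors weighted their classes, so the weighting rule below (negatives divided by positives) and both function names are assumptions chosen for illustration. XGBoost exposes a `scale_pos_weight` parameter for this kind of weighting; the sketch shows the idea without any library.

```python
import math


def positive_class_weight(labels):
    """Weight for the rare positive class: n_negative / n_positive (assumed rule)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    return n_neg / n_pos


def weighted_log_loss(labels, probs, pos_weight):
    """Binary log loss where each positive example counts pos_weight times."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Toy data: 1 known target among 100 genes (hypothetical, not from the study).
toy_labels = [1] + [0] * 99
toy_weight = positive_class_weight(toy_labels)
```

With one positive in 100 examples the positive weight is 99, so a missed target contributes as much to the loss as all the non-targets combined.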
TargetPro Model Workflow
TargetPro achieved an AUPRC (area under the precision-recall curve) of approximately 0.295, well above the individual omics and text scores (0.015-0.22) and the random-guess baseline (0.01). This indicates stronger predictive power for clinical-stage target identification than any single score provides.
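To make the AUPRC comparison concrete, here is a minimal sketch that estimates AUPRC as average precision and computes the random-guess baseline, which equals the fraction of positives in the data. The function names and the toy labels and scores are hypothetical; the study's own evaluation code is not shown in this summary. Tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision measured at each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def random_baseline(labels):
    """Expected AUPRC of a random ranking: the share of positive labels."""
    return sum(labels) / len(labels)


# Toy example: 2 positives among 10 candidates, ranked 1st and 4th.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(toy_labels, toy_scores)   # (1/1 + 2/4) / 2
baseline = random_baseline(toy_labels)           # 2 / 10
```

Read this way, a baseline of 0.01 suggests that roughly 1% of the candidates in the evaluation set are positives, which is why an AUPRC near 0.3 is a large gain despite looking small in absolute terms.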
Disease-Specific Feature Importance
Analysis of feature importance across five disease groups showed that matrix factorization and attention score were impactful in every group, but their relative importance varied by disease context. Matrix factorization led in the oncology (22.84%) and fibrotic (17.34%) models, whereas attention score was highest in the immune (15.73%), metabolic (17.08%), and neurological (16.07%) models. In aggregate, text-derived features were highly influential in the immune, metabolic, and neurological models, while the oncology and fibrotic models relied more evenly on text and omics data.
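The percentages above are shares of total feature importance, and the text-versus-omics comparison is a sum of those shares by data type. The sketch below shows that arithmetic under stated assumptions: the importance values are toy numbers, `feature_c` and `feature_d` are placeholder names, and the mapping of each feature to a modality (including placing matrix factorization under omics) is an assumption, not something this summary states.

```python
def importance_shares(raw_importance):
    """Convert raw importance values (e.g. mean |SHAP|) to percentage shares."""
    total = sum(raw_importance.values())
    return {name: 100.0 * value / total for name, value in raw_importance.items()}


def shares_by_modality(shares, modality_of):
    """Sum percentage shares within each data modality."""
    totals = {}
    for name, share in shares.items():
        modality = modality_of[name]
        totals[modality] = totals.get(modality, 0.0) + share
    return totals


# Toy values only; not results from the study.
toy_importance = {
    "matrix_factorization": 0.30,
    "attention_score": 0.20,
    "feature_c": 0.25,
    "feature_d": 0.25,
}
# Assumed modality mapping, for illustration.
toy_modality = {
    "matrix_factorization": "omics",
    "attention_score": "text",
    "feature_c": "omics",
    "feature_d": "text",
}

toy_shares = importance_shares(toy_importance)
toy_totals = shares_by_modality(toy_shares, toy_modality)
```

Running the same aggregation separately for each disease group is what allows one model to be described as text-dominated and another as balanced.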
TargetPro achieved an overall precision at top K of 71.6%, a 1.7- to 5.5-fold improvement over leading LLMs (13.1-42.3%), and it also outperformed Open Targets (under 20%). This performance was consistent across therapeutic areas, including oncology, metabolic, immune, fibrotic, and neurological diseases.
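The fold-improvement range follows directly from the reported precisions, as the short sketch below shows. The `precision_at_k` helper is a generic definition with placeholder gene names; how the study defined a hit and chose K is not specified in this summary, so treat that part as an assumption.

```python
def precision_at_k(ranked_genes, known_targets, k):
    """Fraction of the top-k ranked genes that appear in the known-target set."""
    top = ranked_genes[:k]
    if not top:
        return 0.0
    return sum(1 for gene in top if gene in known_targets) / len(top)


def fold_improvement(model_precision, baseline_precision):
    """Ratio of two precision values expressed in the same units."""
    return model_precision / baseline_precision


# Reported values from the text, in percent.
targetpro = 71.6
best_llm = 42.3
worst_llm = 13.1

low_fold = round(fold_improvement(targetpro, best_llm), 1)    # vs. strongest LLM
high_fold = round(fold_improvement(targetpro, worst_llm), 1)  # vs. weakest LLM

# Placeholder ranking: 2 of the top 4 genes are known targets.
toy_precision = precision_at_k(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"], {"GENE_A", "GENE_C"}, 4
)
```

The lower bound of the range comes from the strongest LLM and the upper bound from the weakest, so the 1.7-fold figure is the more conservative comparison.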
Novel Target Quality: TargetPro vs. LLMs
| Quality metric | TargetPro | LLMs |
|---|---|---|
| Targets with an available 3D structure | 95.7% | 60.3-91.3% |
| Druggable targets (clinical evidence) | 86.5% | 38.8-75.0% |
| Repurposing potential (approved drugs in other indications) | 46% | 17.0-27.5% |
| Biological relevance (overlapping pathways, average) | 108 | Lower than TargetPro |
| Available bioassays (average) | Over 500 | More than 1.4-fold lower than TargetPro |
| Available gene modulators (average) | 13.8 | 6.1-9.7 |
Ethical AI Integration in Scientific Publishing
Challenge: Ensuring precision and clarity in scientific manuscripts while leveraging advanced tools without compromising scientific integrity.
Solution: AI tools were employed solely for proofreading: correcting grammatical errors, improving sentence structure, and enhancing overall clarity and readability. Human oversight and scientific accuracy remained paramount, and the authors retained full responsibility for the content.
Outcome: The ethical and limited integration of AI led to improved manuscript quality and readability, demonstrating a responsible framework for incorporating AI into scientific publishing processes.
Your AI Implementation Roadmap
A phased approach to integrating advanced AI into your drug discovery pipeline for maximum impact and minimal disruption.
Phase 01: Strategic Assessment & Data Integration
Conduct a thorough assessment of your existing drug discovery workflows and data infrastructure. Identify key areas where AI can provide the most value. Begin integration of proprietary multi-omics and text data sources with TargetPro.
Phase 02: Model Customization & Benchmarking
Fine-tune TargetPro models for your specific therapeutic areas and disease indications. Utilize TargetBench 1.0 to rigorously evaluate model performance against established benchmarks and identify novel, high-quality candidates.
Phase 03: Experimental Validation & Iteration
Prioritize and validate top AI-predicted targets through in vitro and in vivo experiments. Implement a continuous feedback loop where experimental results inform model retraining, leading to an adaptive and increasingly accurate target discovery system.
Phase 04: Full-Scale Deployment & Strategic Impact
Integrate the validated AI framework into your core R&D pipeline, optimizing resource allocation and accelerating drug candidate progression. Leverage AI insights for strategic decision-making and to drive significant improvements in overall drug development efficiency.