Scientific Reports Article in Press

Advancing target discovery through disease-specific integration of multi-modal target identification models and comprehensive benchmarking system

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

By Howell Leung, Chengchen Duan, Wenhao Gou, Jianjiu Chen, Ying Xin, Zetian Zheng, Vladimir Naumov, David Gennert, Man Zhang, Alex Aliper, Feng Ren, Evgeny Izumchenko, Frank W. Pun & Alex Zhavoronkov

DOI: 10.1038/s41598-026-47765-3

Executive Impact: AI in Drug Discovery

Leveraging advanced AI models like TargetPro and comprehensive benchmarking systems is critical for revolutionizing drug discovery, significantly improving efficiency and success rates.

Up to 90% of Drug Candidates Fail Clinical Trials

AstraZeneca Success Rate Improvement (4% to 19%)

71.6% TargetPro Precision (Top K)

86.5% of TargetPro-Identified Targets Are Druggable

Deep Analysis & Enterprise Applications

Challenges in Drug Discovery

Drug development faces significant hurdles: up to 90% of candidates fail clinical trials. This attrition is costly, as developing a single new drug can cost several billion US dollars. A primary cause of failure is the early selection of biological targets that later prove ineffective or toxic, which underscores the need for more robust target identification and de-risking.

Integrated AI Framework for Target Discovery

This study introduces a novel framework comprising two main components: Target Identification Pro (TargetPro) and Target Identification Benchmark (TargetBench 1.0). TargetPro is a disease-specific machine learning model trained across 38 diseases, leveraging multi-modal omics and text data to predict targets with high clinical advancement probability. TargetBench 1.0 provides a systematic evaluation system for target discovery models, including LLMs, assessing their ability to recover established targets and identify high-quality novel candidates.

Revolutionizing Drug Development Efficiency

The integrated TargetPro and TargetBench framework offers a streamlined approach to evaluate target discovery models, significantly enhancing drug development efficiency. By prioritizing targets with high translational potential and rigorously benchmarking predictive models, it aims to reduce failure rates, accelerate timelines, and build confidence in AI-driven therapeutic strategies. This leads to better resource allocation and faster delivery of novel therapies to patients.

AI-Driven Multi-Omics Integration

The framework uses machine learning models, including XGBoost, CatBoost, Elastic Net, LightGBM, and Random Forest, to integrate multi-omics data (genomics, proteomics) and text data (scientific literature, grants). It also evaluates Large Language Models (LLMs) such as BioGPT, ChatGPT, Grok, Claude, and DeepSeek. Cost-sensitive learning is used for robust training, and the SHAP framework is used for interpretability.
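
The source does not say which cost-sensitive scheme was used. As a minimal sketch of the general idea, the example below applies the common "balanced" heuristic (weight = n_samples / (n_classes * class_count), the same rule scikit-learn documents for `class_weight="balanced"`) to a weighted log loss. The function names and the 1%-positive toy data are assumptions for illustration only.

```python
import math

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return {c: n / (len(counts) * cnt) for c, cnt in counts.items()}

def weighted_log_loss(labels, probs, weights):
    """Binary log loss in which each sample counts according to its class weight."""
    loss = 0.0
    total_weight = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
        w = weights[y]
        loss += -w * (math.log(p) if y == 1 else math.log(1 - p))
        total_weight += w
    return loss / total_weight

# Hypothetical toy data: 1 clinical-stage target among 100 genes.
labels = [1] + [0] * 99
weights = balanced_class_weights(labels)

# A model that always predicts the base rate looks fine under the plain loss
# but is penalised heavily once the rare positive class is up-weighted.
constant_probs = [0.01] * len(labels)
plain = weighted_log_loss(labels, constant_probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, constant_probs, weights)
print(weights[1], round(plain, 3), round(weighted, 3))
```

With classes this imbalanced, the weighting makes a missed positive roughly as costly as all the negatives combined, which is the behaviour cost-sensitive training is meant to produce.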

TargetPro Model Workflow

Create 5 stratified splits → Select the best hyperparameters to train a model → Predict testing data → Combine all prediction results and evaluate the generalisability of models
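
The workflow above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy `fit`/`predict` model, the `shrink` hyperparameter, the inner 3-fold selection rule, and the synthetic data are all hypothetical stand-ins for the real models and features.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds

def fit(X, y, shrink):
    """Toy stand-in model: soft-thresholded difference of class means per feature."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    weights = []
    for j in range(len(X[0])):
        diff = sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
        mag = max(abs(diff) - shrink, 0.0)
        weights.append(mag if diff > 0 else -mag)
    return weights

def predict(weights, X):
    return [sum(w * v for w, v in zip(weights, x)) for x in X]

def pairwise_auc(y, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_shrink(X, y, grid, seed):
    """Assumed selection rule: best mean AUC over 3 inner stratified folds."""
    inner = stratified_folds(y, n_splits=3, seed=seed)
    def mean_auc(shrink):
        total = 0.0
        for held in inner:
            held_set = set(held)
            train = [i for i in range(len(y)) if i not in held_set]
            w = fit([X[i] for i in train], [y[i] for i in train], shrink)
            total += pairwise_auc([y[i] for i in held], predict(w, [X[i] for i in held]))
        return total / len(inner)
    return max(grid, key=mean_auc)

def out_of_fold_predictions(X, y, grid=(0.0, 0.1, 0.5), n_splits=5, seed=0):
    """Tune and train on 4 folds, predict the held-out fold, and pool the results."""
    oof = [None] * len(y)
    for test_idx in stratified_folds(y, n_splits, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        best = select_shrink(X_tr, y_tr, grid, seed)
        w = fit(X_tr, y_tr, best)
        for i, score in zip(test_idx, predict(w, [X[i] for i in test_idx])):
            oof[i] = score
    return oof

# Synthetic data: 15 positives, 85 negatives, one informative feature.
data_rng = random.Random(1)
y = [1] * 15 + [0] * 85
X = [[data_rng.gauss(1.0 if t else 0.0, 1.0), data_rng.gauss(0.0, 1.0)] for t in y]
oof = out_of_fold_predictions(X, y)
print(round(pairwise_auc(y, oof), 3))
```

Because every sample is scored exactly once by a model that never saw it during tuning or training, the pooled predictions give a single estimate of how well the approach generalises.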

0.295 TargetPro AUPRC vs. Baselines

TargetPro achieved an AUPRC (Area Under the Precision-Recall Curve) of approximately 0.295, significantly outperforming individual omics and text scores (0.015-0.22) and the baseline AUPRC from a random guess (0.01). This demonstrates superior predictive power for clinical-stage target identification.
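
To make the comparison concrete: for a random ranking, the expected AUPRC equals the fraction of positives in the data, so a baseline of 0.01 implies that roughly 1% of candidates are clinical-stage targets. The sketch below computes average precision, one common estimator of AUPRC (the source does not state which estimator was used), on made-up labels and scores.

```python
def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive.

    Tied scores keep their input order, so break ties deliberately if it matters.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical toy ranking: positives land at ranks 1 and 3 out of 10.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(y_true, scores)
random_baseline = sum(y_true) / len(y_true)  # prevalence of positives
print(round(ap, 4), random_baseline)  # 0.8333 0.2

# Lift of the reported AUPRC over the reported random baseline.
reported_lift = round(0.295 / 0.01, 1)
print(reported_lift)  # 29.5
```

Read this way, an AUPRC of about 0.295 against a 0.01 baseline is roughly a 30-fold lift over chance, even though the absolute value looks small.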

Disease-Specific Feature Importance

Analysis of feature importance across five disease groups revealed that while matrix factorization and attention score were universally impactful, their relative importance varied by disease context. For instance, matrix factorization led in oncology (22.84%) and fibrotic (17.34%), whereas attention score was highest in immune (15.73%), metabolic (17.08%), and neurological (16.07%) models. Aggregated, text-derived features were highly influential in immune, metabolic, and neurological models, while oncology and fibrotic models showed a more balanced reliance on text and omics data.
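
Percentages like those above are typically obtained by normalising per-feature attribution magnitudes so they sum to 100%. The sketch below assumes the shares come from mean absolute SHAP values, which the source does not confirm; the attribution matrix and the third feature name are invented for illustration.

```python
def importance_percentages(attributions, feature_names):
    """Convert per-sample attributions into each feature's share of the total.

    attributions: one row per sample, one signed attribution per feature
    (for example SHAP values). Shares are based on the mean absolute value.
    """
    n_samples = len(attributions)
    mean_abs = [
        sum(abs(row[j]) for row in attributions) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    return {name: 100.0 * value / total for name, value in zip(feature_names, mean_abs)}

# Hypothetical attributions for three samples and three features.
feature_names = ["matrix_factorization", "attention_score", "other_feature"]
attributions = [
    [0.4, -0.1, 0.0],
    [-0.2, 0.3, 0.1],
    [0.6, -0.2, 0.1],
]

shares = importance_percentages(attributions, feature_names)
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.1f}%")
```

Running the same calculation separately for each disease-group model is what allows the rankings to be compared across oncology, fibrotic, immune, metabolic, and neurological contexts.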

71.6% Overall Precision at Top K

TargetPro achieved an overall precision at top K of 71.6%, a 1.7- to 5.5-fold improvement over leading LLMs (13.1-42.3%) and well above Open Targets (under 20%). This performance holds across diverse therapeutic areas, including oncology, metabolic, immune, fibrotic, and neurological diseases.
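
Precision at top K is the fraction of a model's K highest-ranked genes that are established targets. The sketch below shows the calculation on placeholder gene names (hypothetical, not from the study) and then reproduces the fold-improvement range from the percentages quoted above.

```python
def precision_at_k(ranked_genes, known_targets, k):
    """Fraction of the top-k ranked genes that appear in the known-target set.

    If fewer than k genes are ranked, the denominator is the number available.
    """
    top = ranked_genes[:k]
    if not top:
        return 0.0
    return sum(gene in known_targets for gene in top) / len(top)

# Placeholder ranking and reference set for illustration only.
ranked_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
known_targets = {"GENE_A", "GENE_B", "GENE_E"}

p_at_4 = precision_at_k(ranked_genes, known_targets, k=4)
print(p_at_4)  # 0.5

# Fold improvement of TargetPro (71.6%) over the LLM range (13.1% to 42.3%).
fold_vs_best_llm = round(71.6 / 42.3, 1)
fold_vs_worst_llm = round(71.6 / 13.1, 1)
print(fold_vs_best_llm, fold_vs_worst_llm)  # 1.7 5.5
```

The smaller factor compares against the strongest LLM and the larger against the weakest, so the quoted range brackets the whole LLM field.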

Novel Target Quality: TargetPro vs. LLMs

Feature	TargetPro	LLMs
Available 3D structure	95.7% of targets	60.3-91.3% of targets
Druggable targets (clinical evidence)	86.5%	38.8-75.0%
Repurposing potential (approved drugs in other indications)	46%	17.0-27.5%
Biological relevance (overlapping pathways)	108 pathways on average	Lower than TargetPro
Available bioassays	Over 500 on average	More than 1.4-fold lower than TargetPro
Available gene modulators	13.8 modulators on average	6.1-9.7 modulators

Ethical AI Integration in Scientific Publishing

Challenge: Ensuring precision and clarity in scientific manuscripts while leveraging advanced tools without compromising scientific integrity.

Solution: AI tools were employed solely for proofreading, improving grammatical errors, sentence structure, and enhancing overall clarity and readability. This approach ensured that human oversight and scientific accuracy remained paramount, with authors retaining full responsibility for content.

Outcome: The ethical and limited integration of AI led to improved manuscript quality and readability, demonstrating a responsible framework for incorporating AI into scientific publishing processes.

Your AI Implementation Roadmap

A phased approach to integrating advanced AI into your drug discovery pipeline for maximum impact and minimal disruption.

Phase 01: Strategic Assessment & Data Integration

Conduct a thorough assessment of your existing drug discovery workflows and data infrastructure. Identify key areas where AI can provide the most value. Begin integration of proprietary multi-omics and text data sources with TargetPro.

Phase 02: Model Customization & Benchmarking

Fine-tune TargetPro models for your specific therapeutic areas and disease indications. Utilize TargetBench 1.0 to rigorously evaluate model performance against established benchmarks and identify novel, high-quality candidates.

Phase 03: Experimental Validation & Iteration

Prioritize and validate top AI-predicted targets through in vitro and in vivo experiments. Implement a continuous feedback loop where experimental results inform model retraining, leading to an adaptive and increasingly accurate target discovery system.

Phase 04: Full-Scale Deployment & Strategic Impact

Integrate the validated AI framework into your core R&D pipeline, optimizing resource allocation and accelerating drug candidate progression. Leverage AI insights for strategic decision-making and to drive significant improvements in overall drug development efficiency.
