SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling

Revolutionizing Computational Reaction Modeling with SynRXN

Computer-aided synthesis planning (CASP) lacks robust, comparable benchmarks across its full pipeline. Existing reaction-informatics datasets are fragmented by inconsistent preprocessing and opaque splitting strategies, which makes cross-paper comparisons difficult. SynRXN addresses this with a unified, FAIR (Findable, Accessible, Interoperable, and Reusable) benchmarking resource. It decomposes CASP into five task families: reaction rebalancing, atom-to-atom mapping, reaction classification, reaction property prediction, and synthesis prediction. For each family, SynRXN offers curated, provenance-tracked datasets, predefined leakage-aware partitions, and standardized evaluation metrics, enabling fair longitudinal comparison and rigorous stress tests for real-world synthesis planning.

Deep Analysis & Enterprise Applications

Reaction Rebalancing

Chemical reaction records mined from patent literature frequently lack stoichiometric fidelity, often omitting necessary inorganic reagents, solvents, or byproducts. Restoring mass balance in these records is critical for downstream modeling, as missing components can yield under-specified transformations and reduce the chemical executability of extracted retrosynthesis templates. SynRXN provides a robust benchmark to quantify correction accuracy and assess residual inconsistencies.

Key Datasets for Rebalancing

Dataset	Size	Reference
MNC	33147	35
MOS	12781	35
MBS	491	35
Complex	1748	16,22

SynRBL, a hybrid rule- and graph-based algorithm, achieved ≥90% confidence in resolving stoichiometric corrections for the curated test sets.
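To make "restoring mass balance" concrete, the sketch below compares element counts on the two sides of a reaction and reports what each side lacks. It is only the bookkeeping that any rebalancing method has to satisfy, not SynRBL's algorithm. It works on flat molecular formulas rather than SMILES, and the helper names (`parse_formula`, `side_total`, `missing_atoms`) are hypothetical.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H4O2' (no brackets or charges)."""
    counts = Counter()
    for element, number in _TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(formulas: list[str]) -> Counter:
    """Sum atom counts over all species on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += parse_formula(formula)
    return total


def missing_atoms(reactants: list[str], products: list[str]) -> dict[str, dict[str, int]]:
    """Return the atoms each side lacks relative to the other side."""
    left, right = side_total(reactants), side_total(products)
    return {
        # Counter subtraction keeps only positive differences.
        "missing_from_products": dict(left - right),
        "missing_from_reactants": dict(right - left),
    }


# Esterification recorded without its water byproduct:
# acetic acid + ethanol -> ethyl acetate (water omitted).
report = missing_atoms(["C2H4O2", "C2H6O"], ["C4H8O2"])
print(report)
```

In this toy record the product side is short by two hydrogens and one oxygen, which is the omitted water. A real rebalancer must additionally decide which molecule accounts for the deficit, which is the hard part.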

Atom-to-atom Mapping

Atom-to-atom mapping (AAM) establishes the structural lineage of each atom from reactants to products, revealing the microscopic changes that define a transformation. Accurate AAM is essential for identifying reaction centers, extracting mechanistic templates, and supervising models that reason about bond changes. SynRXN stratifies the reaction corpus into two domains, synthetic chemical reactions and biochemical transformations, so that mappers can be benchmarked on both.

Key Datasets for AAM

Dataset	Type	Size	Reference
Golden	Chem	1785	19,22
NatComm	Chem	491	-
USPTO_3K	Chem	3000	-
Recon3D	Bio	382	45
EColi	Bio	273	46

Mapping Accuracy Benchmark (Table 5)

Mapper	EColi (%)	Recon3D (%)	USPTO_3K (%)	Golden (%)	NatComm (%)
RXNMapper 0.4.1	72.53	48.69	93.53	87.43	87.58
Graphormer*	42.12	34.82	95.10	89.59	92.87
LocalMapper 0.1.5	69.96	50.79	97.77	89.08	92.67
RDTool 2.4.1	78.02	54.97	90.87	82.54	84.11
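A per-dataset accuracy of this kind could be computed along the following lines. This is a minimal sketch under stated assumptions: each mapping is represented as a dictionary from reactant atom index to product atom index, and a reaction counts as correct only when the predicted map equals the reference map exactly. The representation and the function name `mapping_accuracy` are hypothetical, and the sketch ignores symmetry-equivalent mappings, which a full evaluation may accept as correct.

```python
def mapping_accuracy(predicted: list[dict[int, int]], reference: list[dict[int, int]]) -> float:
    """Percentage of reactions whose predicted atom map equals the reference map."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("no reactions to evaluate")
    correct = sum(1 for pred, ref in zip(predicted, reference) if pred == ref)
    return 100.0 * correct / len(reference)


# Three toy reactions; keys are reactant atom indices, values are product atom indices.
reference = [{0: 0, 1: 1, 2: 2}, {0: 1, 1: 0}, {0: 0, 1: 2, 2: 1}]
predicted = [{0: 0, 1: 1, 2: 2}, {0: 0, 1: 1}, {0: 0, 1: 2, 2: 1}]

accuracy = mapping_accuracy(predicted, reference)
print(f"{accuracy:.2f}")  # 2 of the 3 toy reactions match
```

Comparing index-to-index dictionaries rather than raw map numbers sidesteps the fact that atom-map labels are arbitrary: two mappers can number atoms differently and still describe the same correspondence.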

Reaction Classification

Reaction classification maps raw reaction inputs to predefined classes based on their structural or functional signatures. SynRXN provides a benchmark suite spanning multiple levels of granularity, from fine-grained SMARTS templates to high-level hierarchical ontologies. It includes datasets from USPTO patents and biochemical transformations (ECREACT).

Key Datasets for Classification

Dataset	Size	Classes	Complete	Reference
Schneider_U	50000	50	No	24
USPTO_TPL_B	445115	1000	Yes	26,50
SynTemp_R2	43441	680	Yes	23,35
ECREACT_3rd	185734	175	No	51

Classification Performance (Weighted F1, Table 6 Excerpt)

Dataset	DRFP F1	RXNFP F1
Schneider_U	0.968 ±0.002	0.962 ±0.002
USPTO_50K_B	0.966 ±0.002	0.952 ±0.002
SynTemp 2	0.913 ±0.003	0.737 ±0.004
ECREACT 1	0.977 ±0.001	0.905 ±0.001
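For readers unfamiliar with the metric, weighted F1 averages the per-class F1 scores using each class's share of the true labels as its weight. The sketch below is a plain-Python illustration with made-up class names; it is not the benchmark's evaluation code.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        # F1 is defined as 0 when there are no true positives.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)


# Toy labels (hypothetical class names).
y_true = ["amide", "amide", "amide", "suzuki", "suzuki", "reduction"]
y_pred = ["amide", "amide", "suzuki", "suzuki", "suzuki", "amide"]

score = weighted_f1(y_true, y_pred)
print(f"{score:.3f}")
```

Because the weights follow class frequency, a model can score well on weighted F1 while failing on rare classes, as the never-predicted "reduction" class shows here.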

Reaction Property Prediction

This task targets the quantification of continuous chemical attributes, such as yields, activation barriers, and transition-state features. SynRXN aggregates data from public repositories and the literature, encompassing ab initio kinetics datasets, specific mechanistic classes, and high-throughput experimental results.

Key Datasets for Property Prediction

Dataset	Size	Property	AAM	H	Complete	Reference
B97XD3	16365	dh, ea	Yes	Yes	No	58,59
Rad6Re	31923	dh	Yes	Yes	Yes	31,61
RGD1	353984	ea	Yes	Yes	Yes	52
LogRate	778	lograte	Yes	Yes	Yes	31,62

Property Prediction Performance (MAE, Table 7 Excerpt)

Dataset	Property	DRFP MAE	RXNFP MAE
B97XD3	ea	14.617 ±0.268	15.324 ±0.239
RDB7	ea	30.136 ±0.210	18.812 ±0.240
SNAr	ea	1.402 ±0.158	1.447 ±0.139
RGD1	ea	16.704 ±0.074	15.953 ±0.032
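Entries of the form "value ±spread" can be produced as sketched below. The excerpt does not say what the ± denotes, so this sketch assumes it is the sample standard deviation of the MAE over repeated runs or splits. The function names and the toy numbers are illustrative only.

```python
from statistics import mean, stdev


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def summarize(run_maes: list[float]) -> str:
    """Format per-run MAE values as 'mean ±std' (sample standard deviation, assumed)."""
    return f"{mean(run_maes):.3f} ±{stdev(run_maes):.3f}"


# Toy targets and the predictions from three hypothetical runs.
y_true = [10.0, 20.0, 30.0]
runs = [
    [11.0, 19.0, 33.0],
    [12.0, 20.0, 30.0],
    [10.0, 23.0, 29.0],
]

run_maes = [mean_absolute_error(y_true, y_pred) for y_pred in runs]
summary = summarize(run_maes)
print(summary)
```

MAE is reported in the units of the predicted property, so values are only comparable within a row's dataset and property, not across rows.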

Synthesis Prediction

The synthesis prediction task consolidates essential benchmarks for single-step reaction prediction, covering both forward prediction and retrosynthesis. SynRXN provides standardized, deterministic splits to resolve prevalent issues with benchmark comparability, and evaluates models with conventional top-k accuracy alongside structural similarity metrics.

Key Datasets for Synthesis Prediction

Dataset	Size	AAM	Task	Reference
USPTO_50K	50016	Yes	forward / backward	19,35
USPTO_MIT	479035	Yes	forward / backward	54
USPTO_500	143535	No	reagent prediction	43
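Top-k accuracy, the conventional metric mentioned above, counts a prediction as correct when the ground-truth answer appears among a model's first k ranked candidates. The sketch below assumes candidates and targets are already written in the same canonical string form, so exact string comparison is meaningful; the function name and the toy data are illustrative.

```python
def top_k_accuracy(candidates: list[list[str]], targets: list[str], k: int) -> float:
    """Fraction of examples whose target appears among the first k ranked candidates."""
    if len(candidates) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for ranked, target in zip(candidates, targets) if target in ranked[:k])
    return hits / len(targets)


# Toy example: ranked candidate lists for three queries.
targets = ["CCO", "CC(=O)O", "c1ccccc1"]
candidates = [
    ["CCO", "CCN"],             # correct at rank 1
    ["CC=O", "CC(=O)O"],        # correct at rank 2
    ["C1CCCCC1", "Cc1ccccc1"],  # never correct
]

top1 = top_k_accuracy(candidates, targets, k=1)
top2 = top_k_accuracy(candidates, targets, k=2)
print(top1, top2)
```

Exact matching is strict: a chemically identical molecule written with a different string would count as a miss, which is one reason to pair top-k accuracy with structural similarity metrics.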

SynRXN Technical Validation Workflow

Raw Input → Canonicalization & Sanity Checks → Deduplication & Filtering → Manifest Record Creation → Standardized Dataset
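The workflow stages can be sketched in a few lines of standard-library Python. This is an illustration under loud assumptions, not the SynRXN pipeline: sorting the dot-separated components stands in for real canonicalization (which needs a cheminformatics toolkit), the sanity check only verifies the `reactants>>products` shape, and the record and manifest fields are hypothetical.

```python
import hashlib
import json
from typing import Optional


def normalize(rxn: str) -> Optional[str]:
    """Return an order-independent 'reactants>>products' string, or None if malformed.

    Sorting components is a stand-in for true canonicalization (assumption).
    """
    parts = rxn.strip().split(">>")
    if len(parts) != 2:
        return None
    sides = []
    for side in parts:
        species = sorted(s for s in side.split(".") if s)
        if not species:
            return None
        sides.append(".".join(species))
    return ">>".join(sides)


def reaction_id(normalized: str) -> str:
    """Stable identifier derived from the normalized reaction string."""
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def build_dataset(raw: list[str]) -> tuple[list[dict], dict]:
    """Run sanity checks and deduplication, then emit records plus a manifest."""
    records, seen, rejected = [], set(), 0
    for rxn in raw:
        normalized = normalize(rxn)
        if normalized is None:
            rejected += 1  # failed the sanity check
            continue
        rid = reaction_id(normalized)
        if rid in seen:
            continue  # duplicate of an earlier record
        seen.add(rid)
        records.append({"id": rid, "reaction": normalized})
    # Checksum over a deterministic serialization of the records.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    manifest = {
        "n_input": len(raw),
        "n_rejected": rejected,
        "n_records": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return records, manifest


raw = [
    "CCO.CC(=O)O>>CC(=O)OCC",
    "CC(=O)O.CCO>>CC(=O)OCC",   # same reaction, components reordered
    "CCO>CC",                   # malformed: no '>>' separator
    " CCBr.[OH-]>>CCO.[Br-] ",
]
records, manifest = build_dataset(raw)
print(json.dumps(manifest, indent=2))
```

Because every step is a pure function of its input, rerunning the build on the same raw list yields the same checksum, which is the property a manifest is meant to let others verify.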

Implementation Roadmap

Our phased approach ensures a robust, reproducible, and transparent integration of SynRXN's capabilities into your research or development pipeline.

Input Data Retrieval & Harmonization

Raw reaction data is retrieved from diverse public repositories and converted into a unified reaction table schema, ensuring consistency across all sources.
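As a rough picture of what harmonization into a unified table involves, the sketch below maps source-specific column names onto one shared row type. The schema, the source names (`patent_dump`, `kinetics_csv`), and the column names are all hypothetical; the actual SynRXN schema is not described in this summary.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ReactionRow:
    """One row of a unified reaction table (hypothetical field names)."""
    source: str
    reaction: str
    target: Optional[str] = None


# Hypothetical per-source column names mapped onto the unified fields.
COLUMN_MAPS = {
    "patent_dump": {"reaction": "rxn_smiles", "target": "class_label"},
    "kinetics_csv": {"reaction": "reaction", "target": "ea"},
}


def harmonize(source: str, raw_row: dict) -> ReactionRow:
    """Convert a source-specific row into the unified schema."""
    columns = COLUMN_MAPS[source]
    target = raw_row.get(columns["target"])
    return ReactionRow(
        source=source,
        reaction=raw_row[columns["reaction"]].strip(),
        target=None if target is None else str(target),
    )


rows = [
    harmonize("patent_dump", {"rxn_smiles": " CCO>>CC=O ", "class_label": "oxidation"}),
    harmonize("kinetics_csv", {"reaction": "C=C.[H][H]>>CC", "ea": 12.5}),
]
for row in rows:
    print(asdict(row))
```

Keeping the `source` field on every row is one simple way to preserve provenance after the tables are merged.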

Molecular Standardization & Curation

A deterministic pipeline applies molecular standardization, record-level validity checks, canonicalization to stable reaction identifiers, and deduplication to ensure structural integrity and chemical executability.

Task-Specific Dataset Generation

Corpora are processed for specific tasks, including stoichiometric rebalancing, atom-to-atom mapping, reaction classification, property prediction, and synthesis prediction, addressing task-specific data requirements.

Benchmark Specification & Partitioning

Predefined, leakage-aware train/validation/test splits are generated deterministically, and standardized evaluation metrics are tailored for classification, regression, and structured prediction settings.
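One common way to obtain splits that are both deterministic and leakage-aware is to hash a group key, so that all records sharing the key land in the same partition regardless of input order. The sketch below illustrates that idea only. The grouping criterion SynRXN actually uses is not stated in this summary, and the seed string, the 80/10/10 fractions, and the `product` group field are assumptions.

```python
import hashlib
from collections import defaultdict


def assign_split(group_key: str, seed: str = "demo-seed",
                 train: float = 0.8, valid: float = 0.1) -> str:
    """Map a group key to 'train', 'valid', or 'test' via a seeded hash."""
    digest = hashlib.sha256(f"{seed}:{group_key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < train:
        return "train"
    if u < train + valid:
        return "valid"
    return "test"


def split_records(records: list[dict], group_field: str,
                  seed: str = "demo-seed") -> dict[str, list[dict]]:
    """Partition records so that every group stays within a single split."""
    parts = defaultdict(list)
    for record in records:
        parts[assign_split(record[group_field], seed)].append(record)
    return dict(parts)


# Toy records grouped by product, so reactions that share a product
# cannot end up on both sides of the train/test boundary.
records = [
    {"id": "r1", "product": "CC(=O)OCC"},
    {"id": "r2", "product": "CC(=O)OCC"},
    {"id": "r3", "product": "CCO"},
    {"id": "r4", "product": "c1ccccc1"},
]
parts = split_records(records, "product")
for name, members in sorted(parts.items()):
    print(name, [m["id"] for m in members])
```

Unlike a seeded random shuffle, the hash-based assignment does not change when records are added, removed, or reordered, which keeps published partitions stable across dataset versions.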

Release & Ongoing Support

The entire resource is released under permissive open licenses via Zenodo and GitHub, with scripted build recipes enabling bitwise-reproducible regeneration and supporting reuse and extension.
