USPTO-50K / USPTO-MIT (Retrosynthesis)

Reactions extracted from USPTO patents; standard retrosynthesis/forward-reaction benchmark.

Composite
78.0
Experimental validation
Retrospective
Stages
Lead ID / ADMET, Developmental Candidate
Modalities
small-molecule
Task types
retrosynthesis, reaction-prediction
Size
reactions: 1,800,000
canonical_50k: 50,037
License
Public
First release
2017
Last updated
2023
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Neural Sequence-to-Sequence Models for Retrosynthesis Prediction · Liu B, Ramsundar B, Kawthekar P, et al. · 2017 · paper · doi:10.1021/acscentsci.7b00303 · 520 citations
Flags
data-leakage-known
Experts
Bharath Ramsundar, Connor Coley
Groups
MIT CSAIL / Jameel Clinic / Coley Lab, Coley Lab (MIT)
Hosted by
Therapeutics Data Commons (TDC), Papers With Code — Drug Discovery
Rubric (7-criterion)

rigor: 4
coverage: 4
maintenance: 2
adoption: 5
quality: 3
accessibility: 5
industry_relevance: 4

Notes

Known data leakage across the canonical splits; use a time-based split or the Open Reaction Database (ORD) for a fairer evaluation.
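
A minimal sketch of the time-split idea in the note above, with a basic overlap check between the resulting splits. It rests on several assumptions that do not come from this card: each record is a hypothetical dict with an `rxn` reaction SMILES and a `year` field, the SMILES strings are already canonicalized, and leakage is approximated as identical product strings appearing in both train and test. The leakage flagged for the canonical splits may take a different form, so treat this as one possible check rather than the established procedure.

```python
from typing import List, Set, Tuple, TypedDict


class Reaction(TypedDict):
    """Hypothetical record layout, assumed for illustration only."""
    rxn: str   # reaction SMILES, "reactants>agents>products"
    year: int  # patent year


def product_of(rxn_smiles: str) -> str:
    """Return the product side of a reaction SMILES (text after the last '>')."""
    return rxn_smiles.split(">")[-1]


def time_split(reactions: List[Reaction], cutoff_year: int) -> Tuple[List[Reaction], List[Reaction]]:
    """Train on reactions before cutoff_year, test on the cutoff year and later."""
    train = [r for r in reactions if r["year"] < cutoff_year]
    test = [r for r in reactions if r["year"] >= cutoff_year]
    return train, test


def leaked_products(train: List[Reaction], test: List[Reaction]) -> Set[str]:
    """Products that appear in both splits (exact string match only)."""
    train_products = {product_of(r["rxn"]) for r in train}
    test_products = {product_of(r["rxn"]) for r in test}
    return train_products & test_products


def drop_leaked(train: List[Reaction], test: List[Reaction]) -> List[Reaction]:
    """Keep only test reactions whose product never occurs in train."""
    leaked = leaked_products(train, test)
    return [r for r in test if product_of(r["rxn"]) not in leaked]


# Toy data, invented for this example (not taken from the dataset).
reactions: List[Reaction] = [
    {"rxn": "CC(=O)O.CCO>>CC(=O)OCC", "year": 2005},
    {"rxn": "CC(=O)Cl.CCO>>CC(=O)OCC", "year": 2016},
    {"rxn": "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1", "year": 2017},
]

train, test = time_split(reactions, cutoff_year=2016)
print("leaked products:", leaked_products(train, test))
print("clean test size:", len(drop_leaked(train, test)))
```

Exact string matching only catches duplicates when both sides use the same canonical form, so in practice the SMILES would need to be canonicalized with a cheminformatics toolkit before running a check like this.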
