LSD (Large-Scale Docking Database)
Open-source dataset of 6.3 billion explicitly evaluated ligand-target docking pairs across 11 protein targets. Provides docking scores, SMILES, poses for top molecules, and in vitro validation results. Designed for ML model development and chemical space exploration.
Composite
82.5
Experimental validation
Wet-lab confirmed
Stages
Hit ID
Modalities
protein_structure, small-molecule
Task types
virtual_screening, docking, scoring
Size
ligand-target_pairs: 6,300,000,000
targets: 11
License
CC-BY-4.0
First release
2025-02
Last updated
2025-04
Official site
Leaderboard
→ leaderboard
Dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
A database for large-scale docking and experimental results · 2025 · paper · doi:10.1021/acs.jcim.5c00394 · 8 citations
Flags
ultra_large_scale, experimental_validation
Experts
—
Groups
—
Hosted by
—
Related benchmarks
Rubric (7-criterion)
rigor: 4
coverage: 5
maintenance: 3
adoption: 3
quality: 4
accessibility: 5
industry_relevance: 5
Notes
Public docking dataset of unprecedented scale: 6.3 billion ligand-target pairs across 11 targets. Includes in vitro experimental validation for a subset of molecules. From the Shoichet Lab at UCSF. Critical for training ML scoring functions and for active learning in virtual screening.