DO Challenge 2025 (DeepOrigin Autonomous Drug Discovery)
Benchmark for autonomous AI agents in drug discovery. Agents must identify the top 1,000 molecules among 1M molecular conformations under a limited budget of 100K score queries. It tests ML-based sampling, strategic resource management, and code execution in autonomous discovery pipelines.
Composite
70.9
Experimental validation
None
Stages
Hit ID
Modalities
ai_agent, small-molecule
Task types
virtual_screening, active_learning, agent_evaluation
Size
molecular_conformations: 1,000,000
query_budget: 100,000
License
Apache-2.0
First release
2025-03
Last updated
2025-05
Official site
Leaderboard
→ leaderboard
Dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Can AI Agents Design and Implement Drug Discovery Pipelines? · 2025 · paper · doi:10.5281/zenodo.15296510 · 5 citations
Flags
competition, agent_benchmark
Experts
—
Groups
—
Hosted by
—
Related benchmarks
—
Rubric (7-criterion)
rigor
4
coverage
2
maintenance
3
adoption
3
quality
4
accessibility
4
industry_relevance
5
Notes
First benchmark designed specifically for AI agents (not just models) in drug discovery. The multi-agent system 'Deep Thought' outperformed most human teams but underperformed expert solutions. The benchmark tests integrated pipeline design rather than isolated tasks.
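As a rough illustration of the task setup summarized at the top of this entry (find the best items in a large pool while paying for every score query), here is a minimal sketch on synthetic data. It is not the benchmark's reference pipeline or any participant's method. The names `select_top_k` and `demo`, the linear least-squares surrogate, the greedy batch strategy, the small pool size, and the overlap metric are all assumptions made for illustration; this entry does not specify how submissions are actually scored.

```python
import numpy as np


def select_top_k(features, query_score, budget, k, rounds=4, seed=0):
    """Greedy active-learning loop under a fixed score-query budget.

    `query_score` is a hypothetical oracle: it takes an index array and
    returns the true scores, and each returned score costs one query.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    per_round = budget // (rounds + 1)          # split budget evenly
    X = np.hstack([features, np.ones((n, 1))])  # append a bias column

    # Round 0: spend the first slice of the budget on a uniform random sample.
    scores = np.full(n, np.nan)
    first = rng.choice(n, size=per_round, replace=False)
    scores[first] = query_score(first)
    queries_used = per_round

    for _ in range(rounds):
        known = ~np.isnan(scores)
        # Fit a least-squares surrogate on everything labeled so far.
        w, *_ = np.linalg.lstsq(X[known], scores[known], rcond=None)
        pred = X @ w
        pred[known] = -np.inf                   # never re-query a labeled item
        batch = np.argsort(pred)[-per_round:]   # highest predicted scores
        scores[batch] = query_score(batch)
        queries_used += per_round

    # Final ranking: true scores where known, surrogate predictions elsewhere.
    known = ~np.isnan(scores)
    w, *_ = np.linalg.lstsq(X[known], scores[known], rcond=None)
    final = np.where(known, scores, X @ w)
    return np.argsort(final)[-k:], queries_used


def demo():
    """Run the loop on a synthetic pool with a 10% query budget."""
    rng = np.random.default_rng(42)
    n, d, budget, k = 5000, 8, 500, 50
    features = rng.normal(size=(n, d))
    # Synthetic hidden score: linear in the features plus a little noise.
    true = features @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    picked, used = select_top_k(features, lambda idx: true[idx], budget, k)
    truth = set(np.argsort(true)[-k:].tolist())
    overlap = len(truth & set(picked.tolist())) / k
    return overlap, used


if __name__ == "__main__":
    overlap, used = demo()
    print(f"queries used: {used}, overlap with true top-k: {overlap:.2f}")
```

The sketch keeps the same 10% budget-to-pool ratio as the benchmark (100K queries for 1M conformations) but on a much smaller pool. A real pipeline would likely need a far richer surrogate model and an exploration strategy, since real scores are unlikely to be linear in simple features.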