BELKA (Big Encoded Library for Chemical Assessment)

Largest public DNA-encoded library (DEL) dataset: ~133M small molecules with 3.6B binding measurements against three protein targets (BRD4, sEH, and HSA). Released by Leash Biosciences and used for a NeurIPS 2024 Kaggle competition. Includes a library split for out-of-distribution (OOD) evaluation.

Composite
82.5
Experimental validation
Retrospective
Stages
Hit ID
Modalities
dna_encoded_library, small-molecule
Task types
binding_prediction, virtual_screening
Size
molecules: 133,000,000
measurements: 3,600,000,000
targets: 3
License
CC-BY-4.0
First release
2024-04
Last updated
2024-10
Official site
→ project page
Leaderboard
→ leaderboard
Dataset
→ dataset
Code / GitHub
→ repository
HuggingFace
→ HF
Paper
Introducing BELKA: Big Encoded Library for Chemical Assessment · 2024 · 25 citations
Flags
neurips_2024, kaggle, ultra_large_scale, competition
Related benchmarks
LIT-PCBA, DUD-E, LSD (Large-Scale Docking Database)

Rubric (7-criterion)

rigor: 4
coverage: 3
maintenance: 3
adoption: 5
quality: 4
accessibility: 5
industry_relevance: 5

Notes

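Used for a NeurIPS 2024 competition. Unprecedented scale for public binding data (~133M molecules, 3.6B measurements). The library split tests OOD generalization by holding out entire libraries from training. DEL technology enables exploration of very large chemical spaces. Now also available on Polaris Hub.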
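
To make the idea of a library split concrete, here is a minimal sketch in plain Python. It assumes each record carries a library identifier; the field names (library_id, smiles, target, binds), the library names, and the toy molecules are hypothetical placeholders and do not reflect the actual BELKA schema or data. The point is only that every compound from a held-out library goes to the test set, so no library appears on both sides of the split.

```python
def library_split(records, held_out_libraries):
    """Partition records so held-out libraries never appear in training.

    `records` is an iterable of dicts with a (hypothetical) "library_id" key.
    Returns (train, test) lists.
    """
    held_out = set(held_out_libraries)
    train, test = [], []
    for rec in records:
        # Route the whole library to one side; never split within a library.
        if rec["library_id"] in held_out:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Toy placeholder records (NOT real BELKA data or labels).
records = [
    {"smiles": "CCO", "library_id": "lib_A", "target": "BRD4", "binds": 0},
    {"smiles": "CCN", "library_id": "lib_A", "target": "sEH", "binds": 1},
    {"smiles": "CC(=O)O", "library_id": "lib_B", "target": "HSA", "binds": 0},
    {"smiles": "c1ccccc1", "library_id": "lib_C", "target": "BRD4", "binds": 1},
]

train, test = library_split(records, held_out_libraries=["lib_B"])

# Sanity check: the two sides share no library.
train_libs = {r["library_id"] for r in train}
test_libs = {r["library_id"] for r in test}
assert train_libs.isdisjoint(test_libs)
```

A random row-level split would instead mix compounds from the same library into both sets, which generally makes the test set look more like the training set than a genuinely new library would.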
