ClawBio Skill Correctness Bench
Third-party benchmark (Biostochastics LLC) of bio-analysis skills on safety, correctness, and honesty. 10 skills, 182 tests.
Composite
74.2
Experimental validation
Retrospective
Stages
Disease Modeling, Target ID, Clinical Development
Modalities
cross-modality
Task types
correctness-audit, safety-audit
Size
skills: 10
tests: 182
pass_rate_pct: 92.3
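The card gives a test count and a pass rate but not the underlying pass and fail counts. As a rough sketch, the counts those two figures imply can be recovered as follows. The variable names are illustrative, and the derived counts are an inference from the rounded percentage, not figures reported by the benchmark.

```python
# Values copied from the Size block of this card.
TOTAL_TESTS = 182
PASS_RATE_PCT = 92.3

# Assumption: the published rate is a rounded ratio of passed tests to total tests.
implied_passed = round(TOTAL_TESTS * PASS_RATE_PCT / 100)
implied_failed = TOTAL_TESTS - implied_passed

# Sanity check: recompute the percentage from the implied count.
recomputed_pct = round(100 * implied_passed / TOTAL_TESTS, 1)

print(implied_passed, implied_failed, recomputed_pct)
```

If the recomputed percentage matches the published one, the implied counts are at least consistent with the card, though other counts could in principle round to the same value.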
License
MIT
First release
2026-04
Last updated
2026-05-03
Official site
Leaderboard
Dataset
→ dataset
Code / GitHub
HuggingFace
→ HF
Paper
Flags
none
Experts
—
Groups
Hosted by
Related benchmarks
—
Rubric (7-criterion)
rigor
5
coverage
2
maintenance
5
adoption
2
quality
4
accessibility
5
industry_relevance
3
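The card does not say how the composite is derived from the seven rubric scores. As a minimal sketch, assuming each criterion is scored out of 5 and all criteria are weighted equally (both are assumptions, not stated in the source), an equal-weight percentage can be computed like this. The function name `unweighted_composite` is hypothetical.

```python
# Rubric scores copied from this card.
rubric = {
    "rigor": 5,
    "coverage": 2,
    "maintenance": 5,
    "adoption": 2,
    "quality": 4,
    "accessibility": 5,
    "industry_relevance": 3,
}

MAX_SCORE = 5  # assumption: the upper bound of the scale is not stated in the source


def unweighted_composite(scores, max_score=MAX_SCORE):
    """Return the equal-weight total as a percentage of the maximum possible total."""
    return 100 * sum(scores.values()) / (max_score * len(scores))


composite = unweighted_composite(rubric)
print(f"{composite:.2f}")  # prints 74.29
```

Under these assumptions the result is about 74.29, close to but not identical with the listed composite of 74.2, so the actual formula may truncate rather than round, or may weight the criteria differently.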
Notes
As an independent third-party benchmark, it structurally precludes self-reference. Coverage is narrow, but rigor is exemplary.