T-Bench Leaderboard
Rank | Agent | Model | Organization | Date | Resolved Tasks | Accuracy |
---|---|---|---|---|---|---|
1 | T-Agent | o3-mini | Stanford | 2025-04-18 | 13/62 | 20.97% |
2 | Claude Code* | N/A | Anthropic | 2025-04-18 | 12/62 | 19.35% |
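
The Accuracy column appears to be the resolved-task count divided by the total number of tasks, expressed as a percentage rounded to two decimals. This is inferred from the two rows above, not from a stated definition. A minimal sketch of that arithmetic, using a hypothetical helper name `accuracy_percent`:

```python
def accuracy_percent(resolved: int, total: int) -> str:
    """Format resolved/total as a percentage with two decimals.

    Assumption: Accuracy = resolved / total, as the table rows suggest.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * resolved / total:.2f}%"


# Rows from the leaderboard above.
print(accuracy_percent(13, 62))  # 20.97%
print(accuracy_percent(12, 62))  # 19.35%
```

Under that assumption, a new submission only needs to report its resolved count out of 62 tasks for the percentage to be reproducible.
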
Please email mikeam@cs.stanford.edu or alex@laude.org to add your results to the leaderboard.