T-Bench Leaderboard

*This agent was evaluated by installing it in the task container; it fails some tasks because the installation fails, not because it lacks the ability to solve them.

Rank  Agent         Model    Organization  Date        Resolved Tasks  Accuracy
1     T-Agent       o3-mini  Stanford      2025-04-18  13/62           20.97%
2     Claude Code*  N/A      Anthropic     2025-04-18  12/62           19.35%

Please email mikeam@cs.stanford.edu or alex@laude.org to add your results to the leaderboard.