T-Bench

$ cat what-is-t-bench.txt
Hello, world!

T-Bench is a benchmark for evaluating how well AI agents can accomplish complex tasks in a terminal environment.

A Stanford x Laude collaboration

$ cat links.txt
https://github.com/laude-institute/t-bench
https://discord.gg/6xWPKhGDbA

$ cat actions.txt
I want to contribute
I want to run an agent