Matching the algorithm to the model and task can yield large performance gains.
A novel benchmark for holistic and realistic assessment of LLM agents in diverse professional settings. We explore eight of the nineteen expert-validated tasks, covering three scenarios (sales; customer service; and configure, price, and quote) and four skills (workflow routing; policy compliance; information retrieval and textual reasoning; and database querying and numerical computation).
This leaderboard aggregates results from the TTC Benchmark consistency experiments across multiple runs. The aggregation strategy depends on the evaluation method:
Accuracy measures the percentage of benchmark instances that were successfully resolved by each model, providing a comprehensive view of practical performance.
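As a minimal sketch of the accuracy metric described above: the share of resolved instances is computed per run and then combined across runs. The function names, the sample outcomes, and the choice to average per-run accuracy are assumptions made for illustration; the leaderboard's actual aggregation may differ.

```python
from statistics import mean


def accuracy(resolved: list[bool]) -> float:
    """Percentage of benchmark instances that were successfully resolved."""
    if not resolved:
        raise ValueError("need at least one instance")
    # True counts as 1 and False as 0, so sum() gives the resolved count.
    return 100.0 * sum(resolved) / len(resolved)


def mean_accuracy(runs: list[list[bool]]) -> float:
    """Average per-run accuracy over repeated runs (an assumed aggregation)."""
    return mean(accuracy(run) for run in runs)


# Hypothetical outcomes for one model: three runs over the same four instances.
runs = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]

if __name__ == "__main__":
    for i, run in enumerate(runs, start=1):
        print(f"run {i}: {accuracy(run):.1f}%")
    print(f"mean over runs: {mean_accuracy(runs):.1f}%")
```

Averaging per-run accuracy weights every run equally; if runs covered different numbers of instances, pooling all instances before dividing would be an alternative worth considering.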
The data cover a range of CRM Arena tasks, such as lead qualification, case routing, and activity priority.
We'd love to hear your thoughts on the TTC Benchmark Leaderboard.