AI Benchmarks & Evaluations Hub
Comprehensive database of AI benchmarks tracking performance across GPT-4, Claude, Gemini, and more
AI Models
50+
Updated
Daily
Benchmark Categories
Explore AI capabilities across different domains and task types
Knowledge
General knowledge, facts, and information retrieval across diverse domains
Reasoning
Logical thinking, problem-solving, and complex reasoning tasks
Coding
Programming challenges, code generation, and software engineering
Mathematics
Mathematical problem-solving from basic arithmetic to competition level
Multimodal
Visual understanding, image interpretation, and cross-modal reasoning
Agent & Tool Use
API usage, tool manipulation, and autonomous task completion
Long Context
Processing and reasoning over extended text sequences
Safety
Robustness, alignment, and ethical AI behavior evaluation