Benchmark Detail View

Calendar System

EASY Challenge | 43 models tested | Top Score: 91.3
Success Rate: 73.9%
Quality Score: 61
Tests Passed: 17
Models Tested: 43
Calendar System Benchmark - Individual Model Results
Showing 43 of 43 models
Rank  Model                         Provider  Date     Score  Success Rate
1     Claude 4 Opus                 Claude    08/2025  91.3   95.7%
2     Gemini 2.0 Flash-001          Google    08/2025  88.6   91.3%
3     OpenAI GPT-4 Turbo            OpenAI    08/2025  87.0   91.3%
4     OpenAI GPT-4.1                OpenAI    08/2025  85.6   91.3%
5     Codestral 25.08               Mistral   08/2025  85.3   87.0%
6     Gemini 2.5 Flash              Google    08/2025  84.9   87.0%
7     Grok 4                        xAI       08/2025  84.7   87.0%
8     Claude 3.5 Haiku              Claude    08/2025  84.5   87.0%
9     Gemini 2.5 Pro                Google    08/2025  84.5   87.0%
10    OpenAI o4-mini                OpenAI    08/2025  84.5   87.0%
11    OpenAI GPT-4.1 nano           OpenAI    08/2025  84.5   87.0%
12    Llama 4 Maverick              Meta      08/2025  83.9   87.0%
13    OpenAI o1-mini                OpenAI    08/2025  83.7   87.0%
14    OpenAI GPT-4                  OpenAI    08/2025  81.5   82.6%
15    OpenAI o4-mini (High)         OpenAI    08/2025  80.8   82.6%
16    OpenAI GPT-4.1 mini           OpenAI    08/2025  80.5   82.6%
17    Coder Large                   Other     08/2025  80.3   82.6%
18    Claude 3.5 Sonnet             Claude    08/2025  80.3   82.6%
19    OpenAI o3-mini                OpenAI    08/2025  80.2   82.6%
20    Claude 4 Sonnet               Claude    08/2025  79.5   82.6%
21    OpenAI o3-mini (High)         OpenAI    08/2025  79.3   82.6%
22    Claude 3.7 Sonnet (Thinking)  Claude    08/2025  78.8   82.6%
23    Gemini 2.5 Flash Lite         Google    08/2025  78.0   78.3%
24    OpenAI GPT-4o                 OpenAI    08/2025  78.0   78.3%
25    OpenAI GPT-4o                 OpenAI    08/2025  78.0   82.6%
26    Horizon Beta                  Other     08/2025  76.6   78.3%
27    Grok 3 Mini                   xAI       08/2025  75.8   78.3%
28    Nova Pro V1                   Amazon    08/2025  73.7   73.9%
29    Llama 4 Scout                 Meta      08/2025  73.5   73.9%
30    OpenAI GPT-4o mini            OpenAI    08/2025  73.3   73.9%
31    DeepSeek V3                   DeepSeek  08/2025  73.1   73.9%
32    R1                            DeepSeek  08/2025  72.1   73.9%
33    Grok 3                        xAI       08/2025  65.1   65.2%
34    Kimi K2                       Moonshot  08/2025  64.9   65.2%
35    Claude 3.7 Sonnet             Claude    08/2025  63.5   65.2%
36    Nova Lite V1                  Amazon    08/2025  63.2   60.9%
37    Mistral Medium 3              Mistral   08/2025  61.4   60.9%
38    OpenAI GPT-3.5 Turbo          OpenAI    08/2025  59.7   56.5%
39    Qwen 3 Coder                  Alibaba   08/2025  59.2   60.9%
40    Claude 3 Haiku                Claude    08/2025  56.0   52.2%
41    Nova Micro V1                 Amazon    08/2025  22.6   17.4%
42    Gemma 3 4B IT                 Google    08/2025  15.7   13.0%
43    Command A                     Cohere    08/2025  7.2    0.0%

Top Performers

Calendar Challenge Champions
#1 Claude 4 Opus (Claude)
Score: 91.3
Success Rate: 95.7%
Tests Passed: 22 of 23
Quality: 52
Issues: 24

#2 Gemini 2.0 Flash-001 (Google)
Score: 88.6
Success Rate: 91.3%
Tests Passed: 21 of 23
Quality: 64
Issues: 18

#3 OpenAI GPT-4 Turbo (OpenAI)
Score: 87.0
Success Rate: 91.3%
Tests Passed: 21 of 23
Quality: 48
Issues: 26
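
The success rates reported for the top three models appear to be the share of the 23 tests each model passed, rounded to one decimal place. The page does not state the formula, so the sketch below is an assumption checked only against the figures shown here; the function name `success_rate` is hypothetical.

```python
def success_rate(tests_passed: int, total_tests: int) -> float:
    """Percentage of tests passed, rounded to one decimal place.

    Assumed formula: the page does not document how it computes this value.
    """
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return round(100 * tests_passed / total_tests, 1)


# Tests passed, taken from the top-three listing (23 total tests each).
top_performers = {
    "Claude 4 Opus": 22,
    "Gemini 2.0 Flash-001": 21,
    "OpenAI GPT-4 Turbo": 21,
}

for model, passed in top_performers.items():
    print(f"{model}: {success_rate(passed, 23)}%")
# Claude 4 Opus: 95.7%
# Gemini 2.0 Flash-001: 91.3%
# OpenAI GPT-4 Turbo: 91.3%
```

If the summary figure of 17 tests passed is also out of 23, the same calculation gives 73.9%, which matches the overall success rate shown at the top of the page. The composite score (for example 91.3 for Claude 4 Opus) evidently combines other factors and is not reproduced by this formula.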
