Benchmark Detail View

Calendar System

EASY Challenge | 94 models tested | Top Score: 92.1
Success Rate: 77.1%
Quality Score: 58
Tests Passed: 18
Models Tested: 94
Calendar System Benchmark - Individual Model Results
Rank  Model                         Provider     Date     Score  Success Rate
1     OSS 120B                      OpenAI       08/2025  92.1   95.7%
2     Claude 4 Opus                 Anthropic    08/2025  91.3   95.7%
3     Claude 4.5 Haiku              Anthropic    10/2025  90.9   95.7%
4     Claude 4.1 Opus               Anthropic    08/2025  90.9   95.7%
5     Gemini 2.0 Flash 001          Google       08/2025  88.6   91.3%
6     GPT 5 nano                    OpenAI       08/2025  87.2   91.3%
7     GPT 4 Turbo                   OpenAI       08/2025  87.0   91.3%
8     GPT 5.1 Codex Mini            OpenAI       11/2025  86.7   87.0%
9     GPT 4.1                       OpenAI       08/2025  85.6   91.3%
10    Claude 4.6 Sonnet             Anthropic    02/2026  85.3   87.0%
11    Codestral 25.08               Mistral      08/2025  85.3   87.0%
12    Gemini 2.5 Flash              Google       08/2025  84.9   87.0%
13    GLM 5                         Z.AI         02/2026  84.9   87.0%
14    Grok 4                        xAI          08/2025  84.7   87.0%
15    Claude 3.5 Haiku              Anthropic    08/2025  84.5   87.0%
16    Claude 4.5 Opus               Anthropic    11/2025  84.5   87.0%
17    Gemini 2.5 Pro                Google       08/2025  84.5   87.0%
18    GPT 4.1 nano                  OpenAI       08/2025  84.5   87.0%
19    o4 mini                       OpenAI       08/2025  84.5   87.0%
20    Claude 4.6 Opus               Anthropic    02/2026  84.3   87.0%
21    Kimi K2 Thinking              Moonshot AI  12/2025  84.3   87.0%
22    Llama 4 Maverick              Meta         08/2025  83.9   87.0%
23    o1 mini                       OpenAI       08/2025  83.7   87.0%
24    GPT 5.3 Codex                 OpenAI       02/2026  83.2   82.6%
25    GPT 5.4                       OpenAI       03/2026  83.0   82.6%
26    MiniMax M2.5                  Minimax      02/2026  82.9   87.0%
27    GPT 5.2                       OpenAI       12/2025  82.8   82.6%
28    GPT 5 Codex                   OpenAI       10/2025  82.8   82.6%
29    GPT 5 nano                    OpenAI       09/2025  82.5   87.0%
30    GPT 5.1 Codex                 OpenAI       11/2025  82.0   82.6%
31    GPT 5.2                       OpenAI       12/2025  82.0   82.6%
32    GLM 4.7                       Z.AI         12/2025  81.9   87.0%
33    GPT 4                         OpenAI       08/2025  81.5   82.6%
34    GLM 4.6                       Z.AI         10/2025  80.8   82.6%
35    o4 mini (High)                OpenAI       08/2025  80.8   82.6%
36    GPT 4.1 mini                  OpenAI       08/2025  80.5   82.6%
37    Claude 3.5 Sonnet             Anthropic    08/2025  80.3   82.6%
38    Coder Large                   Other        08/2025  80.3   82.6%
39    DeepSeek V3.2 Speciale        DeepSeek     02/2026  80.2   82.6%
40    Grok Code Fast 1              xAI          09/2025  80.2   82.6%
41    o3 mini                       OpenAI       08/2025  80.2   82.6%
42    Claude 4 Sonnet               Anthropic    08/2025  79.5   82.6%
43    o3 mini (High)                OpenAI       08/2025  79.3   82.6%
44    Claude 3.7 Sonnet (Thinking)  Anthropic    08/2025  78.8   82.6%
45    GPT 5 mini                    OpenAI       08/2025  78.5   82.6%
46    GPT 5.2 Codex                 OpenAI       01/2026  78.3   82.6%
47    GPT 5.3 Chat                  OpenAI       03/2026  78.2   82.6%
48    Gemini 2.5 Flash Lite         Google       08/2025  78.0   78.3%
49    GPT 4o                        OpenAI       08/2025  78.0   78.3%
50    GPT 4o                        OpenAI       08/2025  78.0   82.6%
51    GPT 5.1                       OpenAI       11/2025  77.8   78.3%
52    Qwen3 Max                     Qwen         10/2025  77.8   82.6%
53    Trinity Large Preview         Arcee AI     02/2026  76.8   78.3%
54    DeepSeek V3.2 Exp             DeepSeek     10/2025  76.6   78.3%
55    Horizon Beta                  Other        08/2025  76.6   78.3%
56    GPT 5.1                       OpenAI       11/2025  76.6   78.3%
57    GPT 5                         OpenAI       08/2025  76.6   78.3%
58    GPT 5                         OpenAI       09/2025  76.2   78.3%
59    Grok 3 Mini                   xAI          08/2025  75.8   78.3%
60    Gemini 3 Flash Preview        Google       12/2025  75.2   78.3%
61    GPT 5                         OpenAI       08/2025  74.6   78.3%
62    Kimi K2.5                     Moonshot AI  02/2026  74.2   78.3%
63    Gemini 3.1 Pro Preview        Google       02/2026  73.7   73.9%
64    Nova Pro V1                   Amazon       08/2025  73.7   73.9%
65    Llama 4 Scout                 Meta         08/2025  73.5   73.9%
66    GPT 4o mini                   OpenAI       08/2025  73.3   73.9%
67    DeepSeek V3                   DeepSeek     08/2025  73.1   73.9%
68    Claude 4.5 Sonnet             Anthropic    10/2025  72.7   73.9%
69    DeepSeek V3.2 Exp             DeepSeek     12/2025  72.7   73.9%
70    Gemini 3 Pro Preview          Google       11/2025  72.5   73.9%
71    Grok 4 Fast                   xAI          10/2025  72.4   78.3%
72    DeepSeek R1                   DeepSeek     08/2025  72.1   73.9%
73    Step 3.5 Flash                StepFun      02/2026  71.8   78.3%
74    Sonoma Sky Alpha              Other        09/2025  71.6   78.3%
75    Qwen3 Coder Next              Qwen         02/2026  69.5   73.9%
76    Mistral Large 25.12           Mistral      12/2025  68.6   69.6%
77    Nova 2 Lite V1                Amazon       02/2026  68.6   69.6%
78    GPT 5 mini                    OpenAI       09/2025  66.2   69.6%
79    Kimi K2 (0905)                Moonshot AI  10/2025  65.5   65.2%
80    MIMO V2 Flash                 Minimax      12/2025  65.4   69.6%
81    Devstral 25.12                Mistral      12/2025  65.1   65.2%
82    Grok 3                        xAI          08/2025  65.1   65.2%
83    Kimi K2                       Moonshot AI  08/2025  64.9   65.2%
84    Claude 3.7 Sonnet             Anthropic    08/2025  63.5   65.2%
85    Qwen3 Coder Plus              Qwen         10/2025  63.5   65.2%
86    Nova Lite V1                  Amazon       08/2025  63.2   60.9%
87    Mistral Medium 3              Mistral      08/2025  61.4   60.9%
88    GPT 3.5 Turbo                 OpenAI       08/2025  59.7   56.5%
89    Qwen3 Coder                   Qwen         08/2025  59.2   60.9%
90    Claude 3 Haiku                Anthropic    08/2025  56.0   52.2%
91    OSS 20B                       OpenAI       08/2025  53.0   52.2%
92    Nova Micro V1                 Amazon       08/2025  22.6   17.4%
93    Gemma 3 4B IT                 Google       08/2025  15.7   13.0%
94    Command A                     Cohere       08/2025  7.2    0.0%

Top Performers

Calendar Challenge Champions
#1 OSS 120B (OpenAI), score 92.1
Success Rate: 95.7%
Tests Passed: 22 of 23
Quality: 60
Issues: 20

#2 Claude 4 Opus (Anthropic), score 91.3
Success Rate: 95.7%
Tests Passed: 22 of 23
Quality: 52
Issues: 24

#3 Claude 4.5 Haiku (Anthropic), score 90.9
Success Rate: 95.7%
Tests Passed: 22 of 23
Quality: 48
Issues: 26
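
The success rates shown for the top performers are consistent with tests passed divided by total tests, rounded to one decimal place (22 of 23 gives 95.7%). The page does not state this formula, so the sketch below is an assumption that happens to match the displayed figures; the function name `success_rate` is hypothetical.

```python
def success_rate(tests_passed: int, total_tests: int) -> float:
    """Return the pass percentage rounded to one decimal place.

    Assumption: success rate = tests passed / total tests.
    The benchmark page does not document its formula.
    """
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return round(100 * tests_passed / total_tests, 1)


# Figures from the top-performer cards: 22 of 23 tests passed.
print(success_rate(22, 23))  # 95.7
```

Under the same assumption, every success rate in the results table corresponds to a whole number of passed tests out of 23, for example 18 of 23 for 78.3%. The overall score is evidently a separate measure, since models with the same success rate have different scores.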
