Benchmark Detail View

Calendar System

EASY Challenge · 80 models tested · Top Score: 92.1

Success Rate: 76.5%
Quality Score: 58
Tests Passed: 18
Models Tested: 80

Calendar System Benchmark - Individual Model Results
Showing 80 of 80 models

Rank | Model | Provider | Date | Score | Success Rate
1 | Openai Oss 120b | OpenAI | 08/2025 | 92.1 | 95.7%
2 | Claude 4 Opus | Claude | 08/2025 | 91.3 | 95.7%
3 | Claude 4.1 Opus | Claude | 08/2025 | 90.9 | 95.7%
4 | Claude 4.5 Haiku | Claude | 10/2025 | 90.9 | 95.7%
5 | Gemini 2.0 Flash-001 | Google | 08/2025 | 88.6 | 91.3%
6 | OpenAI GPT-5 nano | OpenAI | 08/2025 | 87.2 | 91.3%
7 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 87.0 | 91.3%
8 | OpenAI 5.1 Codex Mini | OpenAI | 11/2025 | 86.7 | 87.0%
9 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 85.6 | 91.3%
10 | Codestral 25.08 | Mistral | 08/2025 | 85.3 | 87.0%
11 | Gemini 2.5 Flash | Google | 08/2025 | 84.9 | 87.0%
12 | Grok 4 | xAI | 08/2025 | 84.7 | 87.0%
13 | Claude 3.5 Haiku | Claude | 08/2025 | 84.5 | 87.0%
14 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 84.5 | 87.0%
15 | Gemini 2.5 Pro | Google | 08/2025 | 84.5 | 87.0%
16 | OpenAI o4-mini | OpenAI | 08/2025 | 84.5 | 87.0%
17 | Claude 4.5 Opus | Claude | 11/2025 | 84.5 | 87.0%
18 | Kimi K2 | Moonshot | 12/2025 | 84.3 | 87.0%
19 | Llama 4 Maverick | Meta | 08/2025 | 83.9 | 87.0%
20 | OpenAI o1-mini | OpenAI | 08/2025 | 83.7 | 87.0%
21 | OpenAI 5 Codex | OpenAI | 10/2025 | 82.8 | 82.6%
22 | OpenAI GPT-5.2 | OpenAI | 12/2025 | 82.8 | 82.6%
23 | OpenAI GPT-5 nano | OpenAI | 09/2025 | 82.5 | 87.0%
24 | OpenAI 5.1 Codex | OpenAI | 11/2025 | 82.0 | 82.6%
25 | OpenAI GPT-5.2 Chat | OpenAI | 12/2025 | 82.0 | 82.6%
26 | Glm 4 7 | Other | 12/2025 | 81.9 | 87.0%
27 | OpenAI GPT-4 | OpenAI | 08/2025 | 81.5 | 82.6%
28 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 80.8 | 82.6%
29 | Glm 4 6 | Other | 10/2025 | 80.8 | 82.6%
30 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 80.5 | 82.6%
31 | Claude 3.5 Sonnet | Claude | 08/2025 | 80.3 | 82.6%
32 | Coder Large | Other | 08/2025 | 80.3 | 82.6%
33 | OpenAI o3-mini | OpenAI | 08/2025 | 80.2 | 82.6%
34 | Grok Code Fast 1 | xAI | 09/2025 | 80.2 | 82.6%
35 | Claude 4 Sonnet | Claude | 08/2025 | 79.5 | 82.6%
36 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 79.3 | 82.6%
37 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 78.8 | 82.6%
38 | OpenAI GPT-5 mini | OpenAI | 08/2025 | 78.5 | 82.6%
39 | OpenAI 5.2 Codex | OpenAI | 01/2026 | 78.3 | 82.6%
40 | Gemini 2.5 Flash Lite | Google | 08/2025 | 78.0 | 78.3%
41 | OpenAI GPT-4o | OpenAI | 08/2025 | 78.0 | 78.3%
42 | OpenAI GPT-4o | OpenAI | 08/2025 | 78.0 | 82.6%
43 | OpenAI GPT-5.1 | OpenAI | 11/2025 | 77.8 | 78.3%
44 | Qwen3 Max | Alibaba | 10/2025 | 77.8 | 82.6%
45 | OpenAI GPT-5.1 Chat | OpenAI | 11/2025 | 76.6 | 78.3%
46 | OpenAI GPT-5 Chat | OpenAI | 08/2025 | 76.6 | 78.3%
47 | Horizon Beta | Other | 08/2025 | 76.6 | 78.3%
48 | DeepSeek V3 | DeepSeek | 10/2025 | 76.6 | 78.3%
49 | OpenAI GPT-5 | OpenAI | 09/2025 | 76.2 | 78.3%
50 | Grok 3 Mini | xAI | 08/2025 | 75.8 | 78.3%
51 | Gemini 3 Flash | Google | 12/2025 | 75.2 | 78.3%
52 | OpenAI GPT-5 | OpenAI | 08/2025 | 74.6 | 78.3%
53 | Nova Pro V1 | Amazon | 08/2025 | 73.7 | 73.9%
54 | Llama 4 Scout | Meta | 08/2025 | 73.5 | 73.9%
55 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 73.3 | 73.9%
56 | DeepSeek V3 | DeepSeek | 08/2025 | 73.1 | 73.9%
57 | Claude 4.5 Sonnet | Claude | 10/2025 | 72.7 | 73.9%
58 | DeepSeek V3 | DeepSeek | 12/2025 | 72.7 | 73.9%
59 | Gemini 3 Pro Preview | Google | 11/2025 | 72.5 | 73.9%
60 | Grok 4 | xAI | 10/2025 | 72.4 | 78.3%
61 | R1 | DeepSeek | 08/2025 | 72.1 | 73.9%
62 | Sonoma Sky Alpha | Other | 09/2025 | 71.6 | 78.3%
63 | Mistral Large 2512 | Mistral | 12/2025 | 68.6 | 69.6%
64 | OpenAI GPT-5 mini | OpenAI | 09/2025 | 66.2 | 69.6%
65 | Kimi K2 | Moonshot | 10/2025 | 65.5 | 65.2%
66 | Mimo V2 Flash Free | Other | 12/2025 | 65.4 | 69.6%
67 | Devstral 2512 | Other | 12/2025 | 65.1 | 65.2%
68 | Grok 3 | xAI | 08/2025 | 65.1 | 65.2%
69 | Kimi K2 | Moonshot | 08/2025 | 64.9 | 65.2%
70 | Qwen 3 Coder | Alibaba | 10/2025 | 63.5 | 65.2%
71 | Claude 3.7 Sonnet | Claude | 08/2025 | 63.5 | 65.2%
72 | Nova Lite V1 | Amazon | 08/2025 | 63.2 | 60.9%
73 | Mistral Medium 3 | Mistral | 08/2025 | 61.4 | 60.9%
74 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 59.7 | 56.5%
75 | Qwen 3 Coder | Alibaba | 08/2025 | 59.2 | 60.9%
76 | Claude 3 Haiku | Claude | 08/2025 | 56.0 | 52.2%
77 | Openai Oss 20b | OpenAI | 08/2025 | 53.0 | 52.2%
78 | Nova Micro V1 | Amazon | 08/2025 | 22.6 | 17.4%
79 | Gemma 3 4B IT | Google | 08/2025 | 15.7 | 13.0%
80 | Command A | Cohere | 08/2025 | 7.2 | 0.0%

Top Performers

Calendar Challenge Champions

#1 Openai Oss 120b (OpenAI)
Score: 92.1
Success Rate: 95.7%
Tests Passed: 22 of 23 total tests
Quality: 60
Issues: 20

#2 Claude 4 Opus (Claude)
Score: 91.3
Success Rate: 95.7%
Tests Passed: 22 of 23 total tests
Quality: 52
Issues: 24

#3 Claude 4.1 Opus (Claude)
Score: 90.9
Success Rate: 95.7%
Tests Passed: 22 of 23 total tests
Quality: 48
Issues: 26
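
The success rates listed above are consistent with tests passed divided by the 23 total tests, rounded to one decimal place (22 of 23 gives 95.7%). A minimal sketch of that arithmetic, assuming this is how the figure is derived; the function name `success_rate` is hypothetical and not taken from the benchmark itself:

```python
def success_rate(passed: int, total: int = 23) -> float:
    """Return the percentage of tests passed, rounded to one decimal place.

    Assumption: the leaderboard's "Success Rate" is passed / total * 100.
    The default of 23 matches the "23 total tests" shown for the top performers.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100 * passed / total, 1)


if __name__ == "__main__":
    # 22 of 23 tests passed, as reported for the top three models
    print(f"{success_rate(22)}%")
```

Under this assumption, every success rate in the results corresponds to a whole number of passed tests out of 23, for example 91.3% for 21 and 87.0% for 20.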
