Benchmark Detail View
Calendar System
EASY Challenge | 79 models tested | Top Score: 92.1
Success Rate: 76.4%
Quality Score: 59
Tests Passed: 18
Models Tested: 79
Calendar System Benchmark - Individual Model Results
Showing 79 of 79 models
Rank | Model | Provider | Date | Score | Success Rate
1 | Openai Oss 120b | OpenAI | 08/2025 | 92.1 | 95.7%
2 | Claude 4 Opus | Claude | 08/2025 | 91.3 | 95.7%
3 | Claude 4.1 Opus | Claude | 08/2025 | 90.9 | 95.7%
4 | Claude 4.5 Haiku | Claude | 10/2025 | 90.9 | 95.7%
5 | Gemini 2.0 Flash-001 | Google | 08/2025 | 88.6 | 91.3%
6 | OpenAI GPT-5 nano | OpenAI | 08/2025 | 87.2 | 91.3%
7 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 87.0 | 91.3%
8 | OpenAI 5.1 Codex Mini | OpenAI | 11/2025 | 86.7 | 87.0%
9 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 85.6 | 91.3%
10 | Codestral 25.08 | Mistral | 08/2025 | 85.3 | 87.0%
11 | Gemini 2.5 Flash | Google | 08/2025 | 84.9 | 87.0%
12 | Grok 4 | xAI | 08/2025 | 84.7 | 87.0%
13 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 84.5 | 87.0%
14 | OpenAI o4-mini | OpenAI | 08/2025 | 84.5 | 87.0%
15 | Claude 4.5 Opus | Claude | 11/2025 | 84.5 | 87.0%
16 | Gemini 2.5 Pro | Google | 08/2025 | 84.5 | 87.0%
17 | Claude 3.5 Haiku | Claude | 08/2025 | 84.5 | 87.0%
18 | Kimi K2 | Moonshot | 12/2025 | 84.3 | 87.0%
19 | Llama 4 Maverick | Meta | 08/2025 | 83.9 | 87.0%
20 | OpenAI o1-mini | OpenAI | 08/2025 | 83.7 | 87.0%
21 | OpenAI GPT-5.2 | OpenAI | 12/2025 | 82.8 | 82.6%
22 | OpenAI 5 Codex | OpenAI | 10/2025 | 82.8 | 82.6%
23 | OpenAI GPT-5 nano | OpenAI | 09/2025 | 82.5 | 87.0%
24 | OpenAI 5.1 Codex | OpenAI | 11/2025 | 82.0 | 82.6%
25 | OpenAI GPT-5.2 Chat | OpenAI | 12/2025 | 82.0 | 82.6%
26 | Glm 4 7 | Other | 12/2025 | 81.9 | 87.0%
27 | OpenAI GPT-4 | OpenAI | 08/2025 | 81.5 | 82.6%
28 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 80.8 | 82.6%
29 | Glm 4 6 | Other | 10/2025 | 80.8 | 82.6%
30 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 80.5 | 82.6%
31 | Claude 3.5 Sonnet | Claude | 08/2025 | 80.3 | 82.6%
32 | Coder Large | Other | 08/2025 | 80.3 | 82.6%
33 | OpenAI o3-mini | OpenAI | 08/2025 | 80.2 | 82.6%
34 | Grok Code Fast 1 | xAI | 09/2025 | 80.2 | 82.6%
35 | Claude 4 Sonnet | Claude | 08/2025 | 79.5 | 82.6%
36 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 79.3 | 82.6%
37 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 78.8 | 82.6%
38 | OpenAI GPT-5 mini | OpenAI | 08/2025 | 78.5 | 82.6%
39 | OpenAI GPT-4o | OpenAI | 08/2025 | 78.0 | 78.3%
40 | Gemini 2.5 Flash Lite | Google | 08/2025 | 78.0 | 78.3%
41 | OpenAI GPT-4o | OpenAI | 08/2025 | 78.0 | 82.6%
42 | OpenAI GPT-5.1 | OpenAI | 11/2025 | 77.8 | 78.3%
43 | Qwen3 Max | Alibaba | 10/2025 | 77.8 | 82.6%
44 | OpenAI GPT-5.1 Chat | OpenAI | 11/2025 | 76.6 | 78.3%
45 | Horizon Beta | Other | 08/2025 | 76.6 | 78.3%
46 | DeepSeek V3 | DeepSeek | 10/2025 | 76.6 | 78.3%
47 | OpenAI GPT-5 Chat | OpenAI | 08/2025 | 76.6 | 78.3%
48 | OpenAI GPT-5 | OpenAI | 09/2025 | 76.2 | 78.3%
49 | Grok 3 Mini | xAI | 08/2025 | 75.8 | 78.3%
50 | Gemini 3 Flash | Google | 12/2025 | 75.2 | 78.3%
51 | OpenAI GPT-5 | OpenAI | 08/2025 | 74.6 | 78.3%
52 | Nova Pro V1 | Amazon | 08/2025 | 73.7 | 73.9%
53 | Llama 4 Scout | Meta | 08/2025 | 73.5 | 73.9%
54 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 73.3 | 73.9%
55 | DeepSeek V3 | DeepSeek | 08/2025 | 73.1 | 73.9%
56 | DeepSeek V3 | DeepSeek | 12/2025 | 72.7 | 73.9%
57 | Claude 4.5 Sonnet | Claude | 10/2025 | 72.7 | 73.9%
58 | Gemini 3 Pro Preview | Google | 11/2025 | 72.5 | 73.9%
59 | Grok 4 | xAI | 10/2025 | 72.4 | 78.3%
60 | R1 | DeepSeek | 08/2025 | 72.1 | 73.9%
61 | Sonoma Sky Alpha | Other | 09/2025 | 71.6 | 78.3%
62 | Mistral Large 2512 | Mistral | 12/2025 | 68.6 | 69.6%
63 | OpenAI GPT-5 mini | OpenAI | 09/2025 | 66.2 | 69.6%
64 | Kimi K2 | Moonshot | 10/2025 | 65.5 | 65.2%
65 | Mimo V2 Flash Free | Other | 12/2025 | 65.4 | 69.6%
66 | Devstral 2512 | Other | 12/2025 | 65.1 | 65.2%
67 | Grok 3 | xAI | 08/2025 | 65.1 | 65.2%
68 | Kimi K2 | Moonshot | 08/2025 | 64.9 | 65.2%
69 | Claude 3.7 Sonnet | Claude | 08/2025 | 63.5 | 65.2%
70 | Qwen 3 Coder | Alibaba | 10/2025 | 63.5 | 65.2%
71 | Nova Lite V1 | Amazon | 08/2025 | 63.2 | 60.9%
72 | Mistral Medium 3 | Mistral | 08/2025 | 61.4 | 60.9%
73 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 59.7 | 56.5%
74 | Qwen 3 Coder | Alibaba | 08/2025 | 59.2 | 60.9%
75 | Claude 3 Haiku | Claude | 08/2025 | 56.0 | 52.2%
76 | Openai Oss 20b | OpenAI | 08/2025 | 53.0 | 52.2%
77 | Nova Micro V1 | Amazon | 08/2025 | 22.6 | 17.4%
78 | Gemma 3 4B IT | Google | 08/2025 | 15.7 | 13.0%
79 | Command A | Cohere | 08/2025 | 7.2 | 0.0%
Top Performers
#1 Openai Oss 120b (OpenAI), Score: 92.1
Success Rate: 95.7%
Tests Passed: 22
Quality: 60
Issues: 20
23 total tests
#2 Claude 4 Opus (Claude), Score: 91.3
Success Rate: 95.7%
Tests Passed: 22
Quality: 52
Issues: 24
23 total tests
#3 Claude 4.1 Opus (Claude), Score: 90.9
Success Rate: 95.7%
Tests Passed: 22
Quality: 48
Issues: 26
23 total tests
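The Success Rate shown for each top performer appears to be the number of tests passed divided by the 23 total tests, rounded to one decimal place. The sketch below checks that reading against the figures above. The function name `success_rate` is hypothetical and not part of the benchmark's own tooling, and the rounding rule is an assumption inferred from the displayed values.

```python
def success_rate(passed: int, total: int) -> float:
    """Return the percentage of tests passed, rounded to one decimal place.

    Assumption: the page derives Success Rate as passed / total * 100.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100 * passed / total, 1)


TOTAL_TESTS = 23  # "23 total tests" as listed for each top performer

# Each of the three top performers passed 22 of 23 tests.
print(success_rate(22, TOTAL_TESTS))  # 95.7
```

Under the same assumption, other percentages in the results list are consistent with whole test counts out of 23: for example, 20 passed gives 87.0% and 19 passed gives 82.6%.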