Benchmark Detail

Calendar System

Difficulty: Easy
Models tested: 148
Top score: 92.1
Success rate: 77.5%
Quality score: 60
Avg. tests passed: 18
Showing 1-50 of 148 models
Rank  Model                          Provider     Config  Score  Success
1     OSS 120B                       OpenAI       -       92.1   95.7%
2     Gemma 4 26B A4B                Google       none    91.9   95.7%
3     Claude Opus 4                  Anthropic    -       91.3   95.7%
4     Claude 4.5 Haiku               Anthropic    -       90.9   95.7%
5     Claude Opus 4.1                Anthropic    -       90.9   95.7%
6     Gemini 2.0 Flash 001           Google       -       88.6   91.3%
7     GPT 5 nano                     OpenAI       -       87.2   91.3%
8     GPT 4 Turbo                    OpenAI       -       87.0   91.3%
9     GPT 5.1 Codex Mini             OpenAI       -       86.7   87.0%
10    Kimi K2.6                      Moonshot AI  low     86.1   87.0%
11    GPT 4.1                        OpenAI       -       85.6   91.3%
12    Claude Sonnet 4.6              Anthropic    -       85.3   87.0%
13    Codestral 25.08                Mistral      -       85.3   87.0%
14    Claude Sonnet 4.6              Anthropic    high    84.9   87.0%
15    Gemini 2.5 Flash               Google       -       84.9   87.0%
16    GLM 5                          Z.AI         -       84.9   87.0%
17    Claude Opus 4.6                Anthropic    medium  84.7   87.0%
18    Grok 4                         xAI          -       84.7   87.0%
19    Claude 3.5 Haiku               Anthropic    -       84.5   87.0%
20    Claude Opus 4.5                Anthropic    -       84.5   87.0%
21    Claude Opus 4.6                Anthropic    high    84.5   87.0%
22    Claude Opus 4.7                Anthropic    medium  84.5   87.0%
23    Gemini 2.5 Pro                 Google       -       84.5   87.0%
24    GPT 4.1 nano                   OpenAI       -       84.5   87.0%
25    o4 mini                        OpenAI       -       84.5   87.0%
26    Claude Opus 4.6 Fast           Anthropic    -       84.3   87.0%
27    Claude Opus 4.6                Anthropic    -       84.3   87.0%
28    Claude Opus 4.7                Anthropic    high    84.3   87.0%
29    DeepSeek V4 Pro                DeepSeek     gen     84.3   87.0%
30    Gemini 3.1 Flash Lite Preview  Google       medium  84.3   87.0%
31    Kimi K2 Thinking               Moonshot AI  -       84.3   87.0%
32    Gemma 4 26B A4B                Google       low     84.1   87.0%
33    Claude Opus 4.7                Anthropic    -       83.9   87.0%
34    Gemini 3.1 Flash Lite Preview  Google       low     83.9   87.0%
35    GLM-5.1                        Z.AI         high    83.9   87.0%
36    Llama 4 Maverick               Meta         -       83.9   87.0%
37    o1 mini                        OpenAI       -       83.7   87.0%
38    GPT 5.3 Codex                  OpenAI       -       83.2   82.6%
39    GPT 5.4                        OpenAI       high    83.2   82.6%
40    GPT 5.4                        OpenAI       low     83.0   82.6%
41    GPT 5.4                        OpenAI       -       83.0   82.6%
42    MiniMax M2.5                   MiniMax      -       82.9   87.0%
43    GPT 5.2                        OpenAI       -       82.8   82.6%
44    GPT 5 Codex                    OpenAI       -       82.8   82.6%
45    GPT 5 nano                     OpenAI       -       82.5   87.0%
46    GPT 5.1 Codex                  OpenAI       -       82.0   82.6%
47    GPT 5.2                        OpenAI       -       82.0   82.6%
48    GLM 4.7                        Z.AI         -       81.9   87.0%
49    GPT 4                          OpenAI       -       81.5   82.6%
50    Qwen3.6 27B                    Qwen         low     81.5   87.0%