Benchmark Detail
Calendar System
EASY · 148 models tested · Top 92.1
Success Rate: 77.5%
Quality Score: 60
Avg Tests Passed: 18
Models Tested: 148
1–50 of 148
| # | Model | Provider | Config | Score ↓ | Success | Quality | Date |
|---|---|---|---|---|---|---|---|
| 1 | OSS 120B | OpenAI | — | 92.1 | 95.7% | 60 | 2025-08 |
| 2 | Gemma 4 26B A4B | Google | none | 91.9 | 95.7% | 58 | 2026-04 |
| 3 | Claude Opus 4 | Anthropic | — | 91.3 | 95.7% | 52 | 2025-08 |
| 4 | Claude 4.5 Haiku | Anthropic | — | 90.9 | 95.7% | 48 | 2025-10 |
| 5 | Claude Opus 4.1 | Anthropic | — | 90.9 | 95.7% | 48 | 2025-08 |
| 6 | Gemini 2.0 Flash 001 | Google | — | 88.6 | 91.3% | 64 | 2025-08 |
| 7 | GPT 5 nano | OpenAI | — | 87.2 | 91.3% | 50 | 2025-08 |
| 8 | GPT 4 Turbo | OpenAI | — | 87.0 | 91.3% | 48 | 2025-08 |
| 9 | GPT 5.1 Codex Mini | OpenAI | — | 86.7 | 87.0% | 84 | 2025-11 |
| 10 | Kimi K2.6 | Moonshotai | low | 86.1 | 87.0% | 78 | 2026-04 |
| 11 | GPT 4.1 | OpenAI | — | 85.6 | 91.3% | 34 | 2025-08 |
| 12 | Claude Sonnet 4.6 | Anthropic | — | 85.3 | 87.0% | 70 | 2026-02 |
| 13 | Codestral 25.08 | Mistral | — | 85.3 | 87.0% | 70 | 2025-08 |
| 14 | Claude Sonnet 4.6 | Anthropic | high · 8,192 tokens | 84.9 | 87.0% | 66 | 2026-04 |
| 15 | Gemini 2.5 Flash | Google | — | 84.9 | 87.0% | 66 | 2025-08 |
| 16 | GLM 5 | Z.AI | — | 84.9 | 87.0% | 66 | 2026-02 |
| 17 | Claude Opus 4.6 | Anthropic | medium · 2,048 tokens | 84.7 | 87.0% | 64 | 2026-04 |
| 18 | Grok 4 | xAI | — | 84.7 | 87.0% | 64 | 2025-08 |
| 19 | Claude 3.5 Haiku | Anthropic | — | 84.5 | 87.0% | 62 | 2025-08 |
| 20 | Claude Opus 4.5 | Anthropic | — | 84.5 | 87.0% | 62 | 2025-11 |
| 21 | Claude Opus 4.6 | Anthropic | high · 8,192 tokens | 84.5 | 87.0% | 62 | 2026-04 |
| 22 | Claude Opus 4.7 | Anthropic | medium · 2,048 tokens | 84.5 | 87.0% | 62 | 2026-04 |
| 23 | Gemini 2.5 Pro | Google | — | 84.5 | 87.0% | 62 | 2025-08 |
| 24 | GPT 4.1 nano | OpenAI | — | 84.5 | 87.0% | 62 | 2025-08 |
| 25 | o4 mini | OpenAI | — | 84.5 | 87.0% | 62 | 2025-08 |
| 26 | Claude Opus 4.6 Fast | Anthropic | — | 84.3 | 87.0% | 60 | 2026-04 |
| 27 | Claude Opus 4.6 | Anthropic | — | 84.3 | 87.0% | 60 | 2026-02 |
| 28 | Claude Opus 4.7 | Anthropic | high · 8,192 tokens | 84.3 | 87.0% | 60 | 2026-04 |
| 29 | DeepSeek V4 Pro | DeepSeek | gen | 84.3 | 87.0% | 60 | 2026-05 |
| 30 | Gemini 3.1 Flash Lite Preview | Google | medium | 84.3 | 87.0% | 60 | 2026-04 |
| 31 | Kimi K2 Thinking | Moonshot AI | | 84.3 | 87.0% | 60 | 2025-12 |
| 32 | Gemma 4 26B A4B | Google | low | 84.1 | 87.0% | 58 | 2026-04 |
| 33 | Claude Opus 4.7 | Anthropic | — | 83.9 | 87.0% | 56 | 2026-04 |
| 34 | Gemini 3.1 Flash Lite Preview | Google | low | 83.9 | 87.0% | 56 | 2026-04 |
| 35 | GLM-5.1 | Z Ai | high | 83.9 | 87.0% | 56 | 2026-04 |
| 36 | Llama 4 Maverick | Meta | — | 83.9 | 87.0% | 56 | 2025-08 |
| 37 | o1 mini | OpenAI | — | 83.7 | 87.0% | 54 | 2025-08 |
| 38 | GPT 5.3 Codex | OpenAI | — | 83.2 | 82.6% | 88 | 2026-02 |
| 39 | GPT 5.4 | OpenAI | high | 83.2 | 82.6% | 88 | 2026-04 |
| 40 | ↳ GPT 5.4 | | low | 83.0 | 82.6% | 86 | 2026-04 |
| 41 | ↳ GPT 5.4 | | — | 83.0 | 82.6% | 86 | 2026-03 |
| 42 | MiniMax M2.5 | Minimax | — | 82.9 | 87.0% | 46 | 2026-02 |
| 43 | GPT 5.2 | OpenAI | — | 82.8 | 82.6% | 84 | 2025-12 |
| 44 | GPT 5 Codex | OpenAI | — | 82.8 | 82.6% | 84 | 2025-10 |
| 45 | GPT 5 nano | OpenAI | — | 82.5 | 87.0% | 42 | 2025-09 |
| 46 | GPT 5.1 Codex | OpenAI | — | 82.0 | 82.6% | 76 | 2025-11 |
| 47 | GPT 5.2 | OpenAI | — | 82.0 | 82.6% | 76 | 2025-12 |
| 48 | GLM 4.7 | Z.AI | — | 81.9 | 87.0% | 36 | 2025-12 |
| 49 | GPT 4 | OpenAI | — | 81.5 | 82.6% | 72 | 2025-08 |
| 50 | Qwen3.6 27B | Qwen | low | 81.5 | 87.0% | 32 | 2026-04 |
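
The page does not say how Score is derived. In the rows above, Score stays within about 0.1 of a 90/10 blend of the Success percentage and the quality value that sits between Success and Date. The sketch below checks that observation on a few rows copied from the table. The weights `W_SUCCESS` and `W_QUALITY` and the helper names are assumptions inferred from the data, not a documented formula of the benchmark.

```python
# Rows copied from the table above: (model, score, success %, quality).
ROWS = [
    ("OSS 120B", 92.1, 95.7, 60),
    ("Gemini 2.0 Flash 001", 88.6, 91.3, 64),
    ("GPT 5.1 Codex Mini", 86.7, 87.0, 84),
    ("GPT 5.3 Codex", 83.2, 82.6, 88),
    ("Qwen3.6 27B", 81.5, 87.0, 32),
]

# Hypothetical weights inferred from the rows; the page does not document them.
W_SUCCESS = 0.9
W_QUALITY = 0.1


def estimated_score(success_pct, quality):
    """Blend success and quality using the assumed weights."""
    return W_SUCCESS * success_pct + W_QUALITY * quality


def residuals(rows):
    """Return (model, published score minus estimated score) for each row."""
    return [
        (model, round(score - estimated_score(success, quality), 2))
        for model, score, success, quality in rows
    ]


if __name__ == "__main__":
    for model, delta in residuals(ROWS):
        print(f"{model}: {delta:+.2f}")
```

Small nonzero residuals are expected because the published Success and quality values are themselves rounded. A larger residual on other rows would suggest the assumed weights do not hold there.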