Benchmark Detail View

Parking Garage Management

HARD Challenge | 94 models tested | Top Score: 67.1
Success Rate: 43.2%
Quality Score: 41
Tests Passed: 17
Models Tested: 94
Parking Garage Benchmark - Individual Model Results
Rank  Model                         Provider     Date     Score  Success Rate
1     Claude 3.7 Sonnet (Thinking)  Anthropic    08/2025  67.1   69.2%
2     Claude 4.6 Sonnet             Anthropic    02/2026  66.3   69.2%
3     DeepSeek V3.2 Exp             DeepSeek     12/2025  66.3   69.2%
4     Claude 4.6 Opus               Anthropic    02/2026  65.9   69.2%
5     Grok 3                        xAI          08/2025  65.9   69.2%
6     Claude 4.5 Opus               Anthropic    11/2025  64.3   69.2%
7     Claude 4 Opus                 Anthropic    08/2025  64.1   69.2%
8     DeepSeek V3                   DeepSeek     08/2025  64.1   64.1%
9     Claude 3.5 Sonnet             Anthropic    08/2025  63.4   61.5%
10    Claude 4 Sonnet               Anthropic    08/2025  63.2   66.7%
11    Claude 4.1 Opus               Anthropic    08/2025  62.9   69.2%
12    Claude 4.5 Sonnet             Anthropic    10/2025  62.8   66.7%
13    GLM 5                         Z.AI         02/2026  62.3   69.2%
14    Codestral 25.08               Mistral      08/2025  60.6   61.5%
15    GLM 4.6                       Z.AI         10/2025  60.0   61.5%
16    Qwen3 Coder Next              Qwen         02/2026  59.1   64.1%
17    DeepSeek V3.2 Exp             DeepSeek     10/2025  58.1   59.0%
18    GPT 4o                        OpenAI       08/2025  55.5   53.8%
19    GPT 5.4                       OpenAI       03/2026  54.1   51.3%
20    GPT 5.3 Codex                 OpenAI       02/2026  53.4   51.3%
21    Devstral 25.12                Mistral      12/2025  53.1   51.3%
22    GPT 5.2                       OpenAI       12/2025  51.8   51.3%
23    GPT 5                         OpenAI       08/2025  51.5   51.3%
24    Gemini 3 Flash Preview        Google       12/2025  50.4   51.3%
25    Qwen3 Max                     Qwen         10/2025  50.3   53.8%
26    GLM 4.7                       Z.AI         12/2025  50.1   51.3%
27    Gemini 3.1 Pro Preview        Google       02/2026  50.0   51.3%
28    Claude 3.7 Sonnet             Anthropic    08/2025  49.1   51.3%
29    GPT 5.1                       OpenAI       11/2025  48.6   48.7%
30    GPT 5.3 Chat                  OpenAI       03/2026  48.0   48.7%
31    Kimi K2.5                     Moonshot AI  02/2026  48.0   51.3%
32    Grok 4.1 Fast                 xAI          02/2026  47.9   48.7%
33    Kimi K2 (0905)                Moonshot AI  10/2025  47.6   48.7%
34    Horizon Beta                  Other        08/2025  47.3   48.7%
35    GPT 4.1 mini                  OpenAI       08/2025  46.7   46.2%
36    GPT 5                         OpenAI       09/2025  46.6   48.7%
37    GPT 5.2 Codex                 OpenAI       01/2026  46.5   51.3%
38    GPT 4.1                       OpenAI       08/2025  46.3   48.7%
39    GPT 5                         OpenAI       08/2025  45.5   48.7%
40    DeepSeek R1                   DeepSeek     08/2025  45.4   43.6%
41    Llama 4 Scout                 Meta         08/2025  45.0   43.6%
42    Qwen3 Coder Plus              Qwen         10/2025  44.9   46.2%
43    Mistral Large 25.12           Mistral      12/2025  44.8   43.6%
44    GPT 5.1 Codex                 OpenAI       11/2025  44.7   41.0%
45    GPT 4o                        OpenAI       08/2025  44.4   43.6%
46    Claude 4.5 Haiku              Anthropic    10/2025  44.2   43.6%
47    Kimi K2                       Moonshot AI  08/2025  44.2   43.6%
48    MIMO V2 Flash                 Minimax      12/2025  43.5   46.2%
49    Kimi K2 Thinking              Moonshot AI  12/2025  42.5   41.0%
50    Llama 4 Maverick              Meta         08/2025  42.5   41.0%
51    GPT 5.2                       OpenAI       12/2025  42.5   41.0%
52    GPT 5.1                       OpenAI       11/2025  42.2   43.6%
53    Mistral Medium 3              Mistral      08/2025  41.6   38.5%
54    o3 mini                       OpenAI       08/2025  40.7   41.0%
55    MiniMax M2.5                  Minimax      02/2026  40.5   41.0%
56    GPT 4 Turbo                   OpenAI       08/2025  40.4   38.5%
57    Qwen3 Coder                   Qwen         08/2025  40.3   41.0%
58    GPT 4                         OpenAI       08/2025  39.8   38.5%
59    Gemini 2.0 Flash 001          Google       08/2025  39.6   43.6%
60    DeepSeek V3.2 Speciale        DeepSeek     02/2026  39.3   41.0%
61    Sonoma Sky Alpha              Other        09/2025  39.3   41.0%
62    Grok 4                        xAI          08/2025  39.2   38.5%
63    GPT 5 Codex                   OpenAI       10/2025  38.9   41.0%
64    Trinity Large Preview         Arcee AI     02/2026  38.6   38.5%
65    Grok 4 Fast                   xAI          10/2025  38.3   41.0%
66    Nova 2 Lite V1                Amazon       02/2026  38.0   38.5%
67    o1 mini                       OpenAI       08/2025  37.1   35.9%
68    GPT 3.5 Turbo                 OpenAI       08/2025  36.4   33.3%
69    MiniMax M2.1                  Minimax      12/2025  36.1   35.9%
70    Claude 3 Haiku                Anthropic    08/2025  36.0   33.3%
71    Qwen3 14B                     Qwen         08/2025  35.0   33.3%
72    o4 mini                       OpenAI       08/2025  32.9   30.8%
73    GPT 5.1 Codex Mini            OpenAI       11/2025  32.8   28.2%
74    Grok Code Fast 1              xAI          09/2025  32.3   30.8%
75    o3 mini (High)                OpenAI       08/2025  31.8   33.3%
76    OSS 20B                       OpenAI       08/2025  31.6   28.2%
77    GPT 5 mini                    OpenAI       09/2025  31.2   33.3%
78    Gemini 3 Pro Preview          Google       11/2025  30.1   30.8%
79    Gemini 2.5 Flash              Google       08/2025  29.7   30.8%
80    GPT 5 nano                    OpenAI       08/2025  29.0   28.2%
81    GPT 5 mini                    OpenAI       08/2025  28.9   30.8%
82    o4 mini (High)                OpenAI       08/2025  27.9   25.6%
83    Nova Micro V1                 Amazon       08/2025  27.0   23.1%
84    Nova Lite V1                  Amazon       08/2025  23.8   17.9%
85    Gemini 2.5 Pro                Google       08/2025  23.5   20.5%
86    Grok 3 Mini                   xAI          08/2025  23.4   23.1%
87    GPT 5 nano                    OpenAI       09/2025  22.7   20.5%
88    OSS 120B                      OpenAI       08/2025  20.4   17.9%
89    Gemini 2.5 Flash Lite         Google       08/2025  19.1   20.5%
90    GPT 4o mini                   OpenAI       08/2025  19.0   15.4%
91    Claude 3.5 Haiku              Anthropic    08/2025  16.9   12.8%
92    Nova Pro V1                   Amazon       08/2025  14.6   10.3%
93    GPT 4.1 nano                  OpenAI       08/2025  11.9   12.8%
94    Coder Large                   Other        08/2025  9.3    7.7%

Top Performers

Parking Garage Champions
#1 Claude 3.7 Sonnet (Thinking) (Anthropic), Score: 67.1
Success Rate: 69.2% | Tests Passed: 27 of 39 | Quality: 48 | Issues: 26

#2 Claude 4.6 Sonnet (Anthropic), Score: 66.3
Success Rate: 69.2% | Tests Passed: 27 of 39 | Quality: 40 | Issues: 30

#3 DeepSeek V3.2 Exp (DeepSeek), Score: 66.3
Success Rate: 69.2% | Tests Passed: 27 of 39 | Quality: 40 | Issues: 30
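
The success rates listed for each model are consistent with tests passed divided by the 39 total tests shown for the top performers (for example, 27 of 39 gives 69.2%). The page does not state its formula, so the following is only a minimal Python sketch of that arithmetic under this assumption; the function name `success_rate` is hypothetical and not part of the benchmark.

```python
def success_rate(tests_passed: int, total_tests: int) -> float:
    """Return the pass percentage, rounded to one decimal place.

    Assumption: success rate = tests passed / total tests. This matches the
    figures shown on the page but is not confirmed by the source.
    """
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= tests_passed <= total_tests:
        raise ValueError("tests_passed must be between 0 and total_tests")
    return round(100 * tests_passed / total_tests, 1)


# Top performers: 27 of 39 tests passed.
print(success_rate(27, 39))
```

Under the same assumption, the lowest listed rate of 7.7% would correspond to 3 of 39 tests passed.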
