Benchmark Detail View

Parking Garage Management

HARD Challenge | 81 models tested | Top Score: 67.1
Success Rate: 42.0%
Quality Score: 41
Tests Passed: 16
Models Tested: 81
Parking Garage Benchmark - Individual Model Results
Showing 81 of 81 models
Rank | Model | Provider | Date | Score | Success Rate
1 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 67.1 | 69.2%
2 | DeepSeek V3 | DeepSeek | 12/2025 | 66.3 | 69.2%
3 | Grok 3 | xAI | 08/2025 | 65.9 | 69.2%
4 | Claude 4.6 Opus | Claude | 02/2026 | 65.9 | 69.2%
5 | Claude 4.5 Opus | Claude | 11/2025 | 64.3 | 69.2%
6 | Claude 4 Opus | Claude | 08/2025 | 64.1 | 69.2%
7 | DeepSeek V3 | DeepSeek | 08/2025 | 64.1 | 64.1%
8 | Claude 3.5 Sonnet | Claude | 08/2025 | 63.4 | 61.5%
9 | Claude 4 Sonnet | Claude | 08/2025 | 63.2 | 66.7%
10 | Claude 4.1 Opus | Claude | 08/2025 | 62.9 | 69.2%
11 | Claude 4.5 Sonnet | Claude | 10/2025 | 62.8 | 66.7%
12 | Codestral 25.08 | Mistral | 08/2025 | 60.6 | 61.5%
13 | Glm 4 6 | Other | 10/2025 | 60.0 | 61.5%
14 | DeepSeek V3 | DeepSeek | 10/2025 | 58.1 | 59.0%
15 | OpenAI GPT-4o | OpenAI | 08/2025 | 55.5 | 53.8%
16 | Devstral 2512 | Other | 12/2025 | 53.1 | 51.3%
17 | OpenAI GPT-5.2 Chat | OpenAI | 12/2025 | 51.8 | 51.3%
18 | OpenAI GPT-5 Chat | OpenAI | 08/2025 | 51.5 | 51.3%
19 | Gemini 3 Flash | Google | 12/2025 | 50.4 | 51.3%
20 | Qwen3 Max | Alibaba | 10/2025 | 50.3 | 53.8%
21 | Glm 4 7 | Other | 12/2025 | 50.1 | 51.3%
22 | Claude 3.7 Sonnet | Claude | 08/2025 | 49.1 | 51.3%
23 | OpenAI GPT-5.1 Chat | OpenAI | 11/2025 | 48.6 | 48.7%
24 | Kimi K2 | Moonshot | 10/2025 | 47.6 | 48.7%
25 | Horizon Beta | Other | 08/2025 | 47.3 | 48.7%
26 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 46.7 | 46.2%
27 | OpenAI GPT-5 | OpenAI | 09/2025 | 46.6 | 48.7%
28 | OpenAI 5.2 Codex | OpenAI | 01/2026 | 46.5 | 51.3%
29 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 46.3 | 48.7%
30 | OpenAI GPT-5 | OpenAI | 08/2025 | 45.5 | 48.7%
31 | R1 | DeepSeek | 08/2025 | 45.4 | 43.6%
32 | Llama 4 Scout | Meta | 08/2025 | 45.0 | 43.6%
33 | Qwen 3 Coder | Alibaba | 10/2025 | 44.9 | 46.2%
34 | Mistral Large 2512 | Mistral | 12/2025 | 44.8 | 43.6%
35 | OpenAI 5.1 Codex | OpenAI | 11/2025 | 44.7 | 41.0%
36 | OpenAI GPT-4o | OpenAI | 08/2025 | 44.4 | 43.6%
37 | Claude 4.5 Haiku | Claude | 10/2025 | 44.2 | 43.6%
38 | Kimi K2 | Moonshot | 08/2025 | 44.2 | 43.6%
39 | Mimo V2 Flash Free | Other | 12/2025 | 43.5 | 46.2%
40 | OpenAI GPT-5.2 | OpenAI | 12/2025 | 42.5 | 41.0%
41 | Kimi K2 | Moonshot | 12/2025 | 42.5 | 41.0%
42 | Llama 4 Maverick | Meta | 08/2025 | 42.5 | 41.0%
43 | OpenAI GPT-5.1 | OpenAI | 11/2025 | 42.2 | 43.6%
44 | Mistral Medium 3 | Mistral | 08/2025 | 41.6 | 38.5%
45 | OpenAI o3-mini | OpenAI | 08/2025 | 40.7 | 41.0%
46 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 40.4 | 38.5%
47 | Qwen 3 Coder | Alibaba | 08/2025 | 40.3 | 41.0%
48 | OpenAI GPT-4 | OpenAI | 08/2025 | 39.8 | 38.5%
49 | Gemini 2.0 Flash-001 | Google | 08/2025 | 39.6 | 43.6%
50 | Sonoma Sky Alpha | Other | 09/2025 | 39.3 | 41.0%
51 | Grok 4 | xAI | 08/2025 | 39.2 | 38.5%
52 | OpenAI 5 Codex | OpenAI | 10/2025 | 38.9 | 41.0%
53 | Grok 4 | xAI | 10/2025 | 38.3 | 41.0%
54 | OpenAI o1-mini | OpenAI | 08/2025 | 37.1 | 35.9%
55 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 36.4 | 33.3%
56 | Minimax M2 1 | Other | 12/2025 | 36.1 | 35.9%
57 | Claude 3 Haiku | Claude | 08/2025 | 36.0 | 33.3%
58 | Qwen3 14b | Alibaba | 08/2025 | 35.0 | 33.3%
59 | OpenAI o4-mini | OpenAI | 08/2025 | 32.9 | 30.8%
60 | OpenAI 5.1 Codex Mini | OpenAI | 11/2025 | 32.8 | 28.2%
61 | Grok Code Fast 1 | xAI | 09/2025 | 32.3 | 30.8%
62 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 31.8 | 33.3%
63 | Openai Oss 20b | OpenAI | 08/2025 | 31.6 | 28.2%
64 | OpenAI GPT-5 mini | OpenAI | 09/2025 | 31.2 | 33.3%
65 | Gemini 3 Pro Preview | Google | 11/2025 | 30.1 | 30.8%
66 | Gemini 2.5 Flash | Google | 08/2025 | 29.7 | 30.8%
67 | OpenAI GPT-5 nano | OpenAI | 08/2025 | 29.0 | 28.2%
68 | OpenAI GPT-5 mini | OpenAI | 08/2025 | 28.9 | 30.8%
69 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 27.9 | 25.6%
70 | Nova Micro V1 | Amazon | 08/2025 | 27.0 | 23.1%
71 | Nova Lite V1 | Amazon | 08/2025 | 23.8 | 17.9%
72 | Gemini 2.5 Pro | Google | 08/2025 | 23.5 | 20.5%
73 | Grok 3 Mini | xAI | 08/2025 | 23.4 | 23.1%
74 | OpenAI GPT-5 nano | OpenAI | 09/2025 | 22.7 | 20.5%
75 | Openai Oss 120b | OpenAI | 08/2025 | 20.4 | 17.9%
76 | Gemini 2.5 Flash Lite | Google | 08/2025 | 19.1 | 20.5%
77 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 19.0 | 15.4%
78 | Claude 3.5 Haiku | Claude | 08/2025 | 16.9 | 12.8%
79 | Nova Pro V1 | Amazon | 08/2025 | 14.6 | 10.3%
80 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 11.9 | 12.8%
81 | Coder Large | Other | 08/2025 | 9.3 | 7.7%

Top Performers

Parking Garage Champions
#1 Claude 3.7 Sonnet (Thinking) (Claude), Score: 67.1
Success Rate: 69.2% | Tests Passed: 27 | Quality: 48 | Issues: 26 | 39 total tests

#2 DeepSeek V3 (DeepSeek), Score: 66.3
Success Rate: 69.2% | Tests Passed: 27 | Quality: 40 | Issues: 30 | 39 total tests

#3 Grok 3 (xAI), Score: 65.9
Success Rate: 69.2% | Tests Passed: 27 | Quality: 36 | Issues: 32 | 39 total tests
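
The Success Rate shown for the three champions matches tests passed divided by total tests (27 of 39, about 69.2%). The sketch below reproduces that arithmetic. The function name `success_rate` and the rounding to one decimal place are assumptions made for illustration; the page does not state how the benchmark computes or rounds the figure, and the separate Score value is not derived here.

```python
def success_rate(tests_passed: int, total_tests: int) -> float:
    """Return the pass percentage, rounded to one decimal place.

    Assumption: success rate = tests passed / total tests * 100.
    This matches the champion cards (27 of 39 tests) but is not
    confirmed as the benchmark's actual method.
    """
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= tests_passed <= total_tests:
        raise ValueError("tests_passed must be between 0 and total_tests")
    return round(100 * tests_passed / total_tests, 1)


if __name__ == "__main__":
    # Values taken from the top-performer cards: 27 passed out of 39.
    print(f"{success_rate(27, 39)}%")
```

Under the same assumption, every success rate in the results table corresponds to a whole number of passed tests out of 39.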
