Benchmark Detail View

Parking Garage Management

HARD Challenge | 42 models tested | Top Score: 67.1
Success Rate: 38.7%
Quality Score: 45
Tests Passed: 15
Models Tested: 42
Parking Garage Benchmark - Individual Model Results
Showing 42 of 42 models
Rank  Model                         Provider  Date     Score  Success Rate
----  ----------------------------  --------  -------  -----  ------------
1     Claude 3.7 Sonnet (Thinking)  Claude    08/2025  67.1   69.2%
2     Grok 3                        xAI       08/2025  65.9   69.2%
3     Claude 4 Opus                 Claude    08/2025  64.1   69.2%
4     DeepSeek V3                   DeepSeek  08/2025  64.1   64.1%
5     Claude 3.5 Sonnet             Claude    08/2025  63.4   61.5%
6     Claude 4 Sonnet               Claude    08/2025  63.2   66.7%
7     Codestral 25.08               Mistral   08/2025  60.6   61.5%
8     OpenAI GPT-4o                 OpenAI    08/2025  55.5   53.8%
9     Claude 3.7 Sonnet             Claude    08/2025  49.1   51.3%
10    Horizon Beta                  Other     08/2025  47.3   48.7%
11    OpenAI GPT-4.1 mini           OpenAI    08/2025  46.7   46.2%
12    OpenAI GPT-4.1                OpenAI    08/2025  46.3   48.7%
13    R1                            DeepSeek  08/2025  45.4   43.6%
14    Llama 4 Scout                 Meta      08/2025  45.0   43.6%
15    OpenAI GPT-4o                 OpenAI    08/2025  44.4   43.6%
16    Kimi K2                       Moonshot  08/2025  44.2   43.6%
17    Llama 4 Maverick              Meta      08/2025  42.5   41.0%
18    Mistral Medium 3              Mistral   08/2025  41.6   38.5%
19    OpenAI o3-mini                OpenAI    08/2025  40.7   41.0%
20    OpenAI GPT-4 Turbo            OpenAI    08/2025  40.4   38.5%
21    Qwen 3 Coder                  Alibaba   08/2025  40.3   41.0%
22    OpenAI GPT-4                  OpenAI    08/2025  39.8   38.5%
23    Gemini 2.0 Flash-001          Google    08/2025  39.6   43.6%
24    Grok 4                        xAI       08/2025  39.2   38.5%
25    OpenAI o1-mini                OpenAI    08/2025  37.1   35.9%
26    OpenAI GPT-3.5 Turbo          OpenAI    08/2025  36.4   33.3%
27    Claude 3 Haiku                Claude    08/2025  36.0   33.3%
28    Qwen3 14b                     Alibaba   08/2025  35.0   33.3%
29    OpenAI o4-mini                OpenAI    08/2025  32.9   30.8%
30    OpenAI o3-mini (High)         OpenAI    08/2025  31.8   33.3%
31    Gemini 2.5 Flash              Google    08/2025  29.7   30.8%
32    OpenAI o4-mini (High)         OpenAI    08/2025  27.9   25.6%
33    Nova Micro V1                 Amazon    08/2025  27.0   23.1%
34    Nova Lite V1                  Amazon    08/2025  23.8   17.9%
35    Gemini 2.5 Pro                Google    08/2025  23.5   20.5%
36    Grok 3 Mini                   xAI       08/2025  23.4   23.1%
37    Gemini 2.5 Flash Lite         Google    08/2025  19.1   20.5%
38    OpenAI GPT-4o mini            OpenAI    08/2025  19.0   15.4%
39    Claude 3.5 Haiku              Claude    08/2025  16.9   12.8%
40    Nova Pro V1                   Amazon    08/2025  14.6   10.3%
41    OpenAI GPT-4.1 nano           OpenAI    08/2025  11.9   12.8%
42    Coder Large                   Other     08/2025  9.3    7.7%

Top Performers

Parking Garage Champions
#1 Claude 3.7 Sonnet (Thinking) (Claude)
Score: 67.1
Success Rate: 69.2%
Tests Passed: 27 of 39 total tests
Quality: 48
Issues: 26

#2 Grok 3 (xAI)
Score: 65.9
Success Rate: 69.2%
Tests Passed: 27 of 39 total tests
Quality: 36
Issues: 32

#3 Claude 4 Opus (Claude)
Score: 64.1
Success Rate: 69.2%
Tests Passed: 27 of 39 total tests
Quality: 18
Issues: 41
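
The success rates reported for the top three models appear to equal tests passed divided by total tests (27 of 39 is about 69.2%). The page does not state this formula, so the sketch below is an assumption that merely reproduces the published figure; the function name `success_rate` is hypothetical.

```python
def success_rate(tests_passed: int, total_tests: int) -> float:
    """Return the pass percentage rounded to one decimal place.

    Assumption: the page's "Success Rate" is tests passed / total tests.
    """
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return round(100 * tests_passed / total_tests, 1)


# Figures from the top three cards: 27 tests passed out of 39.
top_three_rate = success_rate(27, 39)
print(top_three_rate)
```

The overall Score is not derivable this way: the three models share the same success rate but differ in score, so Quality and Issues presumably contribute through a weighting the page does not disclose.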
