Benchmark Detail View
Parking Garage Management
HARD Challenge | 69 models tested | Top Score: 67.1
Success Rate: 40.5%
Quality Score: 41
Tests Passed: 16
Models Tested: 69
Parking Garage Benchmark - Individual Model Results
Rank | Model | Provider | Date | Score | Success Rate
1 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 67.1 | 69.2%
2 | Grok 3 | xAI | 08/2025 | 65.9 | 69.2%
3 | Claude 4.5 Opus | Claude | 11/2025 | 64.3 | 69.2%
4 | Claude 4 Opus | Claude | 08/2025 | 64.1 | 69.2%
5 | DeepSeek V3 | DeepSeek | 08/2025 | 64.1 | 64.1%
6 | Claude 3.5 Sonnet | Claude | 08/2025 | 63.4 | 61.5%
7 | Claude 4 Sonnet | Claude | 08/2025 | 63.2 | 66.7%
8 | Claude 4.1 Opus | Claude | 08/2025 | 62.9 | 69.2%
9 | Claude 4.5 Sonnet | Claude | 10/2025 | 62.8 | 66.7%
10 | Codestral 25.08 | Mistral | 08/2025 | 60.6 | 61.5%
11 | Glm 4 6 | Other | 10/2025 | 60.0 | 61.5%
12 | DeepSeek V3 | DeepSeek | 10/2025 | 58.1 | 59.0%
13 | OpenAI GPT-4o | OpenAI | 08/2025 | 55.5 | 53.8%
14 | OpenAI GPT-5 Chat | OpenAI | 08/2025 | 51.5 | 51.3%
15 | Qwen3 Max | Alibaba | 10/2025 | 50.3 | 53.8%
16 | Claude 3.7 Sonnet | Claude | 08/2025 | 49.1 | 51.3%
17 | OpenAI GPT-5.1 Chat | OpenAI | 11/2025 | 48.6 | 48.7%
18 | Kimi K2 | Moonshot | 10/2025 | 47.6 | 48.7%
19 | Horizon Beta | Other | 08/2025 | 47.3 | 48.7%
20 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 46.7 | 46.2%
21 | OpenAI GPT-5 | OpenAI | 09/2025 | 46.6 | 48.7%
22 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 46.3 | 48.7%
23 | OpenAI GPT-5 | OpenAI | 08/2025 | 45.5 | 48.7%
24 | R1 | DeepSeek | 08/2025 | 45.4 | 43.6%
25 | Llama 4 Scout | Meta | 08/2025 | 45.0 | 43.6%
26 | Qwen 3 Coder | Alibaba | 10/2025 | 44.9 | 46.2%
27 | OpenAI 5.1 Codex | OpenAI | 11/2025 | 44.7 | 41.0%
28 | OpenAI GPT-4o | OpenAI | 08/2025 | 44.4 | 43.6%
29 | Kimi K2 | Moonshot | 08/2025 | 44.2 | 43.6%
30 | Claude 4.5 Haiku | Claude | 10/2025 | 44.2 | 43.6%
31 | Llama 4 Maverick | Meta | 08/2025 | 42.5 | 41.0%
32 | OpenAI GPT-5.1 | OpenAI | 11/2025 | 42.2 | 43.6%
33 | Mistral Medium 3 | Mistral | 08/2025 | 41.6 | 38.5%
34 | OpenAI o3-mini | OpenAI | 08/2025 | 40.7 | 41.0%
35 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 40.4 | 38.5%
36 | Qwen 3 Coder | Alibaba | 08/2025 | 40.3 | 41.0%
37 | OpenAI GPT-4 | OpenAI | 08/2025 | 39.8 | 38.5%
38 | Gemini 2.0 Flash-001 | Google | 08/2025 | 39.6 | 43.6%
39 | Sonoma Sky Alpha | Other | 09/2025 | 39.3 | 41.0%
40 | Grok 4 | xAI | 08/2025 | 39.2 | 38.5%
41 | OpenAI 5 Codex | OpenAI | 10/2025 | 38.9 | 41.0%
42 | Grok 4 | xAI | 10/2025 | 38.3 | 41.0%
43 | OpenAI o1-mini | OpenAI | 08/2025 | 37.1 | 35.9%
44 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 36.4 | 33.3%
45 | Claude 3 Haiku | Claude | 08/2025 | 36.0 | 33.3%
46 | Qwen3 14b | Alibaba | 08/2025 | 35.0 | 33.3%
47 | OpenAI o4-mini | OpenAI | 08/2025 | 32.9 | 30.8%
48 | OpenAI 5.1 Codex Mini | OpenAI | 11/2025 | 32.8 | 28.2%
49 | Grok Code Fast 1 | xAI | 09/2025 | 32.3 | 30.8%
50 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 31.8 | 33.3%
51 | Openai Oss 20b | OpenAI | 08/2025 | 31.6 | 28.2%
52 | OpenAI GPT-5 mini | OpenAI | 09/2025 | 31.2 | 33.3%
53 | Gemini 3 Pro Preview | Google | 11/2025 | 30.1 | 30.8%
54 | Gemini 2.5 Flash | Google | 08/2025 | 29.7 | 30.8%
55 | OpenAI GPT-5 nano | OpenAI | 08/2025 | 29.0 | 28.2%
56 | OpenAI GPT-5 mini | OpenAI | 08/2025 | 28.9 | 30.8%
57 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 27.9 | 25.6%
58 | Nova Micro V1 | Amazon | 08/2025 | 27.0 | 23.1%
59 | Nova Lite V1 | Amazon | 08/2025 | 23.8 | 17.9%
60 | Gemini 2.5 Pro | Google | 08/2025 | 23.5 | 20.5%
61 | Grok 3 Mini | xAI | 08/2025 | 23.4 | 23.1%
62 | OpenAI GPT-5 nano | OpenAI | 09/2025 | 22.7 | 20.5%
63 | Openai Oss 120b | OpenAI | 08/2025 | 20.4 | 17.9%
64 | Gemini 2.5 Flash Lite | Google | 08/2025 | 19.1 | 20.5%
65 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 19.0 | 15.4%
66 | Claude 3.5 Haiku | Claude | 08/2025 | 16.9 | 12.8%
67 | Nova Pro V1 | Amazon | 08/2025 | 14.6 | 10.3%
68 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 11.9 | 12.8%
69 | Coder Large | Other | 08/2025 | 9.3 | 7.7%
Top Performers
#1 Claude 3.7 Sonnet (Thinking) | Claude | Score: 67.1
Success Rate: 69.2% (27 of 39 tests passed)
Quality: 48
Issues: 26
#2 Grok 3 | xAI | Score: 65.9
Success Rate: 69.2% (27 of 39 tests passed)
Quality: 36
Issues: 32
#3 Claude 4.5 Opus | Claude | Score: 64.3
Success Rate: 69.2% (27 of 39 tests passed)
Quality: 20
Issues: 40
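The Top Performers entries each report 27 tests passed out of 39 total alongside a 69.2% success rate, which is consistent with the success rate being tests passed divided by total tests. The page does not state this formula, so the sketch below treats it as an assumption; the function names `success_rate` and `implied_passed` are hypothetical helpers introduced only for illustration. It also shows how a listed percentage could be mapped back to a whole number of passed tests under the same assumption.

```python
def success_rate(passed: int, total: int) -> float:
    """Percentage of tests passed, rounded to one decimal place.

    Assumes success rate = passed / total, which the page does not state
    explicitly but which matches the listed figures.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


def implied_passed(rate_percent: float, total: int) -> int:
    """Whole number of passed tests closest to a listed percentage."""
    return round(rate_percent / 100 * total)


TOTAL_TESTS = 39  # "39 total tests" in each Top Performers entry

print(success_rate(27, TOTAL_TESTS))      # 69.2, matching the top three models
print(implied_passed(64.1, TOTAL_TESTS))  # 25, the count implied for rank 5
print(implied_passed(7.7, TOTAL_TESTS))   # 3, the count implied for rank 69
```

The headline score (for example 67.1 for rank 1) is not derivable from these two numbers alone; how the page weights success rate, quality, and issues into that score is not stated and is not modeled here.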