Benchmark Detail View
Vending Machine System
MEDIUM Challenge • 45 models tested • Top Score: 78.5
Success Rate: 60.2%
Quality Score: 80
Tests Passed: 19
Models Tested: 45
Vending Machine Benchmark - Individual Model Results
Rank | Model | Provider | Date | Score | Success Rate
1 | Gemini 2.5 Pro | Google | 08/2025 | 78.5 | 77.4%
2 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 78.1 | 77.4%
3 | Claude 3.5 Haiku | Claude | 08/2025 | 77.9 | 77.4%
4 | OpenAI GPT-4 | OpenAI | 08/2025 | 77.9 | 77.4%
5 | R1 | DeepSeek | 08/2025 | 77.9 | 77.4%
6 | Llama 4 Maverick | Meta | 08/2025 | 77.7 | 77.4%
7 | Llama 4 Scout | Meta | 08/2025 | 77.7 | 77.4%
8 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 77.7 | 77.4%
9 | Gemini 2.5 Flash | Google | 08/2025 | 77.7 | 77.4%
10 | Claude 3.7 Sonnet | Claude | 08/2025 | 77.5 | 77.4%
11 | OpenAI o3-mini | OpenAI | 08/2025 | 77.5 | 77.4%
12 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 77.5 | 77.4%
13 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 77.5 | 77.4%
14 | Grok 3 | xAI | 08/2025 | 77.3 | 77.4%
15 | OpenAI o4-mini | OpenAI | 08/2025 | 77.3 | 77.4%
16 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 77.3 | 77.4%
17 | Horizon Beta | Other | 08/2025 | 77.3 | 77.4%
18 | Claude 3.5 Sonnet | Claude | 08/2025 | 77.1 | 77.4%
19 | OpenAI GPT-4o | OpenAI | 08/2025 | 77.1 | 77.4%
20 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 76.9 | 77.4%
21 | OpenAI o1-mini | OpenAI | 08/2025 | 76.7 | 77.4%
22 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 76.7 | 77.4%
23 | Codestral 25.08 | Mistral | 08/2025 | 75.4 | 74.2%
24 | Claude 4 Opus | Claude | 08/2025 | 75.0 | 74.2%
25 | Claude 4 Sonnet | Claude | 08/2025 | 75.0 | 74.2%
26 | Gemini 2.5 Flash Lite | Google | 08/2025 | 74.8 | 74.2%
27 | Nova Pro V1 | Amazon | 08/2025 | 46.0 | 41.9%
28 | Mistral Medium 3 | Mistral | 08/2025 | 46.0 | 41.9%
29 | Qwen 3 Coder | Alibaba | 08/2025 | 46.0 | 41.9%
30 | Coder Large | Other | 08/2025 | 45.8 | 41.9%
31 | Grok 4 | xAI | 08/2025 | 45.5 | 41.9%
32 | Kimi K2 | Moonshot | 08/2025 | 45.5 | 41.9%
33 | OpenAI GPT-4o | OpenAI | 08/2025 | 45.4 | 41.9%
34 | Grok 3 Mini | xAI | 08/2025 | 45.0 | 41.9%
35 | Gemini 2.0 Flash-001 | Google | 08/2025 | 45.0 | 41.9%
36 | Gemma 3 4B IT | Google | 08/2025 | 43.6 | 38.7%
37 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 43.6 | 38.7%
38 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 43.6 | 38.7%
39 | Claude 3 Haiku | Claude | 08/2025 | 43.6 | 38.7%
40 | Magnum V4 72B | NousResearch | 08/2025 | 43.6 | 38.7%
41 | Qwen3 14b | Alibaba | 08/2025 | 43.4 | 38.7%
42 | DeepSeek V3 | DeepSeek | 08/2025 | 43.2 | 38.7%
43 | Nova Micro V1 | Amazon | 08/2025 | 37.4 | 32.3%
44 | Nova Lite V1 | Amazon | 08/2025 | 32.2 | 25.8%
45 | Command A | Cohere | 08/2025 | 11.7 | 3.2%
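The page does not say how its headline Success Rate of 60.2% is derived. One reading that matches the numbers is a plain mean of the per-model success rates in the results table. The sketch below checks that reading; the grouping of models by success rate and the names `RATE_GROUPS` and `mean_success_rate` are illustrative assumptions, not something the page defines.

```python
# (success rate in %, number of models with that rate), tallied from the
# results table. Assumption: the headline figure is an unweighted mean
# over all 45 models; the page itself does not state this.
RATE_GROUPS = [
    (77.4, 22),  # ranks 1-22
    (74.2, 4),   # ranks 23-26
    (41.9, 9),   # ranks 27-35
    (38.7, 7),   # ranks 36-42
    (32.3, 1),   # rank 43
    (25.8, 1),   # rank 44
    (3.2, 1),    # rank 45
]


def mean_success_rate(groups):
    """Return the mean success rate across all models, to one decimal."""
    total_models = sum(count for _, count in groups)
    weighted_sum = sum(rate * count for rate, count in groups)
    return round(weighted_sum / total_models, 1)


print(sum(count for _, count in RATE_GROUPS))  # 45
print(mean_success_rate(RATE_GROUPS))          # 60.2
```

The result agrees with the 60.2% shown in the summary, which supports the mean-of-models reading but does not prove it is how the site computes the figure.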
Top Performers
#1 Gemini 2.5 Pro (Google)
Score: 78.5
Success Rate: 77.4%
Tests Passed: 24 of 31
Quality: 88
Issues: 6

#2 OpenAI GPT-4.1 nano (OpenAI)
Score: 78.1
Success Rate: 77.4%
Tests Passed: 24 of 31
Quality: 84
Issues: 8

#3 Claude 3.5 Haiku (Claude)
Score: 77.9
Success Rate: 77.4%
Tests Passed: 24 of 31
Quality: 82
Issues: 9