Benchmark Detail View
Vending Machine System
MEDIUM Challenge | 85 models tested | Top Score: 78.9
Success Rate: 64.0%
Quality Score: 80
Tests Passed: 20
Models Tested: 85
Vending Machine Benchmark - Individual Model Results
Rank | Model | Provider | Date | Score | Success Rate
1 | OpenAI GPT-5.2 | OpenAI | 12/2025 | 78.9 | 77.4%
2 | Gemini 2.5 Pro | Google | 08/2025 | 78.5 | 77.4%
3 | Openai Oss 120b | OpenAI | 08/2025 | 78.3 | 77.4%
4 | OpenAI GPT-5 nano | OpenAI | 08/2025 | 78.3 | 77.4%
5 | OpenAI 5.1 Codex | OpenAI | 11/2025 | 78.1 | 77.4%
6 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 78.1 | 77.4%
7 | OpenAI GPT-5.2 Chat | OpenAI | 12/2025 | 78.1 | 77.4%
8 | Glm 4 6 | Other | 10/2025 | 77.9 | 77.4%
9 | DeepSeek V3 | DeepSeek | 12/2025 | 77.9 | 77.4%
10 | OpenAI GPT-4 | OpenAI | 08/2025 | 77.9 | 77.4%
11 | Claude 3.5 Haiku | Claude | 08/2025 | 77.9 | 77.4%
12 | OpenAI GPT-5.1 | OpenAI | 11/2025 | 77.9 | 77.4%
13 | R1 | DeepSeek | 08/2025 | 77.9 | 77.4%
14 | DeepSeek V3 | DeepSeek | 10/2025 | 77.7 | 77.4%
15 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 77.7 | 77.4%
16 | OpenAI 5.2 Codex | OpenAI | 01/2026 | 77.7 | 77.4%
17 | Llama 4 Maverick | Meta | 08/2025 | 77.7 | 77.4%
18 | Gemini 2.5 Flash | Google | 08/2025 | 77.7 | 77.4%
19 | Qwen 3 Coder | Alibaba | 10/2025 | 77.7 | 77.4%
20 | Gemini 3 Flash | Google | 12/2025 | 77.7 | 77.4%
21 | OpenAI GPT-5 | OpenAI | 09/2025 | 77.7 | 77.4%
22 | Kimi K2 | Moonshot | 12/2025 | 77.7 | 77.4%
23 | OpenAI 5 Codex | OpenAI | 10/2025 | 77.7 | 77.4%
24 | Llama 4 Scout | Meta | 08/2025 | 77.7 | 77.4%
25 | Qwen3 Max | Alibaba | 10/2025 | 77.7 | 77.4%
26 | Claude 4.5 Opus | Claude | 11/2025 | 77.7 | 77.4%
27 | Claude 4.6 Opus | Claude | 02/2026 | 77.7 | 77.4%
28 | Claude 4.5 Sonnet | Claude | 10/2025 | 77.5 | 77.4%
29 | OpenAI GPT-5.1 Chat | OpenAI | 11/2025 | 77.5 | 77.4%
30 | OpenAI o3-mini | OpenAI | 08/2025 | 77.5 | 77.4%
31 | Claude 3.7 Sonnet | Claude | 08/2025 | 77.5 | 77.4%
32 | OpenAI GPT-5 Chat | OpenAI | 08/2025 | 77.5 | 77.4%
33 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 77.5 | 77.4%
34 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 77.5 | 77.4%
35 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 77.3 | 77.4%
36 | OpenAI o4-mini | OpenAI | 08/2025 | 77.3 | 77.4%
37 | Grok 4 | xAI | 10/2025 | 77.3 | 77.4%
38 | OpenAI GPT-5 | OpenAI | 08/2025 | 77.3 | 77.4%
39 | Grok 3 | xAI | 08/2025 | 77.3 | 77.4%
40 | Kimi K2 | Moonshot | 10/2025 | 77.3 | 77.4%
41 | Horizon Beta | Other | 08/2025 | 77.3 | 77.4%
42 | OpenAI GPT-4o | OpenAI | 08/2025 | 77.1 | 77.4%
43 | OpenAI 5.1 Codex Mini | OpenAI | 11/2025 | 77.1 | 77.4%
44 | Claude 3.5 Sonnet | Claude | 08/2025 | 77.1 | 77.4%
45 | OpenAI GPT-5 nano | OpenAI | 09/2025 | 77.1 | 77.4%
46 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 76.9 | 77.4%
47 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 76.7 | 77.4%
48 | OpenAI o1-mini | OpenAI | 08/2025 | 76.7 | 77.4%
49 | OpenAI GPT-5 mini | OpenAI | 09/2025 | 76.5 | 77.4%
50 | OpenAI GPT-5 mini | OpenAI | 08/2025 | 76.3 | 77.4%
51 | Codestral 25.08 | Mistral | 08/2025 | 75.4 | 74.2%
52 | Sonoma Sky Alpha | Other | 09/2025 | 75.4 | 74.2%
53 | Claude 4 Opus | Claude | 08/2025 | 75.0 | 74.2%
54 | Claude 4.1 Opus | Claude | 08/2025 | 75.0 | 74.2%
55 | Claude 4 Sonnet | Claude | 08/2025 | 75.0 | 74.2%
56 | Gemini 2.5 Flash Lite | Google | 08/2025 | 74.8 | 74.2%
57 | Minimax M2 1 | Other | 12/2025 | 46.5 | 41.9%
58 | Devstral 2512 | Other | 12/2025 | 46.4 | 41.9%
59 | Mistral Large 2512 | Mistral | 12/2025 | 46.4 | 41.9%
60 | Openai Oss 20b | OpenAI | 08/2025 | 46.0 | 41.9%
61 | Nova Pro V1 | Amazon | 08/2025 | 46.0 | 41.9%
62 | Glm 4 7 | Other | 12/2025 | 46.0 | 41.9%
63 | Qwen 3 Coder | Alibaba | 08/2025 | 46.0 | 41.9%
64 | Gemini 3 Pro Preview | Google | 11/2025 | 46.0 | 41.9%
65 | Mistral Medium 3 | Mistral | 08/2025 | 46.0 | 41.9%
66 | Grok Code Fast 1 | xAI | 09/2025 | 45.8 | 41.9%
67 | Coder Large | Other | 08/2025 | 45.8 | 41.9%
68 | Grok 4 | xAI | 08/2025 | 45.5 | 41.9%
69 | Kimi K2 | Moonshot | 08/2025 | 45.5 | 41.9%
70 | Mimo V2 Flash Free | Other | 12/2025 | 45.5 | 41.9%
71 | Glm 4 5 | Other | 08/2025 | 45.4 | 41.9%
72 | OpenAI GPT-4o | OpenAI | 08/2025 | 45.4 | 41.9%
73 | Grok 3 Mini | xAI | 08/2025 | 45.0 | 41.9%
74 | Gemini 2.0 Flash-001 | Google | 08/2025 | 45.0 | 41.9%
75 | Gemma 3 4B IT | Google | 08/2025 | 43.6 | 38.7%
76 | Magnum V4 72B | NousResearch | 08/2025 | 43.6 | 38.7%
77 | Claude 3 Haiku | Claude | 08/2025 | 43.6 | 38.7%
78 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 43.6 | 38.7%
79 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 43.6 | 38.7%
80 | Qwen3 14b | Alibaba | 08/2025 | 43.4 | 38.7%
81 | DeepSeek V3 | DeepSeek | 08/2025 | 43.2 | 38.7%
82 | Claude 4.5 Haiku | Claude | 10/2025 | 43.0 | 38.7%
83 | Nova Micro V1 | Amazon | 08/2025 | 37.4 | 32.3%
84 | Nova Lite V1 | Amazon | 08/2025 | 32.2 | 25.8%
85 | Command A | Cohere | 08/2025 | 11.7 | 3.2%
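
The page does not say how the 64.0% summary Success Rate is aggregated. One reading that is consistent with the table is a plain mean of the per-model success rates. The Python sketch below checks that reading; the tier counts are tallied by hand from the rows above, and the names `tiers` and `mean_success_rate` are illustrative, not part of the benchmark.

```python
# Success-rate tiers tallied from the results table:
# (success rate in percent, number of models at that rate).
# Assumption: the summary figure is an unweighted mean over models.
tiers = [
    (77.4, 50),  # ranks 1-50
    (74.2, 6),   # ranks 51-56
    (41.9, 18),  # ranks 57-74
    (38.7, 8),   # ranks 75-82
    (32.3, 1),   # rank 83
    (25.8, 1),   # rank 84
    (3.2, 1),    # rank 85
]


def mean_success_rate(tiers):
    """Return the mean success rate across all models in the tiers."""
    total_models = sum(count for _, count in tiers)
    weighted_sum = sum(rate * count for rate, count in tiers)
    return weighted_sum / total_models


total_models = sum(count for _, count in tiers)
average = mean_success_rate(tiers)
print(total_models, round(average, 1))  # 85 64.0
```

Under this assumption the result matches both the 85 models tested and the 64.0% Success Rate shown in the summary.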
Top Performers
#1 OpenAI GPT-5.2 (OpenAI)
Score: 78.9
Success Rate: 77.4%
Tests Passed: 24 of 31
Quality: 92
Issues: 4

#2 Gemini 2.5 Pro (Google)
Score: 78.5
Success Rate: 77.4%
Tests Passed: 24 of 31
Quality: 88
Issues: 6

#3 Openai Oss 120b (OpenAI)
Score: 78.3
Success Rate: 77.4%
Tests Passed: 24 of 31
Quality: 86
Issues: 7
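
The top performer figures suggest that Success Rate is tests passed divided by total tests, shown to one decimal place: 24 passed of 31 total gives 77.4%. The page does not state the formula, so the Python sketch below is an assumption, and `success_rate` is a hypothetical helper name.

```python
def success_rate(passed: int, total: int) -> float:
    """Return passed/total as a percentage rounded to one decimal place.

    Assumption: this mirrors how the leaderboard derives Success Rate;
    the page itself does not document the formula.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100 * passed / total, 1)


# The three top performers each passed 24 of 31 tests.
print(success_rate(24, 31))  # 77.4
```

If the assumption holds, the other rates in the table also correspond to whole numbers of tests out of 31; for example, `success_rate(23, 31)` gives 74.2 and `success_rate(1, 31)` gives 3.2.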