Benchmark Detail View
Vending Machine System
MEDIUM Challenge • 100 models tested • Top Score: 78.9
Success Rate: 65.3%
Quality Score: 81
Tests Passed: 20
Models Tested: 100
Vending Machine Benchmark - Individual Model Results
Showing 100 of 100 models
Rank | Model | Provider | Date | Score | Success Rate
1 | GPT 5.2 | OpenAI | 12/2025 | 78.9 | 77.4%
2 | GPT 5.3 Codex | OpenAI | 02/2026 | 78.7 | 77.4%
3 | GPT 5.4 | OpenAI | 03/2026 | 78.7 | 77.4%
4 | Gemini 2.5 Pro | Google | 08/2025 | 78.5 | 77.4%
5 | GPT 5.3 Chat | OpenAI | 03/2026 | 78.3 | 77.4%
6 | GPT 5 nano | OpenAI | 08/2025 | 78.3 | 77.4%
7 | OSS 120B | OpenAI | 08/2025 | 78.3 | 77.4%
8 | Claude Opus 4.7 | Anthropic | 04/2026 | 78.1 | 77.4%
9 | Gemini 3.1 Pro Preview | Google | 02/2026 | 78.1 | 77.4%
10 | GPT 4.1 nano | OpenAI | 08/2025 | 78.1 | 77.4%
11 | GPT 5.1 Codex | OpenAI | 11/2025 | 78.1 | 77.4%
12 | GPT 5.2 | OpenAI | 12/2025 | 78.1 | 77.4%
13 | Claude 3.5 Haiku | Anthropic | 08/2025 | 77.9 | 77.4%
14 | DeepSeek V3.2 Exp | DeepSeek | 12/2025 | 77.9 | 77.4%
15 | GLM 4.6 | Z.AI | 10/2025 | 77.9 | 77.4%
16 | GPT 4 | OpenAI | 08/2025 | 77.9 | 77.4%
17 | GPT 5.1 | OpenAI | 11/2025 | 77.9 | 77.4%
18 | DeepSeek R1 | DeepSeek | 08/2025 | 77.9 | 77.4%
19 | Claude 4.5 Opus | Anthropic | 11/2025 | 77.7 | 77.4%
20 | Claude 4.6 Opus | Anthropic | 02/2026 | 77.7 | 77.4%
21 | DeepSeek V3.2 Exp | DeepSeek | 10/2025 | 77.7 | 77.4%
22 | Gemini 2.5 Flash | Google | 08/2025 | 77.7 | 77.4%
23 | Gemini 3 Flash Preview | Google | 12/2025 | 77.7 | 77.4%
24 | Grok 4.1 Fast | xAI | 02/2026 | 77.7 | 77.4%
25 | Kimi K2 Thinking | Moonshot AI | 12/2025 | 77.7 | 77.4%
26 | Llama 4 Maverick | Meta | 08/2025 | 77.7 | 77.4%
27 | Llama 4 Scout | Meta | 08/2025 | 77.7 | 77.4%
28 | MiniMax M2.5 | Minimax | 02/2026 | 77.7 | 77.4%
29 | GPT 5.2 Codex | OpenAI | 01/2026 | 77.7 | 77.4%
30 | GPT 5 Codex | OpenAI | 10/2025 | 77.7 | 77.4%
31 | GPT 5 | OpenAI | 09/2025 | 77.7 | 77.4%
32 | o3 mini (High) | OpenAI | 08/2025 | 77.7 | 77.4%
33 | Qwen3 Coder Plus | Qwen | 10/2025 | 77.7 | 77.4%
34 | Qwen3 Max | Qwen | 10/2025 | 77.7 | 77.4%
35 | Claude 3.7 Sonnet | Anthropic | 08/2025 | 77.5 | 77.4%
36 | Claude 4.5 Sonnet | Anthropic | 10/2025 | 77.5 | 77.4%
37 | DeepSeek V3.2 Speciale | DeepSeek | 02/2026 | 77.5 | 77.4%
38 | Kimi K2.5 | Moonshot AI | 02/2026 | 77.5 | 77.4%
39 | GPT 4 Turbo | OpenAI | 08/2025 | 77.5 | 77.4%
40 | GPT 5.1 | OpenAI | 11/2025 | 77.5 | 77.4%
41 | GPT 5 | OpenAI | 08/2025 | 77.5 | 77.4%
42 | o3 mini | OpenAI | 08/2025 | 77.5 | 77.4%
43 | o4 mini (High) | OpenAI | 08/2025 | 77.5 | 77.4%
44 | Claude 3.7 Sonnet (Thinking) | Anthropic | 08/2025 | 77.3 | 77.4%
45 | Grok 3 | xAI | 08/2025 | 77.3 | 77.4%
46 | Grok 4 Fast | xAI | 10/2025 | 77.3 | 77.4%
47 | Horizon Beta | Other | 08/2025 | 77.3 | 77.4%
48 | Kimi K2 (0905) | Moonshot AI | 10/2025 | 77.3 | 77.4%
49 | GPT 5 | OpenAI | 08/2025 | 77.3 | 77.4%
50 | o4 mini | OpenAI | 08/2025 | 77.3 | 77.4%
51 | Step 3.5 Flash | StepFun | 02/2026 | 77.3 | 77.4%
52 | Claude 3.5 Sonnet | Anthropic | 08/2025 | 77.1 | 77.4%
53 | GPT 5.1 Codex Mini | OpenAI | 11/2025 | 77.1 | 77.4%
54 | GPT 5 nano | OpenAI | 09/2025 | 77.1 | 77.4%
55 | GPT 4o | OpenAI | 08/2025 | 77.1 | 77.4%
56 | Claude 4.6 Sonnet | Anthropic | 02/2026 | 76.9 | 77.4%
57 | GPT 4.1 | OpenAI | 08/2025 | 76.9 | 77.4%
58 | Nova 2 Lite V1 | Amazon | 02/2026 | 76.7 | 77.4%
59 | GPT 4.1 mini | OpenAI | 08/2025 | 76.7 | 77.4%
60 | o1 mini | OpenAI | 08/2025 | 76.7 | 77.4%
61 | GPT 5 mini | OpenAI | 09/2025 | 76.5 | 77.4%
62 | GPT 5 mini | OpenAI | 08/2025 | 76.3 | 77.4%
63 | GLM 5 | Z.AI | 02/2026 | 75.6 | 74.2%
64 | Codestral 25.08 | Mistral | 08/2025 | 75.4 | 74.2%
65 | Sonoma Sky Alpha | Other | 09/2025 | 75.4 | 74.2%
66 | Claude 4.1 Opus | Anthropic | 08/2025 | 75.0 | 74.2%
67 | Claude 4 Opus | Anthropic | 08/2025 | 75.0 | 74.2%
68 | Claude 4 Sonnet | Anthropic | 08/2025 | 75.0 | 74.2%
69 | Gemini 2.5 Flash Lite | Google | 08/2025 | 74.8 | 74.2%
70 | MiniMax M2.1 | Minimax | 12/2025 | 46.5 | 41.9%
71 | Trinity Large Preview | Arcee AI | 02/2026 | 46.5 | 41.9%
72 | Devstral 25.12 | Mistral | 12/2025 | 46.4 | 41.9%
73 | Mistral Large 25.12 | Mistral | 12/2025 | 46.4 | 41.9%
74 | Gemini 3 Pro Preview | Google | 11/2025 | 46.0 | 41.9%
75 | GLM 4.7 | Z.AI | 12/2025 | 46.0 | 41.9%
76 | Mistral Medium 3 | Mistral | 08/2025 | 46.0 | 41.9%
77 | Nova Pro V1 | Amazon | 08/2025 | 46.0 | 41.9%
78 | OSS 20B | OpenAI | 08/2025 | 46.0 | 41.9%
79 | Qwen3 Coder | Qwen | 08/2025 | 46.0 | 41.9%
80 | Coder Large | Other | 08/2025 | 45.8 | 41.9%
81 | Grok Code Fast 1 | xAI | 09/2025 | 45.8 | 41.9%
82 | Grok 4 | xAI | 08/2025 | 45.5 | 41.9%
83 | Kimi K2 | Moonshot AI | 08/2025 | 45.5 | 41.9%
84 | MIMO V2 Flash | Minimax | 12/2025 | 45.5 | 41.9%
85 | GLM 4.5 | Z.AI | 08/2025 | 45.4 | 41.9%
86 | GPT 4o | OpenAI | 08/2025 | 45.4 | 41.9%
87 | Qwen3 Coder Next | Qwen | 02/2026 | 45.4 | 41.9%
88 | Gemini 2.0 Flash 001 | Google | 08/2025 | 45.0 | 41.9%
89 | Grok 3 Mini | xAI | 08/2025 | 45.0 | 41.9%
90 | Claude 3 Haiku | Anthropic | 08/2025 | 43.6 | 38.7%
91 | Gemma 3 4B IT | Google | 08/2025 | 43.6 | 38.7%
92 | Magnum V4 72B | NousResearch | 08/2025 | 43.6 | 38.7%
93 | GPT 3.5 Turbo | OpenAI | 08/2025 | 43.6 | 38.7%
94 | GPT 4o mini | OpenAI | 08/2025 | 43.6 | 38.7%
95 | Qwen3 14B | Qwen | 08/2025 | 43.4 | 38.7%
96 | DeepSeek V3 | DeepSeek | 08/2025 | 43.2 | 38.7%
97 | Claude 4.5 Haiku | Anthropic | 10/2025 | 43.0 | 38.7%
98 | Nova Micro V1 | Amazon | 08/2025 | 37.4 | 32.3%
99 | Nova Lite V1 | Amazon | 08/2025 | 32.2 | 25.8%
100 | Command A | Cohere | 08/2025 | 11.7 | 3.2%
Top Performers
#1 GPT 5.2 (OpenAI) • Score: 78.9
Success Rate: 77.4% • Tests Passed: 24 of 31 • Quality: 92 • Issues: 4
#2 GPT 5.3 Codex (OpenAI) • Score: 78.7
Success Rate: 77.4% • Tests Passed: 24 of 31 • Quality: 90 • Issues: 5
#3 GPT 5.4 (OpenAI) • Score: 78.7
Success Rate: 77.4% • Tests Passed: 24 of 31 • Quality: 90 • Issues: 5