Benchmark Detail View

Vending Machine System

MEDIUM Challenge | 100 models tested | Top Score: 78.9
Success Rate: 65.3%
Quality Score: 81
Tests Passed: 20
Models Tested: 100
Vending Machine Benchmark - Individual Model Results
Showing 100 of 100 models
Rank | Model | Provider | Date | Score | Success Rate
1 | GPT 5.2 | OpenAI | 12/2025 | 78.9 | 77.4%
2 | GPT 5.3 Codex | OpenAI | 02/2026 | 78.7 | 77.4%
3 | GPT 5.4 | OpenAI | 03/2026 | 78.7 | 77.4%
4 | Gemini 2.5 Pro | Google | 08/2025 | 78.5 | 77.4%
5 | GPT 5.3 Chat | OpenAI | 03/2026 | 78.3 | 77.4%
6 | GPT 5 nano | OpenAI | 08/2025 | 78.3 | 77.4%
7 | OSS 120B | OpenAI | 08/2025 | 78.3 | 77.4%
8 | Claude Opus 4.7 | Anthropic | 04/2026 | 78.1 | 77.4%
9 | Gemini 3.1 Pro Preview | Google | 02/2026 | 78.1 | 77.4%
10 | GPT 4.1 nano | OpenAI | 08/2025 | 78.1 | 77.4%
11 | GPT 5.1 Codex | OpenAI | 11/2025 | 78.1 | 77.4%
12 | GPT 5.2 | OpenAI | 12/2025 | 78.1 | 77.4%
13 | Claude 3.5 Haiku | Anthropic | 08/2025 | 77.9 | 77.4%
14 | DeepSeek V3.2 Exp | DeepSeek | 12/2025 | 77.9 | 77.4%
15 | GLM 4.6 | Z.AI | 10/2025 | 77.9 | 77.4%
16 | GPT 4 | OpenAI | 08/2025 | 77.9 | 77.4%
17 | GPT 5.1 | OpenAI | 11/2025 | 77.9 | 77.4%
18 | DeepSeek R1 | DeepSeek | 08/2025 | 77.9 | 77.4%
19 | Claude 4.5 Opus | Anthropic | 11/2025 | 77.7 | 77.4%
20 | Claude 4.6 Opus | Anthropic | 02/2026 | 77.7 | 77.4%
21 | DeepSeek V3.2 Exp | DeepSeek | 10/2025 | 77.7 | 77.4%
22 | Gemini 2.5 Flash | Google | 08/2025 | 77.7 | 77.4%
23 | Gemini 3 Flash Preview | Google | 12/2025 | 77.7 | 77.4%
24 | Grok 4.1 Fast | xAI | 02/2026 | 77.7 | 77.4%
25 | Kimi K2 Thinking | Moonshot AI | 12/2025 | 77.7 | 77.4%
26 | Llama 4 Maverick | Meta | 08/2025 | 77.7 | 77.4%
27 | Llama 4 Scout | Meta | 08/2025 | 77.7 | 77.4%
28 | MiniMax M2.5 | Minimax | 02/2026 | 77.7 | 77.4%
29 | GPT 5.2 Codex | OpenAI | 01/2026 | 77.7 | 77.4%
30 | GPT 5 Codex | OpenAI | 10/2025 | 77.7 | 77.4%
31 | GPT 5 | OpenAI | 09/2025 | 77.7 | 77.4%
32 | o3 mini (High) | OpenAI | 08/2025 | 77.7 | 77.4%
33 | Qwen3 Coder Plus | Qwen | 10/2025 | 77.7 | 77.4%
34 | Qwen3 Max | Qwen | 10/2025 | 77.7 | 77.4%
35 | Claude 3.7 Sonnet | Anthropic | 08/2025 | 77.5 | 77.4%
36 | Claude 4.5 Sonnet | Anthropic | 10/2025 | 77.5 | 77.4%
37 | DeepSeek V3.2 Speciale | DeepSeek | 02/2026 | 77.5 | 77.4%
38 | Kimi K2.5 | Moonshot AI | 02/2026 | 77.5 | 77.4%
39 | GPT 4 Turbo | OpenAI | 08/2025 | 77.5 | 77.4%
40 | GPT 5.1 | OpenAI | 11/2025 | 77.5 | 77.4%
41 | GPT 5 | OpenAI | 08/2025 | 77.5 | 77.4%
42 | o3 mini | OpenAI | 08/2025 | 77.5 | 77.4%
43 | o4 mini (High) | OpenAI | 08/2025 | 77.5 | 77.4%
44 | Claude 3.7 Sonnet (Thinking) | Anthropic | 08/2025 | 77.3 | 77.4%
45 | Grok 3 | xAI | 08/2025 | 77.3 | 77.4%
46 | Grok 4 Fast | xAI | 10/2025 | 77.3 | 77.4%
47 | Horizon Beta | Other | 08/2025 | 77.3 | 77.4%
48 | Kimi K2 (0905) | Moonshot AI | 10/2025 | 77.3 | 77.4%
49 | GPT 5 | OpenAI | 08/2025 | 77.3 | 77.4%
50 | o4 mini | OpenAI | 08/2025 | 77.3 | 77.4%
51 | Step 3.5 Flash | StepFun | 02/2026 | 77.3 | 77.4%
52 | Claude 3.5 Sonnet | Anthropic | 08/2025 | 77.1 | 77.4%
53 | GPT 5.1 Codex Mini | OpenAI | 11/2025 | 77.1 | 77.4%
54 | GPT 5 nano | OpenAI | 09/2025 | 77.1 | 77.4%
55 | GPT 4o | OpenAI | 08/2025 | 77.1 | 77.4%
56 | Claude 4.6 Sonnet | Anthropic | 02/2026 | 76.9 | 77.4%
57 | GPT 4.1 | OpenAI | 08/2025 | 76.9 | 77.4%
58 | Nova 2 Lite V1 | Amazon | 02/2026 | 76.7 | 77.4%
59 | GPT 4.1 mini | OpenAI | 08/2025 | 76.7 | 77.4%
60 | o1 mini | OpenAI | 08/2025 | 76.7 | 77.4%
61 | GPT 5 mini | OpenAI | 09/2025 | 76.5 | 77.4%
62 | GPT 5 mini | OpenAI | 08/2025 | 76.3 | 77.4%
63 | GLM 5 | Z.AI | 02/2026 | 75.6 | 74.2%
64 | Codestral 25.08 | Mistral | 08/2025 | 75.4 | 74.2%
65 | Sonoma Sky Alpha | Other | 09/2025 | 75.4 | 74.2%
66 | Claude 4.1 Opus | Anthropic | 08/2025 | 75.0 | 74.2%
67 | Claude 4 Opus | Anthropic | 08/2025 | 75.0 | 74.2%
68 | Claude 4 Sonnet | Anthropic | 08/2025 | 75.0 | 74.2%
69 | Gemini 2.5 Flash Lite | Google | 08/2025 | 74.8 | 74.2%
70 | MiniMax M2.1 | Minimax | 12/2025 | 46.5 | 41.9%
71 | Trinity Large Preview | Arcee AI | 02/2026 | 46.5 | 41.9%
72 | Devstral 25.12 | Mistral | 12/2025 | 46.4 | 41.9%
73 | Mistral Large 25.12 | Mistral | 12/2025 | 46.4 | 41.9%
74 | Gemini 3 Pro Preview | Google | 11/2025 | 46.0 | 41.9%
75 | GLM 4.7 | Z.AI | 12/2025 | 46.0 | 41.9%
76 | Mistral Medium 3 | Mistral | 08/2025 | 46.0 | 41.9%
77 | Nova Pro V1 | Amazon | 08/2025 | 46.0 | 41.9%
78 | OSS 20B | OpenAI | 08/2025 | 46.0 | 41.9%
79 | Qwen3 Coder | Qwen | 08/2025 | 46.0 | 41.9%
80 | Coder Large | Other | 08/2025 | 45.8 | 41.9%
81 | Grok Code Fast 1 | xAI | 09/2025 | 45.8 | 41.9%
82 | Grok 4 | xAI | 08/2025 | 45.5 | 41.9%
83 | Kimi K2 | Moonshot AI | 08/2025 | 45.5 | 41.9%
84 | MIMO V2 Flash | Minimax | 12/2025 | 45.5 | 41.9%
85 | GLM 4.5 | Z.AI | 08/2025 | 45.4 | 41.9%
86 | GPT 4o | OpenAI | 08/2025 | 45.4 | 41.9%
87 | Qwen3 Coder Next | Qwen | 02/2026 | 45.4 | 41.9%
88 | Gemini 2.0 Flash 001 | Google | 08/2025 | 45.0 | 41.9%
89 | Grok 3 Mini | xAI | 08/2025 | 45.0 | 41.9%
90 | Claude 3 Haiku | Anthropic | 08/2025 | 43.6 | 38.7%
91 | Gemma 3 4B IT | Google | 08/2025 | 43.6 | 38.7%
92 | Magnum V4 72B | NousResearch | 08/2025 | 43.6 | 38.7%
93 | GPT 3.5 Turbo | OpenAI | 08/2025 | 43.6 | 38.7%
94 | GPT 4o mini | OpenAI | 08/2025 | 43.6 | 38.7%
95 | Qwen3 14B | Qwen | 08/2025 | 43.4 | 38.7%
96 | DeepSeek V3 | DeepSeek | 08/2025 | 43.2 | 38.7%
97 | Claude 4.5 Haiku | Anthropic | 10/2025 | 43.0 | 38.7%
98 | Nova Micro V1 | Amazon | 08/2025 | 37.4 | 32.3%
99 | Nova Lite V1 | Amazon | 08/2025 | 32.2 | 25.8%
100 | Command A | Cohere | 08/2025 | 11.7 | 3.2%

Top Performers

Vending Machine Champions
#1 GPT 5.2 (OpenAI) | Score: 78.9 | Success Rate: 77.4% | Tests Passed: 24 of 31 | Quality: 92 | Issues: 4
#2 GPT 5.3 Codex (OpenAI) | Score: 78.7 | Success Rate: 77.4% | Tests Passed: 24 of 31 | Quality: 90 | Issues: 5
#3 GPT 5.4 (OpenAI) | Score: 78.7 | Success Rate: 77.4% | Tests Passed: 24 of 31 | Quality: 90 | Issues: 5
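
The page does not state how its percentages are computed. The figures are consistent with two simple readings, sketched below as assumptions rather than the site's actual method: a model's success rate looks like tests passed divided by the 31 total tests, and the 65.3% summary figure looks like the plain average of the 100 per-model success rates. The helper name `success_rate` and the `tiers` list are illustrative; the tier counts were tallied from the results list above.

```python
# Assumption: success rate = tests passed / total tests, rounded to one decimal.
# The page does not publish its formula; this only checks consistency.
TOTAL_TESTS = 31  # "31 total tests" on the top-performer cards


def success_rate(passed: int, total: int = TOTAL_TESTS) -> float:
    """Percentage of tests passed, rounded to one decimal place."""
    return round(100 * passed / total, 1)


# (success rate shown in the results list, number of models showing it)
tiers = [
    (77.4, 62),  # ranks 1-62
    (74.2, 7),   # ranks 63-69
    (41.9, 20),  # ranks 70-89
    (38.7, 8),   # ranks 90-97
    (32.3, 1),   # rank 98
    (25.8, 1),   # rank 99
    (3.2, 1),    # rank 100
]

models = sum(count for _, count in tiers)
# Assumption: the summary "Success Rate" is the unweighted mean over models.
mean_rate = round(sum(rate * count for rate, count in tiers) / models, 1)

print(success_rate(24))   # 77.4, matching the top three cards
print(models, mean_rate)  # 100 65.3, matching the page summary
```

Under the same reading, the seven distinct success rates in the list correspond to 24, 23, 13, 12, 10, 8, and 1 tests passed out of 31.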
