
Vending Machine System

Medium challenge. 83 models tested. Top score: 78.9.
Success rate: 63.7%
Quality score: 80
Tests passed: 20
Models tested: 83
Vending Machine Benchmark: Individual Model Results
All 83 models, ranked by overall score. Each entry gives rank, model, provider, date, overall score, and success rate.
1. OpenAI GPT-5.2 (OpenAI, 12/2025): score 78.9, success rate 77.4%
2. Gemini 2.5 Pro (Google, 08/2025): score 78.5, success rate 77.4%
3. OpenAI gpt-oss-120b (OpenAI, 08/2025): score 78.3, success rate 77.4%
4. OpenAI GPT-5 nano (OpenAI, 08/2025): score 78.3, success rate 77.4%
5. OpenAI 5.1 Codex (OpenAI, 11/2025): score 78.1, success rate 77.4%
6. OpenAI GPT-4.1 nano (OpenAI, 08/2025): score 78.1, success rate 77.4%
7. OpenAI GPT-5.2 Chat (OpenAI, 12/2025): score 78.1, success rate 77.4%
8. R1 (DeepSeek, 08/2025): score 77.9, success rate 77.4%
9. OpenAI GPT-4 (OpenAI, 08/2025): score 77.9, success rate 77.4%
10. GLM 4.6 (Other, 10/2025): score 77.9, success rate 77.4%
11. OpenAI GPT-5.1 (OpenAI, 11/2025): score 77.9, success rate 77.4%
12. Claude 3.5 Haiku (Claude, 08/2025): score 77.9, success rate 77.4%
13. DeepSeek V3 (DeepSeek, 12/2025): score 77.9, success rate 77.4%
14. Kimi K2 (Moonshot, 12/2025): score 77.7, success rate 77.4%
15. Llama 4 Maverick (Meta, 08/2025): score 77.7, success rate 77.4%
16. OpenAI GPT-5 (OpenAI, 09/2025): score 77.7, success rate 77.4%
17. Gemini 3 Flash (Google, 12/2025): score 77.7, success rate 77.4%
18. Qwen 3 Coder (Alibaba, 10/2025): score 77.7, success rate 77.4%
19. Claude 4.5 Opus (Claude, 11/2025): score 77.7, success rate 77.4%
20. OpenAI 5 Codex (OpenAI, 10/2025): score 77.7, success rate 77.4%
21. Qwen3 Max (Alibaba, 10/2025): score 77.7, success rate 77.4%
22. Llama 4 Scout (Meta, 08/2025): score 77.7, success rate 77.4%
23. OpenAI o3-mini (High) (OpenAI, 08/2025): score 77.7, success rate 77.4%
24. Gemini 2.5 Flash (Google, 08/2025): score 77.7, success rate 77.4%
25. DeepSeek V3 (DeepSeek, 10/2025): score 77.7, success rate 77.4%
26. OpenAI o3-mini (OpenAI, 08/2025): score 77.5, success rate 77.4%
27. OpenAI GPT-5 Chat (OpenAI, 08/2025): score 77.5, success rate 77.4%
28. OpenAI GPT-4 Turbo (OpenAI, 08/2025): score 77.5, success rate 77.4%
29. Claude 4.5 Sonnet (Claude, 10/2025): score 77.5, success rate 77.4%
30. Claude 3.7 Sonnet (Claude, 08/2025): score 77.5, success rate 77.4%
31. OpenAI GPT-5.1 Chat (OpenAI, 11/2025): score 77.5, success rate 77.4%
32. OpenAI o4-mini (High) (OpenAI, 08/2025): score 77.5, success rate 77.4%
33. Horizon Beta (Other, 08/2025): score 77.3, success rate 77.4%
34. Grok 4 (xAI, 10/2025): score 77.3, success rate 77.4%
35. Claude 3.7 Sonnet (Thinking) (Claude, 08/2025): score 77.3, success rate 77.4%
36. OpenAI GPT-5 (OpenAI, 08/2025): score 77.3, success rate 77.4%
37. Kimi K2 (Moonshot, 10/2025): score 77.3, success rate 77.4%
38. Grok 3 (xAI, 08/2025): score 77.3, success rate 77.4%
39. OpenAI o4-mini (OpenAI, 08/2025): score 77.3, success rate 77.4%
40. OpenAI GPT-4o (OpenAI, 08/2025): score 77.1, success rate 77.4%
41. Claude 3.5 Sonnet (Claude, 08/2025): score 77.1, success rate 77.4%
42. OpenAI GPT-5 nano (OpenAI, 09/2025): score 77.1, success rate 77.4%
43. OpenAI 5.1 Codex Mini (OpenAI, 11/2025): score 77.1, success rate 77.4%
44. OpenAI GPT-4.1 (OpenAI, 08/2025): score 76.9, success rate 77.4%
45. OpenAI GPT-4.1 mini (OpenAI, 08/2025): score 76.7, success rate 77.4%
46. OpenAI o1-mini (OpenAI, 08/2025): score 76.7, success rate 77.4%
47. OpenAI GPT-5 mini (OpenAI, 09/2025): score 76.5, success rate 77.4%
48. OpenAI GPT-5 mini (OpenAI, 08/2025): score 76.3, success rate 77.4%
49. Codestral 25.08 (Mistral, 08/2025): score 75.4, success rate 74.2%
50. Sonoma Sky Alpha (Other, 09/2025): score 75.4, success rate 74.2%
51. Claude 4 Sonnet (Claude, 08/2025): score 75.0, success rate 74.2%
52. Claude 4.1 Opus (Claude, 08/2025): score 75.0, success rate 74.2%
53. Claude 4 Opus (Claude, 08/2025): score 75.0, success rate 74.2%
54. Gemini 2.5 Flash Lite (Google, 08/2025): score 74.8, success rate 74.2%
55. MiniMax M2.1 (Other, 12/2025): score 46.5, success rate 41.9%
56. Devstral 2512 (Other, 12/2025): score 46.4, success rate 41.9%
57. Mistral Large 2512 (Mistral, 12/2025): score 46.4, success rate 41.9%
58. OpenAI gpt-oss-20b (OpenAI, 08/2025): score 46.0, success rate 41.9%
59. Gemini 3 Pro Preview (Google, 11/2025): score 46.0, success rate 41.9%
60. GLM 4.7 (Other, 12/2025): score 46.0, success rate 41.9%
61. Qwen 3 Coder (Alibaba, 08/2025): score 46.0, success rate 41.9%
62. Nova Pro V1 (Amazon, 08/2025): score 46.0, success rate 41.9%
63. Mistral Medium 3 (Mistral, 08/2025): score 46.0, success rate 41.9%
64. Coder Large (Other, 08/2025): score 45.8, success rate 41.9%
65. Grok Code Fast 1 (xAI, 09/2025): score 45.8, success rate 41.9%
66. Mimo V2 Flash Free (Other, 12/2025): score 45.5, success rate 41.9%
67. Grok 4 (xAI, 08/2025): score 45.5, success rate 41.9%
68. Kimi K2 (Moonshot, 08/2025): score 45.5, success rate 41.9%
69. OpenAI GPT-4o (OpenAI, 08/2025): score 45.4, success rate 41.9%
70. GLM 4.5 (Other, 08/2025): score 45.4, success rate 41.9%
71. Gemini 2.0 Flash-001 (Google, 08/2025): score 45.0, success rate 41.9%
72. Grok 3 Mini (xAI, 08/2025): score 45.0, success rate 41.9%
73. Gemma 3 4B IT (Google, 08/2025): score 43.6, success rate 38.7%
74. Claude 3 Haiku (Claude, 08/2025): score 43.6, success rate 38.7%
75. OpenAI GPT-3.5 Turbo (OpenAI, 08/2025): score 43.6, success rate 38.7%
76. OpenAI GPT-4o mini (OpenAI, 08/2025): score 43.6, success rate 38.7%
77. Magnum V4 72B (NousResearch, 08/2025): score 43.6, success rate 38.7%
78. Qwen3 14B (Alibaba, 08/2025): score 43.4, success rate 38.7%
79. DeepSeek V3 (DeepSeek, 08/2025): score 43.2, success rate 38.7%
80. Claude 4.5 Haiku (Claude, 10/2025): score 43.0, success rate 38.7%
81. Nova Micro V1 (Amazon, 08/2025): score 37.4, success rate 32.3%
82. Nova Lite V1 (Amazon, 08/2025): score 32.2, success rate 25.8%
83. Command A (Cohere, 08/2025): score 11.7, success rate 3.2%

Top Performers

Vending Machine Champions
#1 OpenAI GPT-5.2 (OpenAI)
Score: 78.9
Success rate: 77.4% (24 of 31 tests passed)
Quality: 92
Issues: 4

#2 Gemini 2.5 Pro (Google)
Score: 78.5
Success rate: 77.4% (24 of 31 tests passed)
Quality: 88
Issues: 6

#3 OpenAI gpt-oss-120b (OpenAI)
Score: 78.3
Success rate: 77.4% (24 of 31 tests passed)
Quality: 86
Issues: 7
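On each top-performer card, the success rate equals tests passed divided by total tests (24 of 31 gives 77.4%). The minimal Python sketch below assumes that definition and assumes every model ran the same 31 tests (the cards only confirm this for the top three). It checks that each success-rate tier in the leaderboard corresponds to a whole number of passed tests. The pass counts other than 24 are inferred under that assumption, not reported by the benchmark.

```python
# Assumption: success rate = tests passed / total tests, with 31 tests per model
# (31 is the total shown on the top-performer cards).
TOTAL_TESTS = 31


def success_rate(passed: int, total: int = TOTAL_TESTS) -> float:
    """Return the pass percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Inferred pass count -> success rate as it appears in the leaderboard.
# Only the 24-pass tier is confirmed by the cards; the rest are assumptions.
tiers = {24: 77.4, 23: 74.2, 13: 41.9, 12: 38.7, 10: 32.3, 8: 25.8, 1: 3.2}

for passed, shown in tiers.items():
    print(f"{passed}/{TOTAL_TESTS} -> {success_rate(passed)}% (leaderboard shows {shown}%)")
```

Every tier matches exactly, which suggests (but does not prove) that the leaderboard groups models by raw pass count, while the overall score separates models within a tier.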
