Vending Machine System

MEDIUM Challenge | 45 models tested | Top Score: 78.5

Success Rate: 60.2%
Quality Score: 80
Tests Passed: 19
Models Tested: 45
Vending Machine Benchmark - Individual Model Results (all 45 models, ranked by score)
Rank  Model                         Provider      Date     Score  Success Rate
1     Gemini 2.5 Pro                Google        08/2025  78.5   77.4%
2     OpenAI GPT-4.1 nano           OpenAI        08/2025  78.1   77.4%
3     Claude 3.5 Haiku              Claude        08/2025  77.9   77.4%
4     OpenAI GPT-4                  OpenAI        08/2025  77.9   77.4%
5     R1                            DeepSeek      08/2025  77.9   77.4%
6     Llama 4 Maverick              Meta          08/2025  77.7   77.4%
7     Llama 4 Scout                 Meta          08/2025  77.7   77.4%
8     OpenAI o3-mini (High)         OpenAI        08/2025  77.7   77.4%
9     Gemini 2.5 Flash              Google        08/2025  77.7   77.4%
10    Claude 3.7 Sonnet             Claude        08/2025  77.5   77.4%
11    OpenAI o3-mini                OpenAI        08/2025  77.5   77.4%
12    OpenAI GPT-4 Turbo            OpenAI        08/2025  77.5   77.4%
13    OpenAI o4-mini (High)         OpenAI        08/2025  77.5   77.4%
14    Grok 3                        xAI           08/2025  77.3   77.4%
15    OpenAI o4-mini                OpenAI        08/2025  77.3   77.4%
16    Claude 3.7 Sonnet (Thinking)  Claude        08/2025  77.3   77.4%
17    Horizon Beta                  Other         08/2025  77.3   77.4%
18    Claude 3.5 Sonnet             Claude        08/2025  77.1   77.4%
19    OpenAI GPT-4o                 OpenAI        08/2025  77.1   77.4%
20    OpenAI GPT-4.1                OpenAI        08/2025  76.9   77.4%
21    OpenAI o1-mini                OpenAI        08/2025  76.7   77.4%
22    OpenAI GPT-4.1 mini           OpenAI        08/2025  76.7   77.4%
23    Codestral 25.08               Mistral       08/2025  75.4   74.2%
24    Claude 4 Opus                 Claude        08/2025  75.0   74.2%
25    Claude 4 Sonnet               Claude        08/2025  75.0   74.2%
26    Gemini 2.5 Flash Lite         Google        08/2025  74.8   74.2%
27    Nova Pro V1                   Amazon        08/2025  46.0   41.9%
28    Mistral Medium 3              Mistral       08/2025  46.0   41.9%
29    Qwen 3 Coder                  Alibaba       08/2025  46.0   41.9%
30    Coder Large                   Other         08/2025  45.8   41.9%
31    Grok 4                        xAI           08/2025  45.5   41.9%
32    Kimi K2                       Moonshot      08/2025  45.5   41.9%
33    OpenAI GPT-4o                 OpenAI        08/2025  45.4   41.9%
34    Grok 3 Mini                   xAI           08/2025  45.0   41.9%
35    Gemini 2.0 Flash-001          Google        08/2025  45.0   41.9%
36    Gemma 3 4B IT                 Google        08/2025  43.6   38.7%
37    OpenAI GPT-3.5 Turbo          OpenAI        08/2025  43.6   38.7%
38    OpenAI GPT-4o mini            OpenAI        08/2025  43.6   38.7%
39    Claude 3 Haiku                Claude        08/2025  43.6   38.7%
40    Magnum V4 72B                 NousResearch  08/2025  43.6   38.7%
41    Qwen3 14b                     Alibaba       08/2025  43.4   38.7%
42    DeepSeek V3                   DeepSeek      08/2025  43.2   38.7%
43    Nova Micro V1                 Amazon        08/2025  37.4   32.3%
44    Nova Lite V1                  Amazon        08/2025  32.2   25.8%
45    Command A                     Cohere        08/2025  11.7   3.2%
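The headline Success Rate (60.2%) and Tests Passed (19) figures appear to be consistent with pooling every model's results from this table. The sketch below is a hypothetical reconstruction, not the benchmark's published method. It assumes every model ran the same 31 tests (the count shown on the top-performer cards) and that each model's success rate is tests passed divided by 31. It recovers each model's passed count from its rounded success rate, then computes the pooled success rate and the mean number of tests passed per model. The names `bands` and `passed_from_rate` are illustrative only.

```python
# Hypothetical reconstruction of the headline figures. Assumes every model
# ran the same 31 tests and that success rate = tests_passed / 31.
TOTAL_TESTS = 31

# (success rate in %, number of models) bands read off the leaderboard
bands = [
    (77.4, 22),  # ranks 1-22
    (74.2, 4),   # ranks 23-26
    (41.9, 9),   # ranks 27-35
    (38.7, 7),   # ranks 36-42
    (32.3, 1),   # rank 43
    (25.8, 1),   # rank 44
    (3.2, 1),    # rank 45
]


def passed_from_rate(rate_pct: float, total: int = TOTAL_TESTS) -> int:
    """Recover the integer number of passed tests from a rounded percentage."""
    return round(rate_pct / 100 * total)


models = sum(n for _, n in bands)
passed = sum(passed_from_rate(rate) * n for rate, n in bands)
pooled_rate = 100 * passed / (models * TOTAL_TESTS)  # all passes / all tests run
mean_passed = passed / models                         # average passes per model

print(f"models tested: {models}")
print(f"tests passed (all models): {passed} of {models * TOTAL_TESTS}")
print(f"pooled success rate: {pooled_rate:.1f}%")
print(f"mean tests passed per model: {mean_passed:.2f}")
```

Under these assumptions the script reports 840 of 1,395 tests passed, a pooled success rate of 60.2% and a mean of 18.67 tests passed per model, which matches the headline Success Rate and, after rounding, the Tests Passed figure of 19.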

Top Performers

Vending Machine Champions
#1 Gemini 2.5 Pro (Google)
Score: 78.5
Success Rate: 77.4% (24 of 31 tests passed)
Quality: 88
Issues: 6

#2 OpenAI GPT-4.1 nano (OpenAI)
Score: 78.1
Success Rate: 77.4% (24 of 31 tests passed)
Quality: 84
Issues: 8

#3 Claude 3.5 Haiku (Claude)
Score: 77.9
Success Rate: 77.4% (24 of 31 tests passed)
Quality: 82
Issues: 9
