Benchmark Detail View

School Library Management

MEDIUM Challenge | 95 models tested | Top Score: 80.1

Success Rate: 59.1%
Quality Score: 82
Tests Passed: 17
Models Tested: 95

School Library Benchmark - Individual Model Results

| Rank | Model | Provider | Date | Score | Success Rate |
|---|---|---|---|---|---|
| 1 | DeepSeek V3.2 Speciale | DeepSeek | 02/2026 | 80.1 | 78.6% |
| 2 | Claude 3 Haiku | Anthropic | 08/2025 | 79.9 | 78.6% |
| 3 | GLM 5 | Z.AI | 02/2026 | 79.7 | 78.6% |
| 4 | Grok 4 | xAI | 08/2025 | 79.5 | 78.6% |
| 5 | Horizon Beta | Other | 08/2025 | 78.1 | 78.6% |
| 6 | GPT 5.2 | OpenAI | 12/2025 | 76.7 | 75.0% |
| 7 | Claude 4.6 Opus | Anthropic | 02/2026 | 76.5 | 75.0% |
| 8 | GPT 5.1 | OpenAI | 11/2025 | 76.3 | 75.0% |
| 9 | GPT 5.3 Chat | OpenAI | 03/2026 | 76.3 | 75.0% |
| 10 | Claude 4.5 Sonnet | Anthropic | 10/2025 | 76.1 | 75.0% |
| 11 | o4 mini | OpenAI | 08/2025 | 76.1 | 75.0% |
| 12 | OSS 120B | OpenAI | 08/2025 | 76.1 | 75.0% |
| 13 | DeepSeek R1 | DeepSeek | 08/2025 | 76.1 | 75.0% |
| 14 | o1 mini | OpenAI | 08/2025 | 75.9 | 75.0% |
| 15 | Grok 3 Mini | xAI | 08/2025 | 75.7 | 75.0% |
| 16 | o3 mini (High) | OpenAI | 08/2025 | 75.5 | 75.0% |
| 17 | o3 mini | OpenAI | 08/2025 | 75.5 | 75.0% |
| 18 | GPT 4o | OpenAI | 08/2025 | 75.3 | 75.0% |
| 19 | GLM 4.7 | Z.AI | 12/2025 | 74.7 | 75.0% |
| 20 | GPT 5 | OpenAI | 09/2025 | 74.5 | 75.0% |
| 21 | o4 mini (High) | OpenAI | 08/2025 | 74.5 | 75.0% |
| 22 | GPT 4.1 | OpenAI | 08/2025 | 74.3 | 75.0% |
| 23 | Nova Pro V1 | Amazon | 08/2025 | 73.3 | 71.4% |
| 24 | Mistral Medium 3 | Mistral | 08/2025 | 73.1 | 71.4% |
| 25 | GPT 5 mini | OpenAI | 09/2025 | 72.9 | 75.0% |
| 26 | Claude 4 Sonnet | Anthropic | 08/2025 | 72.7 | 71.4% |
| 27 | GPT 5.1 Codex | OpenAI | 11/2025 | 72.7 | 71.4% |
| 28 | Kimi K2 Thinking | Moonshot AI | 12/2025 | 72.5 | 71.4% |
| 29 | MIMO V2 Flash | Minimax | 12/2025 | 72.5 | 71.4% |
| 30 | GPT 4.1 nano | OpenAI | 08/2025 | 71.9 | 71.4% |
| 31 | Sonoma Sky Alpha | Other | 09/2025 | 71.3 | 71.4% |
| 32 | Nova Lite V1 | Amazon | 08/2025 | 70.9 | 67.9% |
| 33 | GPT 4o mini | OpenAI | 08/2025 | 70.7 | 71.4% |
| 34 | Trinity Large Preview | Arcee AI | 02/2026 | 70.3 | 67.9% |
| 35 | GPT 5.1 Codex Mini | OpenAI | 11/2025 | 70.1 | 67.9% |
| 36 | GPT 5 mini | OpenAI | 08/2025 | 69.3 | 71.4% |
| 37 | Step 3.5 Flash | StepFun | 02/2026 | 68.9 | 67.9% |
| 38 | Coder Large | Other | 08/2025 | 66.9 | 64.3% |
| 39 | Grok Code Fast 1 | xAI | 09/2025 | 66.5 | 64.3% |
| 40 | Nova Micro V1 | Amazon | 08/2025 | 66.3 | 64.3% |
| 41 | Gemini 3 Flash Preview | Google | 12/2025 | 64.2 | 60.7% |
| 42 | Gemini 3 Pro Preview | Google | 11/2025 | 64.0 | 60.7% |
| 43 | MiniMax M2.1 | Minimax | 12/2025 | 63.6 | 60.7% |
| 44 | GPT 5.2 | OpenAI | 12/2025 | 63.4 | 60.7% |
| 45 | Gemini 2.5 Flash Lite | Google | 08/2025 | 62.2 | 60.7% |
| 46 | OSS 20B | OpenAI | 08/2025 | 62.2 | 60.7% |
| 47 | Grok 3 | xAI | 08/2025 | 60.8 | 57.1% |
| 48 | Claude 4.6 Sonnet | Anthropic | 02/2026 | 60.2 | 57.1% |
| 49 | Claude 4.5 Opus | Anthropic | 11/2025 | 59.8 | 57.1% |
| 50 | GPT 5.3 Codex | OpenAI | 02/2026 | 59.2 | 57.1% |
| 51 | GPT 5 Codex | OpenAI | 10/2025 | 59.2 | 57.1% |
| 52 | DeepSeek V3 | DeepSeek | 08/2025 | 57.8 | 53.6% |
| 53 | DeepSeek V3.2 Exp | DeepSeek | 12/2025 | 57.4 | 53.6% |
| 54 | GPT 5.2 Codex | OpenAI | 01/2026 | 57.4 | 53.6% |
| 55 | Mistral Large 25.12 | Mistral | 12/2025 | 57.2 | 53.6% |
| 56 | Llama 4 Scout | Meta | 08/2025 | 56.8 | 53.6% |
| 57 | Claude 4.1 Opus | Anthropic | 08/2025 | 56.6 | 53.6% |
| 58 | Gemini 2.5 Flash | Google | 08/2025 | 56.6 | 53.6% |
| 59 | Qwen3 Coder Plus | Qwen | 10/2025 | 56.6 | 53.6% |
| 60 | Gemini 2.0 Flash 001 | Google | 08/2025 | 56.2 | 53.6% |
| 61 | GPT 4 | OpenAI | 08/2025 | 55.8 | 53.6% |
| 62 | Qwen3 14B | Qwen | 08/2025 | 55.8 | 53.6% |
| 63 | Gemini 2.5 Pro | Google | 08/2025 | 53.8 | 50.0% |
| 64 | Kimi K2 | Moonshot AI | 08/2025 | 53.8 | 50.0% |
| 65 | Kimi K2.5 | Moonshot AI | 02/2026 | 53.6 | 53.6% |
| 66 | Qwen3 Coder | Qwen | 08/2025 | 53.2 | 50.0% |
| 67 | Claude 4 Opus | Anthropic | 08/2025 | 53.0 | 50.0% |
| 68 | Qwen3 Coder Next | Qwen | 02/2026 | 52.8 | 50.0% |
| 69 | DeepSeek V3.2 Exp | DeepSeek | 10/2025 | 52.6 | 50.0% |
| 70 | Claude 3.7 Sonnet | Anthropic | 08/2025 | 51.2 | 46.4% |
| 71 | Claude 3.7 Sonnet (Thinking) | Anthropic | 08/2025 | 51.2 | 46.4% |
| 72 | Claude 3.5 Sonnet | Anthropic | 08/2025 | 51.0 | 46.4% |
| 73 | Qwen3 Max | Qwen | 10/2025 | 50.8 | 46.4% |
| 74 | GLM 4.5 | Z.AI | 08/2025 | 50.6 | 46.4% |
| 75 | GPT 4 Turbo | OpenAI | 08/2025 | 50.6 | 46.4% |
| 76 | Gemini 3.1 Pro Preview | Google | 02/2026 | 50.4 | 50.0% |
| 77 | Kimi K2 (0905) | Moonshot AI | 10/2025 | 50.2 | 46.4% |
| 78 | Claude 3.5 Haiku | Anthropic | 08/2025 | 50.0 | 46.4% |
| 79 | GLM 4.6 | Z.AI | 10/2025 | 49.6 | 46.4% |
| 80 | Claude 4.5 Haiku | Anthropic | 10/2025 | 49.2 | 46.4% |
| 81 | Grok 4 Fast | xAI | 10/2025 | 49.0 | 46.4% |
| 82 | MiniMax M2.5 | Minimax | 02/2026 | 48.4 | 46.4% |
| 83 | GPT 4.1 mini | OpenAI | 08/2025 | 48.2 | 46.4% |
| 84 | Llama 4 Maverick | Meta | 08/2025 | 47.8 | 42.9% |
| 85 | Codestral 25.08 | Mistral | 08/2025 | 47.2 | 42.9% |
| 86 | GPT 5.4 | OpenAI | 03/2026 | 47.0 | 42.9% |
| 87 | Grok 4.1 Fast | xAI | 02/2026 | 46.8 | 42.9% |
| 88 | GPT 5 nano | OpenAI | 08/2025 | 46.0 | 42.9% |
| 89 | GPT 4o | OpenAI | 08/2025 | 44.6 | 39.3% |
| 90 | GPT 5 | OpenAI | 08/2025 | 44.4 | 39.3% |
| 91 | Devstral 25.12 | Mistral | 12/2025 | 44.0 | 39.3% |
| 92 | GPT 3.5 Turbo | OpenAI | 08/2025 | 41.5 | 35.7% |
| 93 | GPT 5 | OpenAI | 08/2025 | 41.4 | 39.3% |
| 94 | GPT 5.1 | OpenAI | 11/2025 | 31.5 | 32.1% |
| 95 | Command A | Cohere | 08/2025 | 13.0 | 3.6% |

Top Performers

School Library Champions

| Rank | Model | Provider | Score | Success Rate | Tests Passed | Quality | Issues |
|---|---|---|---|---|---|---|---|
| #1 | DeepSeek V3.2 Speciale | DeepSeek | 80.1 | 78.6% | 22 of 28 | 94 | 3 |
| #2 | Claude 3 Haiku | Anthropic | 79.9 | 78.6% | 22 of 28 | 92 | 4 |
| #3 | GLM 5 | Z.AI | 79.7 | 78.6% | 22 of 28 | 90 | 5 |
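
The success rates shown for the top performers appear consistent with tests passed divided by total tests (22 of 28 gives 78.6%). The page does not state the formula, so the following is only a minimal sketch under that assumption; `success_rate` is a hypothetical helper, not part of the benchmark itself.

```python
def success_rate(passed: int, total: int) -> float:
    """Percentage of tests passed, rounded to one decimal place.

    Assumption: success rate = passed / total. The page does not
    document this; it merely matches the displayed figures.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


TOTAL_TESTS = 28  # "28 total tests" as listed for the top performers

# Passed-test counts that reproduce rates seen in the results.
for passed in (22, 21, 20, 1):
    print(f"{passed}/{TOTAL_TESTS} -> {success_rate(passed, TOTAL_TESTS)}%")
```

Under this assumption, every success rate on the page would be a multiple of 1/28, which may explain why many models share an identical rate while their overall scores differ.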
