Benchmark Detail View
School Library Management
MEDIUM Challenge • 70 models tested • Top Score: 79.9
Success Rate: 58.5%
Quality Score: 81
Tests Passed: 16
Models Tested: 70
School Library Benchmark - Individual Model Results
Rank | Model | Provider | Date | Score | Success Rate
1 | Claude 3 Haiku | Claude | 08/2025 | 79.9 | 78.6%
2 | Grok 4 | xAI | 08/2025 | 79.5 | 78.6%
3 | Horizon Beta | Other | 08/2025 | 78.1 | 78.6%
4 | OpenAI GPT-5.1 Chat | OpenAI | 11/2025 | 76.3 | 75.0%
5 | OpenAI o4-mini | OpenAI | 08/2025 | 76.1 | 75.0%
6 | R1 | DeepSeek | 08/2025 | 76.1 | 75.0%
7 | Openai Oss 120b | OpenAI | 08/2025 | 76.1 | 75.0%
8 | Claude 4.5 Sonnet | Claude | 10/2025 | 76.1 | 75.0%
9 | OpenAI o1-mini | OpenAI | 08/2025 | 75.9 | 75.0%
10 | Grok 3 Mini | xAI | 08/2025 | 75.7 | 75.0%
11 | OpenAI o3-mini (High) | OpenAI | 08/2025 | 75.5 | 75.0%
12 | OpenAI o3-mini | OpenAI | 08/2025 | 75.5 | 75.0%
13 | OpenAI GPT-4o | OpenAI | 08/2025 | 75.3 | 75.0%
14 | OpenAI o4-mini (High) | OpenAI | 08/2025 | 74.5 | 75.0%
15 | OpenAI GPT-5 | OpenAI | 09/2025 | 74.5 | 75.0%
16 | OpenAI GPT-4.1 | OpenAI | 08/2025 | 74.3 | 75.0%
17 | Nova Pro V1 | Amazon | 08/2025 | 73.3 | 71.4%
18 | Mistral Medium 3 | Mistral | 08/2025 | 73.1 | 71.4%
19 | OpenAI GPT-5 mini | OpenAI | 09/2025 | 72.9 | 75.0%
20 | Claude 4 Sonnet | Claude | 08/2025 | 72.7 | 71.4%
21 | OpenAI 5.1 Codex | OpenAI | 11/2025 | 72.7 | 71.4%
22 | OpenAI GPT-4.1 nano | OpenAI | 08/2025 | 71.9 | 71.4%
23 | Sonoma Sky Alpha | Other | 09/2025 | 71.3 | 71.4%
24 | Nova Lite V1 | Amazon | 08/2025 | 70.9 | 67.9%
25 | OpenAI GPT-4o mini | OpenAI | 08/2025 | 70.7 | 71.4%
26 | OpenAI 5.1 Codex Mini | OpenAI | 11/2025 | 70.1 | 67.9%
27 | OpenAI GPT-5 mini | OpenAI | 08/2025 | 69.3 | 71.4%
28 | Coder Large | Other | 08/2025 | 66.9 | 64.3%
29 | Grok Code Fast 1 | xAI | 09/2025 | 66.5 | 64.3%
30 | Nova Micro V1 | Amazon | 08/2025 | 66.3 | 64.3%
31 | Gemini 3 Pro Preview | Google | 11/2025 | 64.0 | 60.7%
32 | Openai Oss 20b | OpenAI | 08/2025 | 62.2 | 60.7%
33 | Gemini 2.5 Flash Lite | Google | 08/2025 | 62.2 | 60.7%
34 | Grok 3 | xAI | 08/2025 | 60.8 | 57.1%
35 | Claude 4.5 Opus | Claude | 11/2025 | 59.8 | 57.1%
36 | OpenAI 5 Codex | OpenAI | 10/2025 | 59.2 | 57.1%
37 | DeepSeek V3 | DeepSeek | 08/2025 | 57.8 | 53.6%
38 | Llama 4 Scout | Meta | 08/2025 | 56.8 | 53.6%
39 | Gemini 2.5 Flash | Google | 08/2025 | 56.6 | 53.6%
40 | Claude 4.1 Opus | Claude | 08/2025 | 56.6 | 53.6%
41 | Qwen 3 Coder | Alibaba | 10/2025 | 56.6 | 53.6%
42 | Gemini 2.0 Flash-001 | Google | 08/2025 | 56.2 | 53.6%
43 | OpenAI GPT-4 | OpenAI | 08/2025 | 55.8 | 53.6%
44 | Qwen3 14b | Alibaba | 08/2025 | 55.8 | 53.6%
45 | Gemini 2.5 Pro | Google | 08/2025 | 53.8 | 50.0%
46 | Kimi K2 | Moonshot | 08/2025 | 53.8 | 50.0%
47 | Qwen 3 Coder | Alibaba | 08/2025 | 53.2 | 50.0%
48 | Claude 4 Opus | Claude | 08/2025 | 53.0 | 50.0%
49 | DeepSeek V3 | DeepSeek | 10/2025 | 52.6 | 50.0%
50 | Claude 3.7 Sonnet | Claude | 08/2025 | 51.2 | 46.4%
51 | Claude 3.7 Sonnet (Thinking) | Claude | 08/2025 | 51.2 | 46.4%
52 | Claude 3.5 Sonnet | Claude | 08/2025 | 51.0 | 46.4%
53 | Qwen3 Max | Alibaba | 10/2025 | 50.8 | 46.4%
54 | OpenAI GPT-4 Turbo | OpenAI | 08/2025 | 50.6 | 46.4%
55 | Glm 4 5 | Other | 08/2025 | 50.6 | 46.4%
56 | Kimi K2 | Moonshot | 10/2025 | 50.2 | 46.4%
57 | Claude 3.5 Haiku | Claude | 08/2025 | 50.0 | 46.4%
58 | Glm 4 6 | Other | 10/2025 | 49.6 | 46.4%
59 | Claude 4.5 Haiku | Claude | 10/2025 | 49.2 | 46.4%
60 | Grok 4 | xAI | 10/2025 | 49.0 | 46.4%
61 | OpenAI GPT-4.1 mini | OpenAI | 08/2025 | 48.2 | 46.4%
62 | Llama 4 Maverick | Meta | 08/2025 | 47.8 | 42.9%
63 | Codestral 25.08 | Mistral | 08/2025 | 47.2 | 42.9%
64 | OpenAI GPT-5 nano | OpenAI | 08/2025 | 46.0 | 42.9%
65 | OpenAI GPT-4o | OpenAI | 08/2025 | 44.6 | 39.3%
66 | OpenAI GPT-5 Chat | OpenAI | 08/2025 | 44.4 | 39.3%
67 | OpenAI GPT-3.5 Turbo | OpenAI | 08/2025 | 41.5 | 35.7%
68 | OpenAI GPT-5 | OpenAI | 08/2025 | 41.4 | 39.3%
69 | OpenAI GPT-5.1 | OpenAI | 11/2025 | 31.5 | 32.1%
70 | Command A | Cohere | 08/2025 | 13.0 | 3.6%
Top Performers
#1 Claude 3 Haiku (Claude) • Score: 79.9
Success Rate: 78.6% • Tests Passed: 22 of 28 • Quality: 92 • Issues: 4
#2 Grok 4 (xAI) • Score: 79.5
Success Rate: 78.6% • Tests Passed: 22 of 28 • Quality: 88 • Issues: 6
#3 Horizon Beta (Other) • Score: 78.1
Success Rate: 78.6% • Tests Passed: 22 of 28 • Quality: 74 • Issues: 13
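The success rates on this page appear consistent with tests passed divided by the 28 total tests, rounded to one decimal place (for example, 22 of 28 gives 78.6%). The page does not state this formula, so the sketch below is an assumption, and the function name success_rate is hypothetical. It does not attempt to reproduce the overall score, whose weighting is not given.

```python
def success_rate(passed: int, total: int = 28) -> float:
    """Percentage of tests passed, rounded to one decimal place.

    Assumption: the page derives Success Rate as passed / total.
    The default of 28 comes from the "28 total tests" shown per model.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return round(100 * passed / total, 1)


if __name__ == "__main__":
    # 22 passes matches the top three entries; 1 pass matches the last row.
    for passed in (22, 21, 1):
        print(f"{passed}/28 -> {success_rate(passed)}%")
```

Under this assumption, each step of 3.6 percentage points in the table corresponds to one additional test passed.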