
Latest AI Large Language Model Leaderboard (2026/2/4)

  • Updated: 2026-02-04 01:18:16
Global leaderboard:

| Model | Organization | Global Average | Reasoning | Coding | Agentic Coding | Mathematics | Data Analysis | Language | Instruction Following |
|---|---|---|---|---|---|---|---|---|---|
| Claude 4.5 Opus Thinking High Effort | Anthropic | 75.96 | 80.09 | 79.65 | 63.33 | 90.39 | 74.44 | 81.26 | 62.55 |
| GPT-5.2 High | OpenAI | 74.84 | 83.21 | 76.07 | 51.67 | 93.17 | 78.16 | 79.81 | 61.77 |
| GPT-5.2 Codex | OpenAI | 74.30 | 77.71 | 83.62 | 51.67 | 88.77 | 78.20 | 73.68 | 66.45 |
| GPT-5.1 Codex Max High | OpenAI | 73.98 | 83.65 | 80.68 | 53.33 | 83.22 | 70.12 | 76.48 | 70.38 |
| Gemini 3 Pro Preview High | Google | 73.39 | 77.42 | 74.60 | 55.00 | 81.84 | 74.39 | 84.62 | 65.85 |
| Gemini 3 Flash Preview High | Google | 72.40 | 74.55 | 73.90 | 40.00 | 84.17 | 74.77 | 84.56 | 74.86 |
| GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.90 | 69.61 | 79.26 | 63.90 |
| GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 |
| Kimi K2.5 Thinking | Moonshot AI | 69.07 | 75.96 | 77.86 | 48.33 | 84.87 | 61.36 | 77.67 | 57.41 |
| GPT-5.1 Codex | OpenAI | 68.61 | 81.98 | 71.78 | 53.33 | 79.58 | 60.75 | 69.48 | 63.39 |
| Claude Sonnet 4.5 Thinking | Anthropic | 68.19 | 77.59 | 80.36 | 53.33 | 79.31 | 56.97 | 76.45 | 53.35 |
| GPT-5 Mini High | OpenAI | 65.91 | 68.32 | 68.20 | 46.67 | 82.20 | 55.20 | 75.52 | 65.27 |
| DeepSeek V3.2 Thinking | DeepSeek | 62.20 | 77.17 | 64.62 | 40.00 | 85.03 | 50.00 | 70.41 | 48.19 |
| Grok 4 | xAI | 62.02 | 79.13 | 73.13 | 30.00 | 83.02 | 63.38 | 76.39 | 29.07 |
| Claude 4.1 Opus Thinking | Anthropic | 61.81 | 72.33 | 74.66 | 48.33 | 73.19 | 48.98 | 72.76 | 42.40 |
| Kimi K2 Thinking | Moonshot AI | 61.59 | 63.49 | 67.44 | 38.33 | 81.10 | 52.29 | 66.45 | 62.03 |
| Claude Haiku 4.5 Thinking | Anthropic | 61.32 | 61.68 | 72.81 | 41.67 | 77.53 | 59.30 | 66.45 | 49.78 |
| Claude 4 Sonnet Thinking | Anthropic | 61.27 | 69.01 | 77.48 | 40.00 | 70.50 | 54.63 | 72.91 | 44.34 |
| GPT-5.1 Codex Mini | OpenAI | 60.38 | 64.71 | 69.93 | 40.00 | 76.26 | 49.70 | 63.01 | 59.02 |
| Grok 4.1 Fast | xAI | 59.99 | 80.20 | 69.61 | 31.67 | 83.72 | 52.24 | 74.33 | 28.20 |
| Claude 4.5 Opus Medium Effort | Anthropic | 59.10 | 53.21 | 78.51 | 63.33 | 66.32 | 45.54 | 78.66 | 28.11 |
| DeepSeek V3.2 Exp Thinking | DeepSeek | 58.90 | 64.37 | 70.06 | 31.67 | 82.40 | 51.50 | 71.06 | 41.27 |
| Gemini 2.5 Pro (Max Thinking) | Google | 58.33 | 70.81 | 75.69 | 33.33 | 68.32 | 51.62 | 75.50 | 33.07 |
| GLM 4.7 | Z.AI | 58.09 | 59.73 | 73.13 | 41.67 | 76.02 | 55.17 | 65.23 | 35.66 |
| GLM 4.6 | Z.AI | 55.19 | 62.06 | 71.02 | 35.00 | 81.13 | 51.95 | 58.99 | 26.19 |
| Claude 4.1 Opus | Anthropic | 54.45 | 40.89 | 76.07 | 53.33 | 62.83 | 45.38 | 76.75 | 25.92 |
| Claude Sonnet 4.5 | Anthropic | 53.69 | 42.29 | 76.07 | 48.33 | 62.62 | 47.00 | 76.00 | 23.52 |
| Gemini 2.5 Flash (Max Thinking) (2025-09-25) | Google | 53.09 | 51.45 | 67.50 | 23.33 | 75.35 | 60.98 | 65.34 | 27.68 |
| Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.97 | 59.40 | 68.97 | 6.67 | 73.39 | 52.18 | 69.52 | 40.64 |
| DeepSeek V3.2 | DeepSeek | 51.84 | 44.25 | 75.69 | 46.67 | 63.95 | 45.03 | 64.24 | 23.06 |
| Claude 4 Sonnet | Anthropic | 50.98 | 39.67 | 80.74 | 38.33 | 60.36 | 44.07 | 71.01 | 22.68 |
| Qwen 3 Next 80B A3B Thinking | Alibaba | 50.41 | 58.16 | 60.66 | 8.33 | 74.26 | 53.58 | 56.31 | 41.54 |
| DeepSeek V3.2 Exp | DeepSeek | 49.85 | 45.50 | 73.19 | 36.67 | 64.38 | 44.26 | 65.60 | 19.33 |
| GPT-5.2 No Thinking | OpenAI | 48.91 | 42.80 | 76.45 | 40.00 | 58.25 | 47.68 | 49.97 | 27.20 |
| Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.84 | 58.43 | 69.61 | 13.33 | 68.03 | 44.72 | 66.07 | 21.72 |
| GPT-5 Nano High | OpenAI | 48.62 | 40.29 | 62.39 | 23.33 | 68.41 | 43.41 | 46.84 | 55.70 |
| Qwen 3 Next 80B A3B Instruct | Alibaba | 48.35 | 54.75 | 68.20 | 10.00 | 70.18 | 49.78 | 66.34 | 19.19 |
| Kimi K2 Instruct | Moonshot AI | 48.10 | 42.23 | 74.28 | 31.67 | 58.15 | 43.34 | 66.69 | 20.36 |
| Gemini 2.5 Flash (Max Thinking) (2025-06-05) | Google | 47.74 | 44.64 | 66.03 | 16.67 | 68.75 | 47.31 | 62.27 | 28.50 |
| GPT OSS 120b | OpenAI | 46.09 | 39.21 | 60.21 | 16.67 | 68.87 | 38.80 | 48.59 | 50.29 |
| Claude Haiku 4.5 | Anthropic | 45.33 | 33.94 | 72.17 | 33.33 | 57.97 | 45.13 | 57.05 | 17.75 |
| Grok Code Fast | xAI | 45.13 | 42.30 | 64.44 | 33.33 | 56.01 | 48.99 | 48.56 | 22.27 |
| Qwen 3 32B | Alibaba | 43.56 | 48.25 | 66.03 | 3.33 | 67.44 | 46.54 | 55.54 | 17.77 |
| GPT-5.1 No Thinking | OpenAI | 42.65 | 26.81 | 77.48 | 28.33 | 44.51 | 44.07 | 53.84 | 23.50 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-06-17) | Google | 42.56 | 43.34 | 66.41 | 5.00 | 61.04 | 47.04 | 51.98 | 23.08 |
| Gemini 2.5 Flash Lite (Max Thinking) (2025-09-25) | Google | 42.39 | 36.16 | 65.39 | 1.67 | 64.90 | 47.88 | 52.60 | 28.11 |
| Devstral 2 | Mistral | 41.24 | 27.74 | 66.79 | 43.33 | 52.52 | 39.14 | 45.67 | 13.50 |
| GLM 4.6V | Z.AI | 40.07 | 37.22 | 64.24 | 3.33 | 62.50 | 46.41 | 49.74 | 17.06 |
| Qwen 3 30B A3B | Alibaba | 39.01 | 36.68 | 48.88 | 1.67 | 65.35 | 44.92 | 54.47 | 21.11 |
| Grok 4.1 Fast (Non-Reasoning) | xAI | 33.45 | 23.35 | 54.26 | 10.00 | 38.92 | 40.61 | 50.01 | 16.98 |
| Trinity Large Preview | Arcee | 32.74 | 20.61 | 65.65 | 3.33 | 44.93 | 40.33 | 42.15 | 12.19 |

Open-source leaderboard:

| Model | Organization | Global Average | Reasoning | Coding | Agentic Coding | Mathematics | Data Analysis | Language | Instruction Following |
|---|---|---|---|---|---|---|---|---|---|
| Kimi K2.5 Thinking | Moonshot AI | 69.07 | 75.96 | 77.86 | 48.33 | 84.87 | 61.36 | 77.67 | 57.41 |
| DeepSeek V3.2 Thinking | DeepSeek | 62.20 | 77.17 | 64.62 | 40.00 | 85.03 | 50.00 | 70.41 | 48.19 |
| Kimi K2 Thinking | Moonshot AI | 61.59 | 63.49 | 67.44 | 38.33 | 81.10 | 52.29 | 66.45 | 62.03 |
| DeepSeek V3.2 Exp Thinking | DeepSeek | 58.90 | 64.37 | 70.06 | 31.67 | 82.40 | 51.50 | 71.06 | 41.27 |
| GLM 4.7 | Z.AI | 58.09 | 59.73 | 73.13 | 41.67 | 76.02 | 55.17 | 65.23 | 35.66 |
| GLM 4.6 | Z.AI | 55.19 | 62.06 | 71.02 | 35.00 | 81.13 | 51.95 | 58.99 | 26.19 |
| Qwen 3 235B A22B Thinking 2507 | Alibaba | 52.97 | 59.40 | 68.97 | 6.67 | 73.39 | 52.18 | 69.52 | 40.64 |
| DeepSeek V3.2 | DeepSeek | 51.84 | 44.25 | 75.69 | 46.67 | 63.95 | 45.03 | 64.24 | 23.06 |
| Qwen 3 Next 80B A3B Thinking | Alibaba | 50.41 | 58.16 | 60.66 | 8.33 | 74.26 | 53.58 | 56.31 | 41.54 |
| DeepSeek V3.2 Exp | DeepSeek | 49.85 | 45.50 | 73.19 | 36.67 | 64.38 | 44.26 | 65.60 | 19.33 |
| Qwen 3 235B A22B Instruct 2507 | Alibaba | 48.84 | 58.43 | 69.61 | 13.33 | 68.03 | 44.72 | 66.07 | 21.72 |
| Qwen 3 Next 80B A3B Instruct | Alibaba | 48.35 | 54.75 | 68.20 | 10.00 | 70.18 | 49.78 | 66.34 | 19.19 |
| GPT OSS 120b | OpenAI | 46.09 | 39.21 | 60.21 | 16.67 | 68.87 | 38.80 | 48.59 | 50.29 |
| Qwen 3 32B | Alibaba | 43.56 | 48.25 | 66.03 | 3.33 | 67.44 | 46.54 | 55.54 | 17.77 |
| Devstral 2 | Mistral | 41.24 | 27.74 | 66.79 | 43.33 | 52.52 | 39.14 | 45.67 | 13.50 |
| GLM 4.6V | Z.AI | 40.07 | 37.22 | 64.24 | 3.33 | 62.50 | 46.41 | 49.74 | 17.06 |
| Qwen 3 30B A3B | Alibaba | 39.01 | 36.68 | 48.88 | 1.67 | 65.35 | 44.92 | 54.47 | 21.11 |
| Trinity Large Preview | Arcee | 32.74 | 20.61 | 65.65 | 3.33 | 44.93 | 40.33 | 42.15 | 12.19 |

Source:
https://livebench.ai/#/
https://liveb.space.z.ai/
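
The page does not say how the Global Average column is derived. For the rows checked here, the listed value matches the unweighted mean of the seven category scores (Reasoning, Coding, Agentic Coding, Mathematics, Data Analysis, Language, Instruction Following), rounded to two decimals. The sketch below recomputes it for a few rows copied from the tables; the `ROWS` dictionary and the `mean_of_categories` helper are illustrative names, and the equal weighting is an assumption inferred from the numbers rather than a documented formula.

```python
# Assumption: the global average is the plain (unweighted) mean of the seven
# category scores. This is inferred from the listed numbers, not stated by the source.

# Each entry: model name -> (listed global average, seven category scores in
# column order: Reasoning, Coding, Agentic Coding, Mathematics, Data Analysis,
# Language, Instruction Following).
ROWS = {
    "Claude 4.5 Opus Thinking High Effort": (
        75.96, [80.09, 79.65, 63.33, 90.39, 74.44, 81.26, 62.55]),
    "GPT-5.2 High": (
        74.84, [83.21, 76.07, 51.67, 93.17, 78.16, 79.81, 61.77]),
    "Kimi K2.5 Thinking": (
        69.07, [75.96, 77.86, 48.33, 84.87, 61.36, 77.67, 57.41]),
    "Trinity Large Preview": (
        32.74, [20.61, 65.65, 3.33, 44.93, 40.33, 42.15, 12.19]),
}


def mean_of_categories(scores):
    """Return the unweighted mean of the category scores, rounded to 2 decimals."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for name, (listed, scores) in ROWS.items():
        recomputed = mean_of_categories(scores)
        print(f"{name}: listed {listed:.2f}, recomputed {recomputed:.2f}")
```

If a recomputed value ever differs from the listed one by more than rounding error, that would suggest the source weights the categories differently or uses unrounded per-category scores.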
