SuperCLUE Multi-Turn Adversarial Safety Overall Leaderboard

| Rank | Model | Organization | Overall Score | Traditional Safety | Responsible AI | Instruction Attack | License |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | GPT4 | OpenAI | 87.43 | 84.51 | 91.22 | 86.7 | Closed-source |
| 2 | vivoLM | vivo | 85.17 | 84.39 | 92.88 | 77.99 | Closed-source |
| 3 | 讯飞星火(v4.0) | iFlytek | 84.98 | 80.65 | 89.78 | 84.77 | Closed-source |
| 4 | gpt-3.5-turbo | OpenAI | 83.82 | 82.82 | 87.81 | 80.72 | Closed-source |
| 5 | 文心一言 | Baidu | 81.24 | 79.79 | 84.52 | 79.42 | Closed-source |
| 6 | ChatGLM2-Pro | Tsinghua & Zhipu AI | 79.82 | 77.16 | 87.22 | 74.98 | Closed-source |
| 7 | ChatGLM2-6B | Tsinghua & Zhipu AI | 79.43 | 76.53 | 84.36 | 77.45 | Open-source (commercial use permitted) |
| 8 | Baichuan2-13B-Chat | Baichuan Intelligence | 78.78 | 74.7 | 85.87 | 75.86 | Open-source (commercial use permitted) |
| 9 | Qwen-7B-Chat | Alibaba | 78.64 | 77.49 | 85.43 | 72.77 | Open-source (commercial use permitted) |
| 10 | OpenBuddy-Llama2-70B | OpenBuddy Community | 78.21 | 77.37 | 87.51 | 69.3 | Open-source (commercial use permitted) |
| 11 | Llama-2-13B-Chat | Meta | 77.49 | 71.97 | 85.54 | 75.16 | Open-source (commercial use permitted) |
| 12 | 360GPT_S2_V94 | 360 | 76.52 | 71.45 | 85.09 | 73.12 | Closed-source |
| 13 | Chinese-Alpaca-2-13B | Yiming Cui | 75.39 | 73.21 | 82.44 | 70.39 | Open-source (commercial use permitted) |
| 14 | MiniMax-Abab5.5 | MiniMax | 71.9 | 71.67 | 79.77 | 63.82 | Closed-source |
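The rows above are plain leaderboard data. If you want to slice them programmatically, for example to compare only the openly licensed models, a minimal sketch like the following works. The record type and field names are my own, and only a few rows from the table are copied in for illustration.

```python
# Minimal sketch (not part of SuperCLUE): hold a few rows from the table above
# as records and filter/sort them, e.g. to compare openly licensed models.
# The dataclass and field names are my own; scores are copied from the table.
from dataclasses import dataclass

@dataclass
class SafetyRow:
    rank: int
    model: str
    overall: float
    instruction_attack: float
    open_source: bool

rows = [
    SafetyRow(1, "GPT4", 87.43, 86.7, False),
    SafetyRow(7, "ChatGLM2-6B", 79.43, 77.45, True),
    SafetyRow(8, "Baichuan2-13B-Chat", 78.78, 75.86, True),
    SafetyRow(11, "Llama-2-13B-Chat", 77.49, 75.16, True),
]

open_models = sorted((r for r in rows if r.open_source),
                     key=lambda r: r.overall, reverse=True)
for r in open_models:
    print(f"#{r.rank} {r.model}: overall {r.overall}, "
          f"instruction attack {r.instruction_attack}")
```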
SuperCLUE Ten Core Capabilities Leaderboard (October 2023)

| Model | Calculation | Logic & Reasoning | Code | Knowledge & Encyclopedia | Language Understanding & Extraction | Generation & Creation | Contextual Dialogue | Role Play | Tool Use | Traditional Safety |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT4 | 95.56 | 100 | 85.89 | 98.14 | 100 | 68.68 | 75.68 | 79.68 | 88.75 | 88.27 |
| Claude2 | 75.48 | 100 | 74.63 | 88.14 | 84.91 | 46.58 | 67.42 | 61.16 | 62.5 | 90.31 |
| GPT3.5 | 74.04 | 95.1 | 69.25 | 79.56 | 87.61 | 55.65 | 59.26 | 66.57 | 56.88 | 87.24 |
| vivoLM | 58.52 | 90.11 | 60.91 | 90.73 | 68.52 | 40.32 | 59.84 | 52.21 | 59.12 | 87.48 |
| 文心一言4.0 | 71.3 | 98.61 | 60.81 | 81.08 | 70.65 | 18.42 | 30.26 | 28.95 | 69.62 | 88.4 |
| SenseChat 3.0 | 43.4 | 88.16 | 58.57 | 89.02 | 81.82 | 27.63 | 37.5 | 47.37 | 71.15 | 86.99 |
| MiniMax-Abab5.5 | 34.26 | 63.51 | 47.37 | 82.43 | 54.35 | 21.05 | 26.32 | 28.95 | 50.63 | 72.45 |
| OpenBuddy-70B | 31.48 | 89.19 | 47.37 | 50 | 47.83 | 9.21 | 28.95 | 15.79 | 56.33 | 75.26 |
| Moonshot | 64.81 | 100 | 44.74 | 82.14 | 88.04 | 31.08 | 52.63 | 40.54 | 71.25 | 84.95 |
| Qwen-14B-Chat | 52.78 | 52.86 | 44.74 | 65.38 | 46.74 | 14.47 | 14.86 | 11.84 | 50 | 77.3 |
| 讯飞星火V3.0 | 68.52 | 85.53 | 43.42 | 96.43 | 58.7 | 27.63 | 28.95 | 50 | 48.75 | 84.69 |
| ChatGLM2-Pro | 64.81 | 90.54 | 36.84 | 76.83 | 65.22 | 25 | 48.68 | 39.47 | 51.95 | 85.97 |
| Baichuan2-13B-Chat | 50.93 | 80.26 | 36.84 | 59.21 | 66.3 | 32.89 | 57.89 | 53.95 | 62.5 | 76.92 |
| 通义千问plus | 46.3 | 70 | 35.53 | 69.51 | 51.09 | 3.95 | 21.05 | 11.84 | 45.62 | 78.72 |
| Chinese_Alpaca_2_13B | 24.07 | 52.7 | 35.53 | 47.3 | 67.39 | 18.42 | 40.79 | 36.49 | 21.25 | 75.51 |
| Llama2-13B-Chat | 7.41 | 48.53 | 32.89 | 15.85 | 60.87 | 26.32 | 28.38 | 17.11 | 30.52 | 71.17 |
| 讯飞星火V2.0 | 51.85 | 55.41 | 31.58 | 79.27 | 50 | 28.95 | 28.95 | 25 | 43.75 | 84.69 |
| 云雀大模型(豆包) | 43.52 | 93.42 | 26.32 | 89.02 | 88.04 | 12.16 | 50 | 52.63 | 43.12 | 92.86 |
| ChatGLM2-6B | 18.52 | 58.11 | 25 | 25 | 52.17 | 6.58 | 7.89 | 10.53 | 10.62 | 80.36 |
| 360GPT_S2_V9 | 13.89 | 64.86 | 16.22 | 34.62 | 25 | 2.63 | 21.05 | 9.21 | 17.31 | 79.59 |
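The ten-capability table above has no aggregate column. If you want a single number per model, one simple (and unofficial) option is the unweighted mean of the ten scores; the sketch below does that for three rows from the table. This is not SuperCLUE's published aggregation, just a convenient summary.

```python
# Unofficial summary: rank a few models from the table above by the unweighted
# mean of their ten capability scores. This is NOT the official SuperCLUE
# aggregation; it only collapses each row to one number for comparison.
scores = {
    "GPT4":    [95.56, 100, 85.89, 98.14, 100, 68.68, 75.68, 79.68, 88.75, 88.27],
    "Claude2": [75.48, 100, 74.63, 88.14, 84.91, 46.58, 67.42, 61.16, 62.5, 90.31],
    "GPT3.5":  [74.04, 95.1, 69.25, 79.56, 87.61, 55.65, 59.26, 66.57, 56.88, 87.24],
}

for model, vals in sorted(scores.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{model}: mean of ten capabilities = {sum(vals) / len(vals):.2f}")
```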
SuperCLUE Language Understanding & Generation Leaderboard (October 2023)

| Rank | Model | Overall Score | Language Understanding & Extraction | Generation & Creation | Contextual Dialogue | Role Play |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | GPT4 | 81.01 | 100 | 68.68 | 75.68 | 79.68 |
| 2 | GPT3.5 | 67.27 | 87.61 | 55.65 | 59.26 | 66.57 |
| 3 | Claude2 | 65.02 | 84.91 | 46.58 | 67.42 | 61.16 |
| 4 | vivoLM | 55.22 | 68.52 | 40.32 | 59.84 | 52.21 |
| 5 | Moonshot | 53.07 | 88.04 | 31.08 | 52.63 | 40.54 |
| 6 | Baichuan2-13B-Chat | 52.76 | 66.3 | 32.89 | 57.89 | 53.95 |
| 7 | 云雀大模型(豆包) | 50.71 | 88.04 | 12.16 | 50 | 52.63 |
| 8 | SenseChat 3.0 | 48.58 | 81.82 | 27.63 | 37.5 | 47.37 |
| 9 | ChatGLM2-Pro | 44.59 | 65.22 | 25 | 48.68 | 39.47 |
| 10 | 讯飞星火V3.0 | 41.32 | 58.7 | 27.63 | 28.95 | 50 |
| 11 | Llama2-13B-Chat | 33.17 | 60.87 | 26.32 | 28.38 | 17.11 |
| 12 | MiniMax-Abab5.5 | 32.67 | 54.35 | 21.05 | 26.32 | 28.95 |
| 13 | OpenBuddy-70B | 25.44 | 47.83 | 9.21 | 28.95 | 15.79 |
| 14 | 通义千问plus | 21.98 | 51.09 | 3.95 | 21.05 | 11.84 |
| 15 | Qwen-14B-Chat | 21.98 | 46.74 | 14.47 | 14.86 | 11.84 |
| 16 | ChatGLM2-6B | 19.29 | 52.17 | 6.58 | 7.89 | 10.53 |
| 17 | 360GPT_S2_V9 | 14.47 | 25 | 2.63 | 21.05 | 9.21 |
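In this table the overall score appears to be the unweighted mean of the four sub-scores (for GPT4, (100 + 68.68 + 75.68 + 79.68) / 4 = 81.01). A quick check under that assumption:

```python
# Assumption: overall score = unweighted mean of the four sub-scores above.
# Sub-score order: understanding/extraction, generation/creation, contextual
# dialogue, role play. Values are copied from three rows of the table.
rows = {
    "GPT4":    (81.01, [100.00, 68.68, 75.68, 79.68]),
    "GPT3.5":  (67.27, [87.61, 55.65, 59.26, 66.57]),
    "Claude2": (65.02, [84.91, 46.58, 67.42, 61.16]),
}

for model, (reported, subs) in rows.items():
    computed = sum(subs) / len(subs)
    print(f"{model}: reported {reported}, computed {computed:.2f}")
```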
SuperCLUE-Open Ten Capabilities Leaderboard

| Model | Win-or-Tie Rate | Language Understanding | Casual Chat | Contextual Dialogue | Role Play | Knowledge & Encyclopedia | Generation & Creation | Code | Logical Reasoning | Calculation | Safety |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4 | 94.64 | 80.00 | 97.30 | 93.18 | 100.00 | 87.76 | 100.00 | 97.92 | 100.00 | 100.00 | 95.12 |
| Claude-instant-v1 | 69.51 | 64.29 | 92.31 | 68.52 | 83.02 | 51.79 | 51.06 | 54.00 | 59.57 | 80.00 | 86.79 |
| MiniMax-abab5 | 57.94 | 55.36 | 78.00 | 59.62 | 85.42 | 57.41 | 69.23 | 37.25 | 34.78 | 32.20 | 77.55 |
| 文心一言(v2.0.4) | 50.48 | 32.76 | 56.86 | 47.06 | 52.73 | 37.50 | 62.50 | 53.19 | 70.59 | 60.34 | 36.54 |
| 讯飞星火(v1.5) | 48.87 | 45.61 | 25.49 | 60.00 | 83.67 | 29.63 | 71.79 | 37.74 | 39.58 | 57.14 | 50.00 |
| ChatGLM-130B | 42.46 | 44.64 | 53.06 | 50.00 | 51.92 | 39.29 | 52.50 | 17.07 | 37.25 | 42.37 | 34.00 |
| ChatGLM2-6B-Chat | 36.50 | 33.33 | 38.33 | 36.67 | 41.67 | 20.00 | 40.00 | 21.67 | 55.00 | 45.00 | 33.33 |
| Qwen-7B-Chat | 25.75 | 30.00 | 16.67 | 23.33 | 16.67 | 10.00 | 20.00 | 40.00 | 58.62 | 36.67 | 6.67 |
| 360智脑(4.0) | 23.93 | 25.42 | 16.95 | 23.64 | 14.04 | 10.17 | 41.67 | 32.08 | 43.40 | 30.00 | 7.02 |
| jiangziya-13B-v1.1 | 22.04 | 13.33 | 8.47 | 24.56 | 16.07 | 24.14 | 19.61 | 25.49 | 28.00 | 38.98 | 22.81 |
| MOSS-16B | 21.14 | 26.67 | 20.00 | 11.67 | 27.59 | 11.86 | 25.42 | 15.00 | 35.00 | 21.67 | 16.67 |
| BELLE-13B | 15.61 | 25.00 | 8.47 | 15.25 | 6.90 | 11.67 | 9.80 | 33.33 | 32.08 | 13.56 | 3.33 |
| DLM | 12.54 | 16.67 | 0.00 | 13.79 | 10.00 | 6.90 | 3.57 | 11.11 | 45.83 | 20.00 | 3.33 |
| RWKV-world-7B | 12.45 | 10.64 | 8.47 | 12.96 | 7.27 | 11.86 | 10.20 | 25.00 | 18.00 | 12.28 | 8.93 |
| baichuan-7B (pretrained base model) | 3.11 | 1.89 | 0.00 | 0.00 | 0.00 | 1.72 | 1.69 | 3.33 | 18.33 | 3.33 | 0.00 |
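SuperCLUE-Open is a pairwise, judge-based evaluation, and 胜和率 is rendered above as the win-or-tie rate: the share of judged comparisons a model wins or draws. A minimal sketch of that arithmetic, with hypothetical verdicts standing in for the actual judging pipeline:

```python
# Minimal sketch: compute a win-or-tie rate from per-comparison judge verdicts.
# The verdicts below are hypothetical placeholders; SuperCLUE-Open's real
# judging setup (questions, baseline model, judge prompts) is not reproduced.
from collections import Counter

verdicts = ["win", "tie", "loss", "win", "win", "tie", "loss", "win"]

counts = Counter(verdicts)
win_or_tie_rate = 100 * (counts["win"] + counts["tie"]) / len(verdicts)
print(f"win-or-tie rate: {win_or_tie_rate:.2f}%")  # 75.00% for these toy verdicts
```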