More results on LongBench

From the results, for vanilla multihead attention models, we recommend using KIVI-2, which can maintain the performance of the full-precision model while offering the best efficiency. For multiquery attention or group query attention, since the keys and values are already compressed, we recommend using KIVI-4, which can maintain the performance of the full-precision model in these cases.

Table 1: Performance of LongChat-7b-v1.5-32K

The results of LongChat-7b-v1.5-32K on 15 tasks from LongBench. The model has 32K context length. We use a 32 group size and 128 residual length for both KIVI-2 and KIVI-4. The baseline is of full precision.

Task	LongChat-7b-v1.5-32K	w./ KIVI-2	w./ KIVI-4
NarrativeQA	20.65	20.79	20.49
Qasper	29.42	28.69	28.90
MultiFieldQA	43.15	41.02	43.24
HotpotQA	33.05	32.91	33.07
MuSiQue	14.66	13.82	14.66
2WikiMultihopQA	24.14	23.00	24.86
GovReport	30.85	30.47	31.40
QMSum	22.84	22.59	22.84
MultiNews	26.55	26.28	26.52
LCC	54.83	54.11	54.06
RepoBench-P	58.94	57.62	58.77
TriviaQA	83.99	83.19	83.88
SAMSum	40.75	41.28	40.62
TRec	66.50	66.50	67.00
PassageRetrieval	30.50	32.25	31.50
Average	38.72	38.30	38.79

Table 2: Performance of Mistral-7B-Instruct-v0.2

The results of Mistral-7B-Instruct-v0.2 on 15 tasks from LongBench. The model has 32K context length and applies group query attention, which uses 8 heads for KV Cache, as opposed to the full 32 heads. We use a 32 group size and 128 residual length for both KIVI-2 and KIVI-4. The baseline is of full precision.

Task	Mistral-7B-Instruct-v0.2	w./ KIVI-2	w./ KIVI-4
NarrativeQA	21.02	20.61	20.97
Qasper	29.41	28.73	29.41
MultiFieldQA	47.13	44.88	46.52
HotpotQA	36.53	35.47	36.25
MuSiQue	19.13	17.95	19.53
2WikiMultihopQA	21.76	20.68	21.66
GovReport	32.59	32.55	32.97
QMSum	23.99	23.65	24.06
MultiNews	27.09	26.54	26.89
LCC	53.49	53.03	53.33
RepoBench-P	51.40	51.16	51.41
TriviaQA	86.23	86.00	86.23
SAMSum	43.04	43.34	43.34
TRec	71.00	71.00	71.00
PassageRetrieval	89.33	80.83	89.42
Average	43.54	42.43	43.53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

long_bench.md

long_bench.md

More results on LongBench

Table 1: Performance of LongChat-7b-v1.5-32K

Table 2: Performance of Mistral-7B-Instruct-v0.2

Files

long_bench.md

Latest commit

History

long_bench.md

File metadata and controls

More results on LongBench

Table 1: Performance of LongChat-7b-v1.5-32K

Table 2: Performance of Mistral-7B-Instruct-v0.2