Thank you for your great work on KV cache merging; D2O's dynamic token merging method has inspired me a lot.
I am currently running your source code.
Following the paper, I set N:M = 3:1, alpha = 0.3, and beta = 0.7 under a 20% (rho = 0.2) KV cache compression ratio.
I have made sure that my Python dependencies and environment match your source code, and I am using an NVIDIA RTX PRO 6000 GPU, which runs LongBench properly.
However, my results differ noticeably from those in the paper: in the paper, D2O outperforms H2O on all 16 LongBench tasks, whereas in my run it is ahead on only 9 of them and falls behind on tasks such as Qasper, GovReport, and MultiNews.
If I have misunderstood or missed something, could you help me resolve this issue?
Thank you very much.
Results reported in the paper:

| Task | H2O | D2O |
|---|---|---|
| NarrativeQA | 13.27 | 14.43 |
| Qasper | 11.05 | 12.66 |
| MF-en | 17.72 | 19.93 |
| HotpotQA | 10.38 | 11.92 |
| 2WikiMQA | 11.23 | 12.79 |
| Musique | 6.38 | 9.88 |
| GovReport | 21.29 | 24.36 |
| QMSum | 21.33 | 23.42 |
| MultiNews | 3.38 | 3.95 |
| TREC | 66.63 | 69.72 |
| TriviaQA | 89.19 | 90.99 |
| SAMSum | 41.12 | 42.36 |
| Pcount | 5.52 | 6.61 |
| Pre | 11.11 | 14.67 |
| Lcc | 71.86 | 72.43 |
| RB-P | 58.29 | 60 |
Results from my run of the source code:

| Task | H2O | D2O |
|---|---|---|
| NarrativeQA | 12.43 | 12.64 |
| Qasper | 12.55 | 11.92 |
| MF-en | 19.95 | 19.87 |
| HotpotQA | 10.92 | 10.72 |
| 2WikiMQA | 12.2 | 11.95 |
| Musique | 6.65 | 6.75 |
| GovReport | 22.97 | 21.13 |
| QMSum | 23.44 | 23.13 |
| MultiNews | 3.56 | 1.93 |
| TREC | 69 | 69.67 |
| TriviaQA | 90.57 | 90.63 |
| SAMSum | 41.96 | 42.05 |
| Pcount | 5.18 | 5.9 |
| Pre | 11.58 | 13.86 |
| Lcc | 69.26 | 71.57 |
| RB-P | 55.67 | 58.43 |
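To make the comparison concrete, here is a small Python script that uses only the numbers from the two tables above and computes, for each set of results, the per-task D2O minus H2O gap, the number of tasks where D2O is ahead, and the tasks where it falls behind. The unweighted mean over the 16 tasks is my own simplification for illustration, not necessarily how the paper aggregates LongBench scores.

```python
from statistics import mean

# LongBench scores copied from the two tables: task -> (H2O, D2O).
PAPER = {
    "NarrativeQA": (13.27, 14.43), "Qasper": (11.05, 12.66),
    "MF-en": (17.72, 19.93), "HotpotQA": (10.38, 11.92),
    "2WikiMQA": (11.23, 12.79), "Musique": (6.38, 9.88),
    "GovReport": (21.29, 24.36), "QMSum": (21.33, 23.42),
    "MultiNews": (3.38, 3.95), "TREC": (66.63, 69.72),
    "TriviaQA": (89.19, 90.99), "SAMSum": (41.12, 42.36),
    "Pcount": (5.52, 6.61), "Pre": (11.11, 14.67),
    "Lcc": (71.86, 72.43), "RB-P": (58.29, 60.00),
}
REPRO = {
    "NarrativeQA": (12.43, 12.64), "Qasper": (12.55, 11.92),
    "MF-en": (19.95, 19.87), "HotpotQA": (10.92, 10.72),
    "2WikiMQA": (12.20, 11.95), "Musique": (6.65, 6.75),
    "GovReport": (22.97, 21.13), "QMSum": (23.44, 23.13),
    "MultiNews": (3.56, 1.93), "TREC": (69.00, 69.67),
    "TriviaQA": (90.57, 90.63), "SAMSum": (41.96, 42.05),
    "Pcount": (5.18, 5.90), "Pre": (11.58, 13.86),
    "Lcc": (69.26, 71.57), "RB-P": (55.67, 58.43),
}


def summarize(scores):
    """Unweighted task averages plus per-task D2O - H2O gaps."""
    gaps = {task: d2o - h2o for task, (h2o, d2o) in scores.items()}
    return {
        "h2o_avg": mean(h2o for h2o, _ in scores.values()),
        "d2o_avg": mean(d2o for _, d2o in scores.values()),
        "mean_gap": mean(gaps.values()),
        "wins": sum(gap > 0 for gap in gaps.values()),
        # Tasks where D2O scores strictly lower than H2O.
        "losses": sorted(task for task, gap in gaps.items() if gap < 0),
        "gaps": gaps,
    }


for name, scores in (("paper", PAPER), ("reproduced", REPRO)):
    s = summarize(scores)
    print(
        f"{name}: H2O avg {s['h2o_avg']:.2f}, D2O avg {s['d2o_avg']:.2f}, "
        f"mean gain {s['mean_gap']:+.2f}, "
        f"D2O ahead on {s['wins']}/{len(scores)} tasks"
    )
    if s["losses"]:
        print("  D2O behind H2O on:", ", ".join(s["losses"]))
```

Looking at the per-task gaps rather than only the averages makes it easier to see which tasks (for example the summarization sets) account for most of the discrepancy.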