Dynasurge's performance evaluations use the LLaMA model family:
- JackFram/llama-68m for generating draft tokens
- LLaMA2-7B for verification and for autoregressive generation
We evaluate on two datasets so that the results are not tied to a single text distribution:
- C4 (en) validation dataset
- CNN DailyMail dataset
We test 200 examples from each dataset, with the prompt length and the generation length both set to 128 tokens so that different decoding strategies are compared under the same conditions.
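As a minimal sketch of the setup described above, the snippet below selects a range of examples and cuts each prompt to a fixed length. The function name `select_prompts` and the choice to skip examples shorter than the prompt length are assumptions made for illustration; they are not taken from the project's code, and the toy token lists stand in for real tokenizer output.

```python
from typing import List, Sequence


def select_prompts(tokenized: Sequence[Sequence[int]],
                   start: int = 0, end: int = 200,
                   prompt_len: int = 128) -> List[List[int]]:
    """Return fixed-length prompts for the examples in [start, end)."""
    prompts = []
    for ids in tokenized[start:end]:
        if len(ids) < prompt_len:
            # Assumption: examples that are too short are skipped, not padded.
            continue
        prompts.append(list(ids[:prompt_len]))
    return prompts


# Toy token ids standing in for real tokenizer output.
examples = [list(range(n)) for n in (300, 50, 128, 500)]
prompts = select_prompts(examples, start=0, end=4)
print(len(prompts), {len(p) for p in prompts})  # prints: 3 {128}
```

The `start` and `end` arguments mirror the --start and --end options used in the commands in this document.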
The experiments were performed on AWS g5.4xlarge instances equipped with:
- GPU: NVIDIA A10G
- CPU: 16 vCPUs
- RAM: 64 GiB
After setting up the environment, you can simply run bash eval.sh
Create and activate a new environment with the required dependencies:
conda create -n dynasurge python=3.11
conda activate dynasurge
pip install -r requirements.txt
Run the project in the mode that fits your needs, selected with the --mode option.
CUDA_VISIBLE_DEVICES=0 python main.py --draft JackFram/llama-68m --target huggyllama/llama-7b --T 0.9 --P 1.0 --start 0 --end 200 --M 384 --mode auto --dataset cnn
In static tree mode, the model uses a predefined tree structure. Use the --growmap option to specify the growmap file that defines the tree's structure.
CUDA_VISIBLE_DEVICES=0 python main.py --draft JackFram/llama-68m --target huggyllama/llama-7b --T 0.9 --P 1.0 --start 0 --end 200 --M 384 --mode sTree --growmap ./growmaps/4x8-tree.pt --dataset cnn
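Tree-style speculative decoding commonly verifies every draft token of a tree in one target-model pass by letting each token attend only to itself and its ancestors. The sketch below illustrates that general idea with a small hand-written tree. It does not read or reproduce the growmap file format, which is not described here, and the parent-index representation and the `ancestor_mask` helper are assumptions made for illustration.

```python
from typing import List


def ancestor_mask(parents: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when node j is node i itself or one of its ancestors.

    parents[i] is the index of node i's parent, or -1 for the root.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk from node i up to the root
            mask[i][j] = True
            j = parents[j]
    return mask


# Hypothetical tree: root 0 has children 1 and 2; node 1 has child 3.
parents = [-1, 0, 0, 1]
mask = ancestor_mask(parents)
for row in mask:
    print("".join("1" if v else "0" for v in row))
```

For this toy tree the rows printed are 1000, 1100, 1010, and 1101: node 3 sees the root, node 1, and itself, but not its uncle node 2.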
Dynamic tree mode builds a flexible tree structure whose size is bounded by a draft token budget. Use the --B option to set the maximum budget for dynamic adjustments during processing.
CUDA_VISIBLE_DEVICES=0 python main.py --draft JackFram/llama-68m --target huggyllama/llama-7b --T 0.9 --P 1.0 --start 0 --end 200 --M 384 --growmap ./growmaps/8x8-tree.pt --mode dTree --B 64 --dataset cnn
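How Dynasurge spends its draft token budget is not specified here. One plausible strategy, shown purely as an illustrative sketch, is to grow the tree greedily: keep a priority queue of candidate nodes ranked by the product of draft probabilities along their path, and add the best candidate until the budget is used up. The names `grow_tree` and `toy_draft`, and the fixed toy distribution, are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple

Path = Tuple[str, ...]


def grow_tree(children_fn: Callable[[Path], List[Tuple[str, float]]],
              budget: int) -> List[Tuple[Path, float]]:
    """Add the most probable candidate paths until `budget` nodes are chosen."""
    # heapq is a min-heap, so probabilities are stored negated.
    heap = [(-p, (tok,)) for tok, p in children_fn(())]
    heapq.heapify(heap)
    tree: List[Tuple[Path, float]] = []
    while heap and len(tree) < budget:
        neg_p, path = heapq.heappop(heap)
        tree.append((path, -neg_p))
        # Children inherit the parent's path probability times their own.
        for tok, p in children_fn(path):
            heapq.heappush(heap, (neg_p * p, path + (tok,)))
    return tree


def toy_draft(path: Path) -> List[Tuple[str, float]]:
    """Stand-in for a draft model: the same distribution at every node."""
    return [("a", 0.6), ("b", 0.3), ("c", 0.1)]


tree = grow_tree(toy_draft, budget=4)
print([path for path, _ in tree])
```

With a budget of 4, the toy run selects the paths a, a-a, b, and a-a-a, in that order, so likely continuations are explored deeper while unlikely branches are never expanded.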
Enhanced Tree Verification (ETV) is an algorithm implemented in this project to improve tree-style speculative decoding. Enable ETV with the --use_bfs flag. For example:
# static tree mode with ETV
CUDA_VISIBLE_DEVICES=0 python main.py --draft JackFram/llama-68m --target huggyllama/llama-7b --T 0.9 --P 1.0 --start 0 --end 200 --M 384 --mode sTree --growmap ./growmaps/4x8-tree.pt --dataset cnn --use_bfs
# dynamic tree mode with ETV
CUDA_VISIBLE_DEVICES=0 python main.py --draft JackFram/llama-68m --target huggyllama/llama-7b --T 0.9 --P 1.0 --start 0 --end 200 --M 384 --growmap ./growmaps/8x8-tree.pt --mode dTree --B 64 --dataset cnn --use_bfs
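The commands above differ only in a few options, so a small helper can assemble them when running several configurations in a row. This is a convenience sketch, not part of the project: `build_command` is a hypothetical name, it uses only the options that appear in the commands in this document, and it prints the command lines without executing them.

```python
import shlex
from typing import List, Optional


def build_command(mode: str, growmap: Optional[str] = None,
                  budget: Optional[int] = None,
                  use_bfs: bool = False) -> List[str]:
    """Assemble a main.py invocation from the options shown in this document."""
    cmd = ["python", "main.py",
           "--draft", "JackFram/llama-68m",
           "--target", "huggyllama/llama-7b",
           "--T", "0.9", "--P", "1.0",
           "--start", "0", "--end", "200",
           "--M", "384", "--mode", mode, "--dataset", "cnn"]
    if growmap is not None:
        cmd += ["--growmap", growmap]
    if budget is not None:
        cmd += ["--B", str(budget)]
    if use_bfs:
        cmd.append("--use_bfs")
    return cmd


runs = [
    build_command("auto"),
    build_command("sTree", growmap="./growmaps/4x8-tree.pt", use_bfs=True),
    build_command("dTree", growmap="./growmaps/8x8-tree.pt", budget=64,
                  use_bfs=True),
]
for cmd in runs:
    # Print only; pass cmd to subprocess.run to actually launch a run.
    print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(cmd))
```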
This project is developed as a demonstration for a course assignment and is currently in its prototype stage. Future developments and improvements are planned as part of ongoing research and educational efforts.