Pull requests: vllm-project/vllm

[V1][Frontend] Coalesce bunched RequestOutputs
#12298 opened Jan 22, 2025 by njhill

[Benchmark] More accurate TPOT calc in benchmark_serving.py (label: ready)
#12288 opened Jan 22, 2025 by njhill

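For context, TPOT (time per output token) is conventionally computed per request by excluding the time to first token, since that token is dominated by prefill. A minimal sketch of that convention (variable names are illustrative, not taken from the PR):

```python
def tpot(latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Time per output token, excluding the first token.

    The usual convention: (total latency - TTFT) / (decoded tokens - 1),
    so prefill time does not skew the decode-speed metric.
    """
    if output_tokens <= 1:
        raise ValueError("TPOT is undefined for a single output token")
    return (latency_s - ttft_s) / (output_tokens - 1)
```
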
[Core] tokens in queue metric
#12286 opened Jan 21, 2025 by annapendleton

[Core] Support reset_prefix_cache frontend
#12284 opened Jan 21, 2025 by comaniac

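Exposing reset_prefix_cache on the frontend would let users invalidate cached KV blocks without restarting the engine. A hedged usage sketch, assuming the call lands on the LLM entrypoint as reset_prefix_cache() (the exact surface is defined by the PR, not here):

```python
from vllm import LLM

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

# Warm the prefix cache with a shared prompt prefix.
llm.generate("Common system prompt. Question: what is vLLM?")

# Assumed frontend call from this PR: drop all cached prefix blocks,
# e.g. when previously cached content should no longer be reused.
llm.reset_prefix_cache()
```
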
NVIDIA Blackwell codegen (labels: ci/build, documentation)
#12271 opened Jan 21, 2025 by johnnynunez

[core] separate builder init and builder prepare for each batch (label: ready)
#12253 opened Jan 21, 2025 by youkaichao

[Model] Enable Inference Support for the New Baichuan-M1 Model (labels: documentation, new model)
#12251 opened Jan 21, 2025 by rainkert

[Docs] Update FP8 KV Cache documentation (label: documentation)
#12238 opened Jan 21, 2025 by mgoin

[V1][Spec Decode] Ngram Spec Decode (draft, 2 of 7 tasks)
#12193 opened Jan 19, 2025 by LiuXiaoxuanPKU

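N-gram speculative decoding (often called prompt lookup decoding) proposes draft tokens by matching the most recent n-gram against earlier context and copying the tokens that followed, leaving the target model to verify them. A minimal sketch of that proposal step (all names are illustrative, not the PR's implementation):

```python
def propose_ngram_draft(
    token_ids: list[int], ngram_size: int = 3, num_draft: int = 5
) -> list[int]:
    """Propose draft tokens by prompt lookup.

    Find an earlier occurrence of the trailing `ngram_size` tokens and
    return the tokens that followed it; the target model then verifies
    these drafts in a single forward pass.
    """
    if len(token_ids) < ngram_size:
        return []
    tail = token_ids[-ngram_size:]
    # Scan right-to-left so the most recent match wins.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start : start + ngram_size] == tail:
            follow = token_ids[start + ngram_size : start + ngram_size + num_draft]
            if follow:
                return follow
    return []
```
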
[Misc] Add Gemma2 GGUF support (draft)
#12186 opened Jan 18, 2025 by Isotr0py

[Kernel] add triton fused moe kernel for gptq/awq
#12185 opened Jan 18, 2025 by jinzhen-lin