Pull requests: vllm-project/vllm
#12301 [Core] Make disaggregated prefill compatible with pipeline parallelism (opened Jan 22, 2025 by YuhanLiu11)
#12299 [Build] update requirements of no-device (label: ci/build; opened Jan 22, 2025 by MengqingCao)
#12294 [Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels (opened Jan 22, 2025 by fenghuizhang)
#12288 [Benchmark] More accurate TPOT calc in benchmark_serving.py (label: ready; opened Jan 22, 2025 by njhill)
#12287 [Frontend][V1] Online serving performance improvements (label: frontend; opened Jan 21, 2025 by njhill)
#12285 [Core] Prefill Only Tokens Without KV Cache in Batch Requests (Disagg Prefill) (opened Jan 21, 2025 by Shaoting-Feng)
#12282 [AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (opened Jan 21, 2025 by rasmith)
#12280 [CI/Build] Add label automation for structured-output / speculative-decoding (label: ci/build; opened Jan 21, 2025 by russellb)
#12271 NVIDIA Blackwell codegen (labels: ci/build, documentation; opened Jan 21, 2025 by johnnynunez)
#12253 [core] separate builder init and builder prepare for each batch (label: ready; opened Jan 21, 2025 by youkaichao)
#12251 [Model] Enable Inference Support for the New Baichuan-M1 Model (labels: documentation, new model; opened Jan 21, 2025 by rainkert)
#12243 [torch.compile] decouple compile sizes and cudagraph sizes (opened Jan 21, 2025 by youkaichao)
#12242 [Frontend] Set server's maximum number of generated tokens using generation_config.json (label: frontend; opened Jan 21, 2025 by mhendrey)
#12238 [Docs] Update FP8 KV Cache documentation (label: documentation; opened Jan 21, 2025 by mgoin)
#12231 [Misc] Move find_loaded_library to platform_aware_utils.py (opened Jan 20, 2025 by houseroad)
#12193 [V1][Spec Decode] Ngram Spec Decode (draft, 2 of 7 tasks; opened Jan 19, 2025 by LiuXiaoxuanPKU)
#12192 [Bugfix] fix race condition that leads to wrong order of token returned (opened Jan 19, 2025 by joennlae)
#12185 [Kernel] add triton fused moe kernel for gptq/awq (opened Jan 18, 2025 by jinzhen-lin)
#12167 [Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor (opened Jan 17, 2025 by kzawora-intel)
#12158 [Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution (opened Jan 17, 2025 by cennn)