Pull requests: vllm-project/vllm
#4927 [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (opened May 20, 2024 by divakar-amd)
#4922 [Build/CI] Switching to ROCm v. 6.1 in Dockerfile.rocm (opened May 20, 2024 by Alexei-V-Ivanov-AMD)
#4903 [Bugfix] Fix custom all reduce nvlink check on multi node (opened May 19, 2024 by esmeetu)
#4894 [Core] Eliminate parallel worker per-step task scheduling overhead (opened May 18, 2024 by njhill)
#4893 [Misc] Load FP8 kv-cache scaling factors from checkpoints (opened May 17, 2024 by comaniac)
#4856 [Bugfix] Still download from huggingface while set VLLM_USE_MODELSCOPE = true (opened May 16, 2024 by liuzhenghua)
#4846 [Bugfix / Core] Prefix Caching Guards (merged with main) (opened May 16, 2024 by zhuohan123)
#4844 [Core] Avoid one broadcast op when propagating metadata (opened May 16, 2024 by njhill)
#4841 Add a new kernel for fusing the dequantization in fused-moe gemm (opened May 15, 2024 by RezaYazdaniAminabadi)
#4837 [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (opened May 15, 2024 by afeldman-nm)
#4830 [Hardware][Intel] Add LoRA adapter support for CPU backend (opened May 15, 2024 by Isotr0py) [label: x86 CPU]