Pull requests: vllm-project/vllm
#12301 [Core] Make disaggregated prefill compatible with pipeline parallelism (opened Jan 22, 2025 by YuhanLiu11)
#12299 [Build] update requirements of no-device (label: ci/build; opened Jan 22, 2025 by MengqingCao)
#12294 [Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels (opened Jan 22, 2025 by fenghuizhang)
#12288 [Benchmark] More accurate TPOT calc in benchmark_serving.py (label: ready; opened Jan 22, 2025 by njhill)
#12287 [Frontend][V1] Online serving performance improvements (label: frontend; opened Jan 21, 2025 by njhill)
#12285 [Core] Prefill Only Tokens Without KV Cache in Batch Requests (Disagg Prefill) (opened Jan 21, 2025 by Shaoting-Feng)
#12282 [AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (opened Jan 21, 2025 by rasmith)
#12280 [CI/Build] Add label automation for structured-output / speculative-decoding (label: ci/build; opened Jan 21, 2025 by russellb)
#12271 NVIDIA Blackwell codegen (labels: ci/build, documentation; opened Jan 21, 2025 by johnnynunez)
#12253 [core] separate builder init and builder prepare for each batch (label: ready; opened Jan 21, 2025 by youkaichao)
#12251 [Model] Enable Inference Support for the New Baichuan-M1 Model (labels: documentation, new model; opened Jan 21, 2025 by rainkert)
#12243 [torch.compile] decouple compile sizes and cudagraph sizes (opened Jan 21, 2025 by youkaichao)
#12242 [Frontend] Set server's maximum number of generated tokens using generation_config.json (label: frontend; opened Jan 21, 2025 by mhendrey)
#12238 [Docs] Update FP8 KV Cache documentation (label: documentation; opened Jan 21, 2025 by mgoin)
#12231 [Misc] Move find_loaded_library to platform_aware_utils.py (opened Jan 20, 2025 by houseroad)
#12193 [V1][Spec Decode] Ngram Spec Decode (draft, 2 of 7 tasks; opened Jan 19, 2025 by LiuXiaoxuanPKU)
#12192 [Bugfix] fix race condition that leads to wrong order of token returned (opened Jan 19, 2025 by joennlae)
#12185 [Kernel] add triton fused moe kernel for gptq/awq (opened Jan 18, 2025 by jinzhen-lin)
#12167 [Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor (opened Jan 17, 2025 by kzawora-intel)
#12158 [Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution (opened Jan 17, 2025 by cennn)