v0.1.15
Features
Fine-tuning support (Experimental)
One can now serve OpenLLM models with any PEFT-compatible adapter layers via --adapter-id:
openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6-7b-quotes
It also supports adapters from a custom path:
openllm start opt --model-id facebook/opt-6.7b --adapter-id /path/to/adapters
To use multiple adapters, use the following format:
openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6.7b-lora --adapter-id aarnphm/opt-6.7b-lora:french_lora
By default, the first --adapter-id will be used as the default LoRA layer, but users can optionally change which LoRA layer to use for inference via /v1/adapters:
curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "vn_lora"}'
Note that when using multiple adapter names and IDs, it is recommended to update the default adapter (via /v1/adapters) before sending inference requests, to avoid any performance degradation.
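For example, a minimal sketch of that flow with the adapters from the example above, assuming the server exposes a /v1/generate endpoint that accepts a prompt and llm_config payload (adjust the endpoint and payload to your deployment):
curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "french_lora"}'
curl -X POST http://localhost:3000/v1/generate --json '{"prompt": "Translate to French: Hello world", "llm_config": {}}'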
To include this in the Bento, one can also provide a --adapter-id to openllm build:
openllm build opt --model-id facebook/opt-6.7b --adapter-id ...
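For example, reusing the adapter from the serving example above:
openllm build opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6-7b-quotes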
I will start rolling out support and scripts for more models, so stay tuned!
Better GPU support (experimental)
0.1.15 comes with better GPU support, meaning it respects CUDA_VISIBLE_DEVICES, allowing users full control over how they want to serve their models.
0.1.15 also brings experimental support for AMD GPUs. ROCm supports CUDA_VISIBLE_DEVICES as well, so OpenLLM will also respect this behaviour on the ROCm platform.
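For example, to pin the server to the first two GPUs (ROCm honours the same variable):
CUDA_VISIBLE_DEVICES=0,1 openllm start opt --model-id facebook/opt-6.7b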
Installation
pip install openllm==0.1.15
To upgrade from a previous version, use the following command:
pip install --upgrade openllm==0.1.15
Usage
All available models: python -m openllm.models
To start an LLM: python -m openllm start dolly-v2
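Once a server is running, it can also be queried from the terminal. A minimal sketch, assuming the openllm query command and the OPENLLM_ENDPOINT environment variable described in the project README:
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What is the meaning of life?'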
Find more information about this release in the CHANGELOG.md
What's Changed
- chore: better gif quality by @aarnphm in #71
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #74
- feat: cascading resource strategies by @aarnphm in #72
Full Changelog: v0.1.14...v0.1.15