
v0.1.15

Released by @aarnphm on 26 Jun 23:23

Features

Fine-tuning support (Experimental)

One can serve OpenLLM models with any PEFT-compatible adapter layers via --adapter-id:

openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6-7b-quotes

Adapters from a custom local path are also supported:

openllm start opt --model-id facebook/opt-6.7b --adapter-id /path/to/adapters

Multiple adapters can be provided in the following format:

openllm start opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6.7b-lora --adapter-id aarnphm/opt-6.7b-lora:french_lora

By default, the first --adapter-id becomes the default LoRA layer. Users can optionally switch which LoRA layer to use for inference via /v1/adapters:

curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "vn_lora"}'

Note that when serving multiple adapter-name and adapter-id pairs, it is recommended to switch back to the default adapter before sending inference requests, to avoid any performance degradation, as sketched below.
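A minimal sketch of switching back, assuming the first, unnamed adapter is registered under the name "default" (following PEFT's convention; the actual name may differ in your setup):

curl -X POST http://localhost:3000/v1/adapters --json '{"adapter_name": "default"}'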

To include adapters in the Bento, one can also provide --adapter-id to openllm build:

openllm build opt --model-id facebook/opt-6.7b --adapter-id ...
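For example, reusing the adapter from the examples above (purely an illustration; any PEFT-compatible adapter id works here):

openllm build opt --model-id facebook/opt-6.7b --adapter-id aarnphm/opt-6-7b-quotes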

I will start rolling out support and fine-tuning scripts for more models, so stay tuned!

Better GPU support (experimental)

0.1.15 comes with better GPU support: OpenLLM now respects CUDA_VISIBLE_DEVICES, giving users full control over which devices serve their models.

0.1.15 also brings experimental support for AMD GPUs. Since ROCm supports CUDA_VISIBLE_DEVICES, OpenLLM respects the same behaviour on the ROCm platform.
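For example, to pin serving to the first two GPUs (a minimal sketch, reusing the model from the examples above; the same variable applies on ROCm):

CUDA_VISIBLE_DEVICES=0,1 openllm start opt --model-id facebook/opt-6.7b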

Installation

pip install openllm==0.1.15

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.1.15

Usage

All available models: python -m openllm.models

To start an LLM: python -m openllm start dolly-v2
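Once the server is up, you can send a generation request over HTTP. A minimal sketch against /v1/generate; the exact payload schema may vary by model, and max_new_tokens here is an assumed generation override:

curl -X POST http://localhost:3000/v1/generate --json '{"prompt": "What is the meaning of life?", "llm_config": {"max_new_tokens": 128}}'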

Find more information about this release in CHANGELOG.md.

What's Changed

Full Changelog: v0.1.14...v0.1.15