Prompt caching in mlx_lm.server
#1026
Conversation
This would not be backwards compatible with any later incorporation of batched input for generate (i.e., #948).
Leave the door open for (#948).
@@ -474,15 +531,15 @@ def handle_completion(

     def handle_stream(
         self,
-        prompt: mx.array,
+        prompt: List[int],
This in particular. Can't we handle a single or multiple (batched) prompt that falls back to the behavior for a single prompt by default?
We can always update this type to List[List[int]] when the time comes.
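To illustrate the point about leaving room for batched prompts, here is a minimal sketch (not code from this PR; the normalize_prompts helper name is hypothetical) of how a handler could accept either a single tokenized prompt or a batch and normalize to a batch of one, so later switching the parameter type to List[List[int]] would not break single-prompt callers.

```python
from typing import List, Union

Prompt = List[int]


def normalize_prompts(prompt: Union[Prompt, List[Prompt]]) -> List[Prompt]:
    """Hypothetical helper: accept one tokenized prompt or a batch of them
    and always return a batch, so a single-prompt call keeps its current
    behavior while batched input (#948) could be added later."""
    if prompt and isinstance(prompt[0], int):
        return [prompt]      # single prompt -> batch of one
    return list(prompt)      # already a batch (or empty)
```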
Looks fantastic! I left a few comments that may or may not need addressing.
Added a basic prompt cache in mlx_lm.server for chat mode. It keeps things simple, but it does support chatting with cache reuse.
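As a rough illustration of what chat-mode cache reuse means here (a sketch under assumptions, not this PR's actual implementation; the SimplePromptCache class is hypothetical): the server can remember the tokens it has already processed and, when the next chat request repeats the same conversation prefix, run the model only over the new suffix while keeping the existing KV cache.

```python
from typing import List


class SimplePromptCache:
    """Hypothetical sketch of chat-mode cache reuse: remember the last
    processed prompt's tokens and feed the model only the new suffix."""

    def __init__(self) -> None:
        self.cached_tokens: List[int] = []

    def get_suffix(self, prompt: List[int]) -> List[int]:
        # Length of the shared prefix between cached tokens and the new prompt.
        n = 0
        while (n < len(self.cached_tokens)
               and n < len(prompt)
               and self.cached_tokens[n] == prompt[n]):
            n += 1
        # If the new prompt diverges before the cache ends, a real server
        # would need to trim or reset the KV cache; this sketch just resets
        # its bookkeeping to the new prompt.
        self.cached_tokens = list(prompt)
        # Only the unseen suffix needs a forward pass; the KV cache already
        # covers the first n tokens.
        return prompt[n:]
```

In a chat session each request repeats the prior conversation, so the shared prefix keeps growing and only the latest messages (plus the chat-template tokens around them) need to be processed.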