# Releases: ml-explore/mlx-swift-examples
## 2.21.2

### What's Changed
- Add VLM support and refactor common LM code into `MLXLMCommon` (breaking API changes) by @davidkoski in #151
  - based on models from https://github.com/Blaizzy/mlx-vlm
  - for #132
### Xcode 16

Xcode 16 is required to build the example applications and tools. Older versions of Xcode can still build the libraries via SwiftPM, so there is no change in requirements for applications or libraries that depend on them.

This change is required because the xcodeproj now refers to the local Package.swift file, keeping its builds consistent with what external users get. If needed we can switch back to using the xcodeproj for internal library builds and SwiftPM for external library builds; if this causes a problem, please file an issue and it can be considered.
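For reference, here is a minimal sketch of a SwiftPM manifest that depends on the libraries from this repo; the platform requirements and the `branch` pin are assumptions, so adjust them to your project:

```swift
// swift-tools-version: 5.9
// Minimal sketch of a manifest depending on the mlx-swift-examples libraries.
// The platform versions and the branch pin below are assumptions.
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14), .iOS(.v16)],
    dependencies: [
        .package(
            url: "https://github.com/ml-explore/mlx-swift-examples",
            branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // MLXLMCommon is pulled in by MLXLLM/MLXVLM; listing it
                // explicitly makes `import MLXLMCommon` available directly.
                .product(name: "MLXLLM", package: "mlx-swift-examples"),
                .product(name: "MLXVLM", package: "mlx-swift-examples"),
                .product(name: "MLXLMCommon", package: "mlx-swift-examples"),
            ])
    ]
)
```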
### Additions
There are two new libraries:

- `MLXVLM` contains vision language models that combine images and text prompts to produce text results, e.g. "describe this image"
- `MLXLMCommon` contains the `LanguageModel` code that is shared between `MLXLLM` and `MLXVLM`

The API between LLM and VLM is identical aside from the preparation of the `UserInput`:
```swift
let parameters = GenerateParameters()

// LLM prompt
let input = UserInput(prompt: "tell me a story")

// VLM prompt
let input = UserInput(prompt: "describe the image", images: [.url(url)])

// inference is identical
let result = try await modelContainer.perform { [generate, input] context in
    let input = try await context.processor.prepare(input: input)
    return try generate(input: input, parameters: parameters, context: context) { token in
        // print tokens as they are generated, stop early, etc.
        return .more
    }
}
```
VLM example code is available in the `llm-tool` example:
```
./mlx-run llm-tool eval --help
OVERVIEW: evaluate prompt and images to generate text (VLM)

USAGE: llm-tool eval <options>

OPTIONS:
  --model <model>         Name of the huggingface model or absolute path to directory
  -p, --prompt <prompt>   The message to be processed by the model. Use @path,@path to load from files, e.g. @/tmp/prompt.txt
  --resize <resize>       Resize images to this size (width, height)
  --image <image>         Paths or urls for input images
  ...
```
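A concrete invocation might look like the following; the model id and image path are placeholders for illustration, not values taken from the release notes:

```
./mlx-run llm-tool eval \
    --model mlx-community/Qwen2-VL-2B-Instruct-4bit \
    --prompt "describe the image" \
    --image /tmp/test.jpg
```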
### Breaking Changes

These changes probably have no effect on code external to this repo:

- the mlx-swift-examples.xcodeproj now references the local `Package.swift` to build the libraries
- the example code now uses naming that matches external uses of mlx-swift-examples, e.g. `import LLM` -> `import MLXLLM`
- the library directories are renamed to match their target names, e.g. `LLM` -> `MLXLLM`
Breaking:

- some code will now need to import both `MLXLLM` and `MLXLMCommon` (particularly code that loads models). `MLXLMCommon` contains the common API between LLM and VLM:

```swift
import MLXLLM
import MLXLMCommon
```
- constants for models have moved from `ModelConfiguration` to `ModelRegistry`; this is `MLXLLM.ModelRegistry` and there is also `MLXVLM.ModelRegistry`

```diff
- let modelConfiguration = ModelConfiguration.phi3_5_4bit
+ let modelConfiguration = ModelRegistry.phi3_5_4bit
```
- the `loadModelContainer()` function is now `LLMModelFactory.shared.loadContainer()`
- there is a new `VLMModelFactory` with identical methods for loading VLMs

```diff
- let modelContainer = try await LLM.loadModelContainer(configuration: modelConfiguration)
- {
+ let modelContainer = try await LLMModelFactory.shared.loadContainer(
+     configuration: modelConfiguration
+ ) {
```
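Filled out, a complete call might look like the sketch below; the assumption here is that the trailing closure is a download progress handler receiving a Foundation `Progress`:

```swift
// Sketch: load a model container via the new factory API.
// Assumes the trailing closure reports download progress (Foundation Progress).
let modelConfiguration = ModelRegistry.phi3_5_4bit
let modelContainer = try await LLMModelFactory.shared.loadContainer(
    configuration: modelConfiguration
) { progress in
    print("Downloading: \(Int(progress.fractionCompleted * 100))%")
}
```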
- `ModelContainer.perform` is now throwing (and lives in MLXLMCommon):

```diff
- let result = await modelContainer.perform { model, tokenizer in
-     LLM.generate(
+ let result = try await modelContainer.perform { model, tokenizer in
+     try MLXLMCommon.generate(
```
- `ModelConfiguration` previously had a way to register new configurations. This is now on `LLMModelFactory` (and `VLMModelFactory` has the same):

```swift
LLMModelFactory.shared.modelRegistry.register(configurations: [modelConfiguration])
```
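As a sketch of how that might be used to add a model outside the built-in registry (the model id and initializer arguments here are hypothetical):

```swift
// Sketch: register a hypothetical model configuration with the LLM factory.
// The id and defaultPrompt values are illustrative, not a real model listing.
let custom = ModelConfiguration(
    id: "mlx-community/MyModel-4bit",
    defaultPrompt: "hello"
)
LLMModelFactory.shared.modelRegistry.register(configurations: [custom])
```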
### Deprecations

An example at the end shows all of these deprecations in context.

Prefer to use the `ModelContext.processor` to prepare prompts. Previously callers would pass in a bare `[Int]` of tokens, but in order to support more complex inputs (VLMs) the use of bare `[Int]` is deprecated and callers should use `UserInput` and `LMInput`.
For example, previously callers might have done something like this:

```swift
let messages = [["role": "user", "content": prompt]]
let promptTokens = try await modelContainer.perform { _, tokenizer in
    try tokenizer.applyChatTemplate(messages: messages)
}
```
Now that should be:

```swift
let input = try await context.processor.prepare(input: .init(prompt: prompt))
```

This will initialize a `UserInput` from the prompt text and produce an `LMInput` that can be used to generate tokens.
This call to `generate()` is now deprecated:

```swift
public func generate(
    promptTokens: [Int], parameters: GenerateParameters, model: any LanguageModel,
    tokenizer: Tokenizer, extraEOSTokens: Set<String>? = nil,
    didGenerate: ([Int]) -> GenerateDisposition
) throws -> GenerateResult
```
This consumed the `[Int]` variety of tokens. Now this is preferred:

```swift
public func generate(
    input: LMInput, parameters: GenerateParameters, context: ModelContext,
    didGenerate: ([Int]) -> GenerateDisposition
) throws -> GenerateResult
```
This method on `ModelContainer` is now deprecated:

```swift
/// Perform an action on the model and/or tokenizer. Callers _must_ eval any `MLXArray` before returning as
/// `MLXArray` is not `Sendable`.
@available(*, deprecated, message: "prefer perform(_:) that uses a ModelContext")
public func perform<R>(_ action: @Sendable (any LanguageModel, Tokenizer) throws -> R) rethrows -> R
```
Use this one instead (though the former still works):

```swift
/// Perform an action on the ``ModelContext``. Callers _must_ eval any `MLXArray` before returning as
/// `MLXArray` is not `Sendable`.
public func perform<R>(_ action: @Sendable (ModelContext) async throws -> R) async rethrows -> R
```
### Example

Putting all of these deprecations together, previously you might have generated text like this:

```swift
let messages = [["role": "user", "content": prompt]]
let promptTokens = try await modelContainer.perform { _, tokenizer in
    try tokenizer.applyChatTemplate(messages: messages)
}
let result = await modelContainer.perform { model, tokenizer in
    LLM.generate(
        promptTokens: promptTokens, parameters: generateParameters, model: model,
        tokenizer: tokenizer, extraEOSTokens: modelConfiguration.extraEOSTokens
    ) { tokens in ... }
}
```
Now do this:

```swift
let result = try await modelContainer.perform { context in
    let input = try await context.processor.prepare(input: .init(prompt: prompt))
    return try MLXLMCommon.generate(
        input: input, parameters: generateParameters, context: context
    ) { tokens in ... }
}
```
**Full Changelog**: 1.18.2...2.21.2
## 1.18.2

Last tag before the breaking API changes from #151 (VLM support).

### What's Changed
- Fix DynamicNTKScalingRoPE by @DePasqualeOrg in #154
- Update Gemma and Gemma 2 to more closely follow Python implementations by @DePasqualeOrg in #156
- fixes from python side of stable-diffusion by @davidkoski in #158
- Add Embedders/Encoders support. by @anishbasu in #157
- Update `asData` for StableDiffusion Image by @LiYanan2004 in #159
### New Contributors
- @anishbasu made their first contribution in #157
- @LiYanan2004 made their first contribution in #159
**Full Changelog**: 1.18.1...1.18.2
## 1.18.1

Release matching mlx-swift 0.18.1.

### What's Changed
- Remove AsyncAlgorithms package by @DePasqualeOrg in #100
- fix #102 -- extra eval of non-model/mlxarray data by @davidkoski in #103
- Fix MNIST predictions by @rounak in #110
- Add SuScaledRotaryEmbedding for Phi 3.5 by @DePasqualeOrg in #107
- refactor llm model load code to build inside the actor by @davidkoski in #108
- add kvcache, async eval, etc for #93 by @davidkoski in #109
- fix for #114 -- incorrect shape used in preload by @davidkoski in #115
- make sure config.json is downloaded by @davidkoski in #121
- Support Dynamic ModelType by @johnmai-dev in #123
- Support InternLM2 by @johnmai-dev in #124
- update swift-transformers by @johnmai-dev in #118
- pick up swiftpm change from #118 by @davidkoski in #122
- implement stable diffusion example by @davidkoski in #120
- Bump swift transformers by @awni in #129
- fix #128 -- use a DISAMBIGUATOR to have a unique bundleId by @davidkoski in #131
- fix #113 -- fall back to local files if offline by @davidkoski in #133
- chore: update Models.swift by @eltociear in #134
- Add LLama 3.2 1B and 3B lightweight models descriptions by @jsmp in #130
- Fix parameter count for quantized models by @awni in #137
- fix swift 6 warnings - thread safe tokenizer and model config by @davidkoski in #126
- use updated url by @davidkoski in #142
- Use chat template by @DePasqualeOrg in #135
- Add Phi 3.5 MoE by @DePasqualeOrg in #116
- Move models to subdirectory by @DePasqualeOrg in #148
- fix: add StableDiffusion to package products by @nanguoyu in #149
- update mlx-swift to 0.18.1 by @davidkoski in #147
### New Contributors

**Full Changelog**: 1.16.0...1.18.1
## updates for swift 6

Note: this release has some breaking API changes, so it bumps the major version number to 1. It uses mlx-swift 0.16.0.
## v0.12.1

Release matching 0.12.1 on mlx-swift and mlx.