mirror of https://github.com/ollama/ollama.git synced 2026-01-12 00:06:57 +08:00

Daniel Hiltgen 33ee7168ba Add experimental MLX backend and engine with imagegen support (#13648)

* WIP - MLX backend with gemma3

* MLX: add cmake and go tag build toggles

To build the new MLX backend code:
  cmake --preset MLX
  cmake --build --preset MLX --parallel
  cmake --install build --component MLX
  go build -tags mlx .

Note: the main.go entrypoint for the MLX engine will change in a follow up commit.

* add experimental image generation runtime

* MLX: wire up cuda build for linux

* MLX: get dependencies correct and dedup

This is still too large for a unified GitHub artifact, but is now "correct" for the mlx_cuda_v13 directory.

* fix relative link bug in dedup

* Add darwin build and readme

* add go build tag for mlx dependent code and wire up build_darwin.sh

* lint cleanup

* macos: build mlx for x86

This will be CPU only.

* cuda build instructions and fix drift from mlx bump

* stale comment

* Delete agent helper doc

* Clean up readme.md

* Revise README for tokenizer clarity and details

Updated README to clarify tokenizer functionality and removed correctness section.

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>

2026-01-08 16:18:59 -08:00

MLX Memory Management

Note: This package will be consolidated with x/ml/backend/mlx in the future.

Automatic Tracking

Every array is tracked automatically when it is created. Each call to Eval() frees all tracked arrays that have not been kept.

API

result := mlx.Matmul(x, w) // arrays automatically tracked
mlx.Eval(result)           // free non-kept, eval result (auto-kept)

Key Functions

- mlx.Eval(outputs...) - frees non-kept arrays, then evaluates the outputs (outputs are auto-kept)
- mlx.AsyncEval(outputs...) - asynchronous version of Eval (outputs are auto-kept)
- mlx.Keep(arrays...) - marks arrays to survive cleanup (use for weights and caches)
- array.Free() - marks the array for cleanup on the next Eval
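
The semantics listed above can be modelled with a small, self-contained toy. This is only a sketch of the bookkeeping, not the package's implementation: the `Tracker` and `Array` types and the `New` method are hypothetical stand-ins, nothing is actually evaluated, and no MLX memory is involved. It shows that an Eval output survives, an unkept intermediate is released, and Free only takes effect on the following Eval.

```go
package main

import "fmt"

// Array is a hypothetical stand-in for an MLX array handle.
type Array struct {
	name  string
	kept  bool
	freed bool
}

// Free marks the array for cleanup; nothing is released until the next Eval.
func (a *Array) Free() { a.kept = false }

// Tracker is a toy model of the automatic tracking (an assumption, not the real code).
type Tracker struct {
	live []*Array
}

// New creates an array and registers it with the tracker.
func (tr *Tracker) New(name string) *Array {
	a := &Array{name: name}
	tr.live = append(tr.live, a)
	return a
}

// Keep marks arrays so that they survive cleanup.
func (tr *Tracker) Keep(arrays ...*Array) {
	for _, a := range arrays {
		a.kept = true
	}
}

// Eval auto-keeps its outputs and releases every tracked array that is not kept.
// A real Eval would also run the computation; this toy only does the bookkeeping.
func (tr *Tracker) Eval(outputs ...*Array) {
	tr.Keep(outputs...)
	survivors := tr.live[:0]
	for _, a := range tr.live {
		if a.kept {
			survivors = append(survivors, a)
		} else {
			a.freed = true
		}
	}
	tr.live = survivors
}

func main() {
	tr := &Tracker{}
	w := tr.New("w")
	tr.Keep(w) // weights must survive every Eval
	x := tr.New("x")
	result := tr.New("result") // stands in for Matmul(x, w)

	tr.Eval(result)
	fmt.Println(w.freed, x.freed, result.freed) // false true false

	result.Free()
	fmt.Println(result.freed) // false: only marked so far

	tr.Eval(tr.New("next"))
	fmt.Println(result.freed, len(tr.live)) // true 2
}
```

In this model an output that was auto-kept by one Eval stays alive until it is explicitly marked with Free, which is why the loop pattern frees the previous token on each step.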

Loop Pattern

for step := 0; step < maxTokens; step++ {
    logits := model.Forward(token, caches)
    oldToken := token
    token = sample(logits)

    // Keep cache state across iterations
    for _, c := range caches {
        mlx.Keep(c.State()...)
    }

    oldToken.Free()       // mark for cleanup
    mlx.AsyncEval(token)  // frees old, evals new
}
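
To see why the loop frees the previous token, the sketch below replays the same steps against a toy tracker and counts the arrays that remain live. Everything here is an assumption made for illustration: `Tracker`, `Array`, `New`, and `runLoop` are hypothetical names, the cache is modelled as a single array, and no model or MLX code is executed.

```go
package main

import "fmt"

// Array is a hypothetical stand-in for an MLX array handle.
type Array struct{ kept bool }

// Free marks the array for cleanup on the next Eval.
func (a *Array) Free() { a.kept = false }

// Tracker is a toy model of automatic tracking (not the real implementation).
type Tracker struct{ live []*Array }

// New creates an array and registers it with the tracker.
func (tr *Tracker) New() *Array {
	a := &Array{}
	tr.live = append(tr.live, a)
	return a
}

// Keep marks arrays so that they survive cleanup.
func (tr *Tracker) Keep(arrays ...*Array) {
	for _, a := range arrays {
		a.kept = true
	}
}

// Eval auto-keeps its outputs and drops every tracked array that is not kept.
func (tr *Tracker) Eval(outputs ...*Array) {
	tr.Keep(outputs...)
	survivors := tr.live[:0]
	for _, a := range tr.live {
		if a.kept {
			survivors = append(survivors, a)
		}
	}
	tr.live = survivors
}

// runLoop mimics the generation loop and returns how many arrays are still live.
func runLoop(steps int, freeOld bool) int {
	tr := &Tracker{}
	cache := tr.New() // single array standing in for the cache state
	token := tr.New()
	tr.Keep(cache)
	tr.Eval(token)

	for step := 0; step < steps; step++ {
		_ = tr.New() // logits: an intermediate that is never kept
		oldToken := token
		token = tr.New() // sampled token

		tr.Keep(cache) // keep cache state across iterations
		if freeOld {
			oldToken.Free() // mark the previous token for cleanup
		}
		tr.Eval(token) // drops unkept arrays, keeps the new token
	}
	return len(tr.live)
}

func main() {
	fmt.Println(runLoop(5, true))  // 2: cache state + current token
	fmt.Println(runLoop(5, false)) // 7: one stale token is retained per step
}
```

Under these assumptions the live set stays constant when the old token is freed, and grows by one array per step when it is not, because each token was auto-kept by the Eval that produced it.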

Notes

- Eval() and AsyncEval() auto-keep their outputs.
- Free() only marks an array for cleanup; the actual free happens during the next Eval.
- Use Keep() for weights and cache state that must survive multiple Eval cycles.
- Arrays created inside compiled closures are managed by MLX itself and are not tracked by this package.