Files
ollama/x/imagegen/mlx/README.md
Daniel Hiltgen 33ee7168ba Add experimental MLX backend and engine with imagegen support (#13648)
* WIP - MLX backend with gemma3

* MLX: add cmake and go tag build toggles

To build the new MLX backend code:
  cmake --preset MLX
  cmake --build --preset MLX --parallel
  cmake --install build --component MLX
  go build -tags mlx .

Note: the main.go entrypoint for the MLX engine will change in a follow up commit.

* add experimental image generation runtime

* add experimental image generation runtime

* MLX: wire up cuda build for linux

* MLX: get dependencies correct and dedup

This is still too large for a unified github artifact, but is now "correct" for the mlx_cuda_v13
directory.

* fix relative link bug in dedup

* Add darwin build and readme

* add go build tag for mlx dependent code and wire up build_darwin.sh

* lint cleanup

* macos: build mlx for x86

This will be CPU only.

* cuda build instructions and fix drift from mlx bump

* stale comment

* Delete agent helper doc

* Clean up readme.md

* Revise README for tokenizer clarity and details

Updated README to clarify tokenizer functionality and removed correctness section.

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2026-01-08 16:18:59 -08:00

1.3 KiB

MLX Memory Management

| This package will get consolidated with x/ml/backend/mlx in the future.

Automatic Tracking

All arrays are automatically tracked when created. On Eval(), non-kept arrays are freed.

API

result := mlx.Matmul(x, w) // arrays automatically tracked
mlx.Eval(result)           // free non-kept, eval result (auto-kept)

Key Functions

  • mlx.Eval(outputs...) - free non-kept arrays, then evaluate (outputs auto-kept)
  • mlx.AsyncEval(outputs...) - async version of Eval (outputs auto-kept)
  • mlx.Keep(arrays...) - mark arrays to survive cleanup (for weights, caches)
  • array.Free() - mark array for cleanup on next Eval

Loop Pattern

for step := 0; step < maxTokens; step++ {
    logits := model.Forward(token, caches)
    oldToken := token
    token = sample(logits)

    // Keep cache state across iterations
    for _, c := range caches {
        mlx.Keep(c.State()...)
    }

    oldToken.Free()       // mark for cleanup
    mlx.AsyncEval(token)  // frees old, evals new
}

Notes

  • Eval() and AsyncEval() auto-keep their outputs
  • Free() marks for cleanup - actual free happens during next Eval
  • Use Keep() for weights and cache state that must survive multiple Eval cycles
  • Arrays created inside compiled closures are managed by MLX, not tracked