
# imagegen

This package uses MLX to run image generation models, ahead of being integrated into Ollama's primary runner.
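
The MLX backend is gated behind a CMake preset and a Go build tag. The sequence below comes from the commit that introduced this package; preset and tag names may drift as the build evolves:

```sh
cmake --preset MLX
cmake --build --preset MLX --parallel
cmake --install build --component MLX
go build -tags mlx .
```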

## 1. Download a Model

Download Llama 3.1 8B (or any compatible model) in safetensors format:

```sh
mkdir -p ./weights

# Example using the Hugging Face CLI
hf download meta-llama/Llama-3.1-8B --local-dir ./weights/Llama-3.1-8B
hf download openai/gpt-oss-20b --local-dir ./weights/gpt-oss-20b
```

## 2. Run Inference

```sh
# Build
go build ./cmd/engine

# Text generation
./engine -model ./weights/Llama-3.1-8B -prompt "Hello, world!" -max-tokens 250

# Qwen-Image 2512 (text-to-image)
./engine -qwen-image -model ./weights/Qwen-Image-2512 -prompt "A mountain landscape at sunset" \
  -width 1024 -height 1024 -steps 20 -seed 42 -output landscape.png

# Qwen-Image Edit (experimental): 8 steps for speed, though the model recommends 50
./engine -qwen-image-edit -model ./weights/Qwen-Image-Edit-2511 \
  -input-image input.png -prompt "Make it winter" -negative-prompt " " -cfg-scale 4.0 \
  -steps 8 -seed 42 -output edited.png
```

## Memory Management

MLX's Python and C++ bindings use scope-based memory management: arrays are freed when they go out of scope. Go's garbage collector is non-deterministic, so we can't rely on finalizers to free GPU memory promptly.

Instead, arrays are automatically tracked and freed on Eval():

```go
// All arrays are automatically tracked when created
x := mlx.Add(a, b)
y := mlx.Matmul(x, w)

// Eval frees non-kept arrays, evaluates outputs (auto-kept)
mlx.Eval(y)

// After copying to CPU, free the array
data := y.Data()
y.Free()
```

Key points:

- All created arrays are automatically tracked
- `mlx.Eval(outputs...)` frees non-kept arrays and evaluates the outputs (outputs are auto-kept)
- `mlx.Keep(arrays...)` marks arrays to survive multiple Eval cycles (for weights and caches); see the sketch below
- Call `.Free()` when done with an array
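
To make the interplay between `Keep`, `Eval`, and `Free` concrete, here is a minimal sketch of an iterative step loop under this scheme. `loadWeights`, `prepareInput`, and `step` are hypothetical stand-ins for real model code; only `mlx.Keep`, `mlx.Eval`, `Data`, and `Free` come from the API described above.

```go
w := loadWeights() // hypothetical: returns the model's weight array
mlx.Keep(w)        // weights must survive every Eval cycle

x := prepareInput() // hypothetical: returns the initial activation
mlx.Keep(x)         // keep the running value so Eval doesn't free it mid-loop

steps := 20 // however many iterations the model needs
for i := 0; i < steps; i++ {
    y := step(x, w) // hypothetical: builds a graph of tracked intermediates
    mlx.Eval(y)     // evaluates y (auto-kept), frees the untracked intermediates
    x.Free()        // the previous value was kept, so release it explicitly
    x = y
}

data := x.Data() // copy the final result to the CPU...
x.Free()         // ...then release the GPU array
_ = data
```

Note that each iteration frees the previous kept value by hand: under this scheme, anything marked with `Keep` (weights, caches, carried values) is the caller's responsibility to `Free`.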