* api: add Anthropic Messages API compatibility layer
Add middleware to support the Anthropic Messages API format at /v1/messages.
This enables tools like Claude Code to work with Ollama's local and cloud models through the
Anthropic API interface.
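A minimal request sketch against a local Ollama server; the model name is illustrative and localhost:11434 assumes the default port:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Anthropic Messages API shaped request body; the model name here is
	// only an example.
	body := []byte(`{
	  "model": "qwen3",
	  "max_tokens": 256,
	  "messages": [{"role": "user", "content": "Hello"}]
	}`)
	resp, err := http.Post("http://localhost:11434/v1/messages",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```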
* WIP - MLX backend with gemma3
* MLX: add cmake and go tag build toggles
To build the new MLX backend code:
cmake --preset MLX
cmake --build --preset MLX --parallel
cmake --install build --component MLX
go build -tags mlx .
Note: the main.go entrypoint for the MLX engine will change in a follow-up commit.
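For reference, a sketch of how MLX-dependent code can sit behind the build tag; the package and function names below are illustrative only:

```go
//go:build mlx

// Package mlx is only compiled when building with -tags mlx, so default
// builds do not need the MLX libraries. Names here are placeholders.
package mlx

// InitBackend marks where MLX-specific setup would live.
func InitBackend() error {
	return nil
}
```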
* add experimental image generation runtime
* MLX: wire up cuda build for linux
* MLX: get dependencies correct and dedup
This is still too large for a unified GitHub artifact, but is now "correct" for the mlx_cuda_v13
directory.
* fix relative link bug in dedup
* Add darwin build and readme
* add go build tag for mlx dependent code and wire up build_darwin.sh
* lint cleanup
* macos: build mlx for x86
This will be CPU only.
* cuda build instructions and fix drift from mlx bump
* stale comment
* Delete agent helper doc
* Clean up readme.md
* Revise README for tokenizer clarity and details
Updated the README to clarify tokenizer functionality and removed the correctness section.
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
With the upcoming addition of MLX, the Linux bundle will exceed the
maximum GitHub artifact size of 2 GB. This change will bring the size
back down.
The install.sh changes support backwards compatibility with prior versions
and thus should be safe to merge concurrently with this change.
In #13525, I accidentally broke templates' ability to automatically
render tool call function arguments as JSON.
We do need these to be proper maps because we need templates to be able
to call range, which can't be done on custom types.
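A standalone sketch of why plain maps matter here; this is illustrative, not the actual Ollama template code:

```go
package main

import (
	"encoding/json"
	"os"
	"text/template"
)

func main() {
	// text/template's range action works on maps (and slices), but not on
	// arbitrary custom types, so tool call arguments stay as plain maps.
	args := map[string]any{"location": "Paris", "unit": "celsius"}

	tmpl := template.Must(template.New("call").Parse(
		"{{range $k, $v := .}}{{$k}}={{$v}} {{end}}\n"))
	if err := tmpl.Execute(os.Stdout, args); err != nil {
		panic(err)
	}

	// Rendering the same arguments as JSON also works directly on the map.
	b, _ := json.Marshal(args)
	os.Stdout.Write(b)
}
```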
* preserve tool definition and call JSON ordering
This is another iteration of
<https://github.com/ollama/ollama/pull/12518>, but this time we've
simplified things by relaxing the competing requirements of being
compatible AND order-preserving with templates (vs. renderers). We
maintain backwards compatibility at the cost of not guaranteeing order
for templates. We plan on moving more and more models to renderers,
which have been updated to use these new data types, and we could
additionally add an opt-in way for templates to get an order-preserved
list (e.g., via sibling template vars).
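For illustration, a rough sketch of an insertion-order-preserving map that marshals to JSON in key order; the actual types added by this change may differ:

```go
package ordered

import (
	"bytes"
	"encoding/json"
)

// Map keeps keys in insertion order so JSON output preserves the order in
// which tool definition / call fields were added. Illustrative only.
type Map struct {
	keys   []string
	values map[string]any
}

func (m *Map) Set(k string, v any) {
	if m.values == nil {
		m.values = map[string]any{}
	}
	if _, ok := m.values[k]; !ok {
		m.keys = append(m.keys, k)
	}
	m.values[k] = v
}

func (m Map) MarshalJSON() ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('{')
	for i, k := range m.keys {
		if i > 0 {
			buf.WriteByte(',')
		}
		kb, _ := json.Marshal(k)
		vb, err := json.Marshal(m.values[k])
		if err != nil {
			return nil, err
		}
		buf.Write(kb)
		buf.WriteByte(':')
		buf.Write(vb)
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}
```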
* orderedmap_test: remove testify
The normalize function now checks for NaN and Inf values in the
embedding vector before processing. This prevents JSON encoding
failures when models produce invalid floating-point values.
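A rough sketch of the idea (not the exact implementation): reject non-finite values before computing the norm; the zero-vector fallback is an assumption for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// normalize returns a unit-length copy of vec; if any element is NaN or
// Inf it falls back to a zero vector, which encodes cleanly as JSON. The
// real fix may handle invalid values differently.
func normalize(vec []float32) []float32 {
	out := make([]float32, len(vec))
	var sum float64
	for _, v := range vec {
		f := float64(v)
		if math.IsNaN(f) || math.IsInf(f, 0) {
			return out
		}
		sum += f * f
	}
	if norm := math.Sqrt(sum); norm > 0 {
		for i, v := range vec {
			out[i] = float32(float64(v) / norm)
		}
	}
	return out
}

func main() {
	fmt.Println(normalize([]float32{3, 4}))
	fmt.Println(normalize([]float32{float32(math.NaN()), 1}))
}
```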
Fixes #13572
Signed-off-by: majiayu000 <1835304752@qq.com>
The tool calling example used "get_temperature" for tool_calls but
defined the tool as "get_weather". Also removed trailing commas that
made the JSON invalid.
Fixes #13031
On the llama engine, when we compute the memory layout, we reserve
a buffer to allow some slack for incorrect estimates.
This is subtracted from GPU free memory, and on GPUs with limited
memory it may underflow.
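An illustrative sketch of the failure mode, assuming unsigned byte counts (not the actual scheduler code):

```go
package main

import "fmt"

func main() {
	free := uint64(512 << 20)  // 512 MiB reported free on a small GPU
	reserve := uint64(1 << 30) // 1 GiB slack reserved for estimate error

	wrapped := free - reserve // unsigned subtraction wraps around
	fmt.Println(wrapped)      // 18446744073172680704 (~16 EiB)

	// Clamping at zero avoids the underflow.
	var available uint64
	if free > reserve {
		available = free - reserve
	}
	fmt.Println(available) // 0
}
```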
Fixes #13494
* Revert "add support for NVIDIA Nemotron 3 Nano"
This reverts commit e7d2ae9d69.
* GGML update to 380b4c984
Remove MaskBatchPadding as GGML_KQ_MASK_PAD is no longer present (no
padding required)
* update to c45f89d55
* ec98e2002
Solar Pro needed more adjustment - needs verification
* review comments
Refactored the ConfigV2 and RootFS types from server/images.go to a new types/model/config.go file under the model package. Updated all references to use model.ConfigV2 and model.RootFS. This allows these types to be used in other projects without worrying about compiling the C code in the llama package.
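A hypothetical consumer sketch; the import path follows the stated file location, and printing with %+v avoids assuming specific field names:

```go
package main

import (
	"fmt"

	"github.com/ollama/ollama/types/model"
)

func main() {
	// The config types can now be imported without pulling in the llama
	// package's cgo build.
	var cfg model.ConfigV2
	var fs model.RootFS
	fmt.Printf("%+v\n%+v\n", cfg, fs)
}
```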
The ggml/src/CMakeLists.txt uses GGML_VERSION_MAJOR for the shared
library SOVERSION property, but the GGML_VERSION_* variables were not
defined when building from ollama's CMakeLists.txt.
This caused libggml-base.so to be named with a literal "SOVERSION"
suffix (libggml-base.so.SOVERSION) instead of the actual version
number (libggml-base.so.0).
The fix adds the required GGML_VERSION_* variables before including
the ggml subdirectory.
Fixes #13436
* flash attn: add auto mode for llama engine
If the user does not specify flash attention (fa) in the environment, use auto mode.
* review comments
* ensure kv cache quantized types have FA explicitly enabled
additional review comments
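A hypothetical sketch of the decision flow; the environment variable handling and the cache-type list below are assumptions, not the merged logic:

```go
package main

import (
	"fmt"
	"os"
)

// flashAttention resolves the flash attention setting: honor an explicit
// environment override, otherwise decide automatically, forcing it on for
// quantized KV cache types since they require it.
func flashAttention(kvCacheType string) bool {
	if v, ok := os.LookupEnv("OLLAMA_FLASH_ATTENTION"); ok && v != "" {
		return v == "1" || v == "true"
	}
	switch kvCacheType {
	case "q8_0", "q4_0":
		return true // quantized cache needs FA explicitly enabled
	}
	return false // assumed default when auto mode finds no reason to enable
}

func main() {
	fmt.Println(flashAttention("q8_0"))
	fmt.Println(flashAttention("f16"))
}
```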