Mirror of https://github.com/ollama/ollama.git (synced 2026-01-12 00:06:57 +08:00)
On the llama engine, when we compute the memory layout, we reserve a buffer to absorb inaccurate estimates. This buffer is subtracted from the GPU's free memory, and on GPUs with limited memory the subtraction may underflow.

Fixes #13494
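
The failure mode described above can be illustrated with a minimal Go sketch. It assumes free memory is tracked as an unsigned 64-bit byte count. The names `reserveBuffer`, `naiveUsable`, and `clampedUsable` and the 512 MiB value are hypothetical and chosen only for illustration; they are not taken from the ollama source. It shows one common way to guard such a subtraction and is not necessarily how this commit does it.

```go
package main

import "fmt"

// reserveBuffer is a hypothetical safety margin in bytes. The real value
// used by the llama engine is not stated in this commit message.
const reserveBuffer uint64 = 512 << 20 // 512 MiB, assumed for illustration

// naiveUsable subtracts the reserve directly. With unsigned integers, a
// free value smaller than the reserve wraps around to a very large number.
func naiveUsable(free uint64) uint64 {
	return free - reserveBuffer
}

// clampedUsable checks the operands first and saturates at zero instead
// of wrapping around.
func clampedUsable(free uint64) uint64 {
	if free <= reserveBuffer {
		return 0
	}
	return free - reserveBuffer
}

func main() {
	free := uint64(256 << 20) // a GPU reporting only 256 MiB free

	fmt.Println("naive:  ", naiveUsable(free))   // wraps to a huge value
	fmt.Println("clamped:", clampedUsable(free)) // saturates at zero
}
```

In the naive version, a GPU with less free memory than the reserve appears to have almost unlimited memory, which would mislead any later layout decision. The clamped version reports zero usable bytes instead.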