Add cmd_stop configuration to better support docker (#35)
Add `cmd_stop` to the model configuration to run a command instead of sending SIGTERM to shut down a process before swapping.
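For context, a minimal Go sketch of the behaviour this enables (illustrative only, not llama-swap's actual code; `modelConfig` and `stopProcess` are hypothetical names): when `cmd_stop` is set it is executed, otherwise the managed process receives SIGTERM. The Docker example in the diff below uses `cmd_stop: docker stop -t 2 dockertest` for exactly this reason.

```go
// Illustrative sketch only, not llama-swap's actual implementation: a supervisor
// that honours an optional cmd_stop command and falls back to SIGTERM when it
// is not configured. modelConfig and stopProcess are hypothetical names.
package main

import (
	"log"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

type modelConfig struct {
	Cmd     string // command that starts the upstream server
	CmdStop string // optional command used to stop it, e.g. "docker stop -t 2 dockertest"
}

// stopProcess stops a running model process. When CmdStop is configured it is
// executed instead of signalling the process directly.
func stopProcess(cfg modelConfig, proc *exec.Cmd) error {
	if cfg.CmdStop != "" {
		// naive whitespace split; quoting is not handled in this sketch
		args := strings.Fields(cfg.CmdStop)
		return exec.Command(args[0], args[1:]...).Run()
	}
	// default behaviour: ask the process to shut down with SIGTERM
	return proc.Process.Signal(syscall.SIGTERM)
}

func main() {
	// stand-in for a model entry; a real config would point at llama-server, docker run, etc.
	cfg := modelConfig{Cmd: "sleep 60"}

	args := strings.Fields(cfg.Cmd)
	proc := exec.Command(args[0], args[1:]...)
	if err := proc.Start(); err != nil {
		log.Fatal(err)
	}

	time.Sleep(time.Second)
	if err := stopProcess(cfg, proc); err != nil {
		log.Fatal(err)
	}
	proc.Wait() // reap the process after it exits
}
```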
README.md | 18
@@ -5,7 +5,7 @@
# Introduction

llama-swap is a lightweight, transparent proxy server that provides automatic model swapping to llama.cpp's server.

Written in golang, it is very easy to install (single binary with no dependencies) and configure (single yaml file).

Download a pre-built [release](https://github.com/mostlygeek/llama-swap/releases) or build it yourself from source with `make clean all`.
@@ -30,11 +30,13 @@ Any OpenAI compatible server would work. llama-swap was originally designed for
- `v1/rerank`
- `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
- ✅ Multiple GPU support
- ✅ Docker Support ([#40](https://github.com/mostlygeek/llama-swap/pull/40))
- ✅ Run multiple models at once with `profiles`
- ✅ Remote log monitoring at `/log`
- ✅ Automatic unloading of models from GPUs after timeout
- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
- ✅ Direct access to upstream HTTP server via `/upstream/:model_id` ([demo](https://github.com/mostlygeek/llama-swap/pull/31))

## config.yaml
@@ -89,6 +91,20 @@ models:
    cmd: llama-server --port 9999 -m Llama-3.2-1B-Instruct-Q4_K_M.gguf -ngl 0
    unlisted: true

  # Docker Support (Experimental)
  # see: https://github.com/mostlygeek/llama-swap/pull/40
  "dockertest":
    proxy: "http://127.0.0.1:9790"

    # introduced to reliably stop containers
    cmd_stop: docker stop -t 2 dockertest
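    # (when cmd_stop is not set, llama-swap falls back to sending SIGTERM to the process)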

    cmd: >
      docker run --name dockertest
      --init --rm -p 9790:8080 -v /mnt/nvme/models:/models
      ghcr.io/ggerganov/llama.cpp:server
      --model '/models/Qwen2.5-Coder-0.5B-Instruct-Q4_K_M.gguf'

# profiles make it easy to manage multi-model (and GPU) configurations.
#
# Tips: