From 88916059e14161434d5fe360de9d328f9a7617e1 Mon Sep 17 00:00:00 2001
From: Benson Wong
Date: Mon, 3 Mar 2025 10:44:16 -0800
Subject: [PATCH] add /unload to docs

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2e0cd0c..45a389d 100644
--- a/README.md
+++ b/README.md
@@ -18,13 +18,13 @@ Written in golang, it is very easy to install (single binary with no dependancie
 - `v1/embeddings`
 - `v1/rerank`
 - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
-- ✅ Multiple GPU support
-- ✅ Docker and Podman support
 - ✅ Run multiple models at once with `profiles` ([docs](https://github.com/mostlygeek/llama-swap/issues/53#issuecomment-2660761741))
 - ✅ Remote log monitoring at `/log`
-- ✅ Automatic unloading of models from GPUs after timeout
-- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
 - ✅ Direct access to upstream HTTP server via `/upstream/:model_id` ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
+- ✅ Manually unload models via `/unload` endpoint ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
+- ✅ Automatic unloading of models after timeout by setting a `ttl`
+- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
+- ✅ Docker and Podman support
 
 ## How does llama-swap work?
 
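For reference, a minimal sketch of how the manual unload feature this patch documents might be exercised. Only the `/unload` path and issue [#58](https://github.com/mostlygeek/llama-swap/issues/58) come from the patch itself; the listen address, port, and use of a plain GET request are assumptions for illustration, not taken from the project docs.

```sh
# Hedged sketch: ask llama-swap to unload the currently loaded model.
# localhost:8080 and the GET method are assumptions; only the /unload
# path is documented by this patch (see issue #58 for the real interface).
curl http://localhost:8080/unload
```

The other new bullet, automatic unloading via `ttl`, refers to a per-model timeout in the llama-swap configuration after which an idle model is unloaded; it is a config setting rather than an API call.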