diff --git a/README.md b/README.md
index 2e0cd0c..45a389d 100644
--- a/README.md
+++ b/README.md
@@ -18,13 +18,13 @@ Written in golang, it is very easy to install (single binary with no dependancie
 - `v1/embeddings`
 - `v1/rerank`
 - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
-- ✅ Multiple GPU support
-- ✅ Docker and Podman support
 - ✅ Run multiple models at once with `profiles` ([docs](https://github.com/mostlygeek/llama-swap/issues/53#issuecomment-2660761741))
 - ✅ Remote log monitoring at `/log`
-- ✅ Automatic unloading of models from GPUs after timeout
-- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
 - ✅ Direct access to upstream HTTP server via `/upstream/:model_id` ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
+- ✅ Manually unload models via `/unload` endpoint ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
+- ✅ Automatic unloading of models after timeout by setting a `ttl`
+- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
+- ✅ Docker and Podman support
 
 ## How does llama-swap work?
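
The `ttl` feature added above can be illustrated with a minimal config sketch. This is an assumption-laden example, not taken from the patch: the model name, command, and file paths are hypothetical, and `ttl` is assumed to be an idle timeout in seconds per the llama-swap docs.

```yaml
# Hypothetical llama-swap config fragment showing automatic unloading.
models:
  "llama":
    # Any local OpenAI compatible server can be used here (path is illustrative).
    cmd: llama-server --port ${PORT} -m /path/to/model.gguf
    # Unload the model after 300 seconds of inactivity.
    ttl: 300
```

A model without a `ttl` stays loaded until swapped out or unloaded manually via the `/unload` endpoint introduced in this patch.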