From 88916059e14161434d5fe360de9d328f9a7617e1 Mon Sep 17 00:00:00 2001
From: Benson Wong
Date: Mon, 3 Mar 2025 10:44:16 -0800
Subject: [PATCH] add /unload to docs

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2e0cd0c..45a389d 100644
--- a/README.md
+++ b/README.md
@@ -18,13 +18,13 @@ Written in golang, it is very easy to install (single binary with no dependancie
 - `v1/embeddings`
 - `v1/rerank`
 - `v1/audio/speech` ([#36](https://github.com/mostlygeek/llama-swap/issues/36))
-- ✅ Multiple GPU support
-- ✅ Docker and Podman support
 - ✅ Run multiple models at once with `profiles` ([docs](https://github.com/mostlygeek/llama-swap/issues/53#issuecomment-2660761741))
 - ✅ Remote log monitoring at `/log`
-- ✅ Automatic unloading of models from GPUs after timeout
-- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
 - ✅ Direct access to upstream HTTP server via `/upstream/:model_id` ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
+- ✅ Manually unload models via `/unload` endpoint ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
+- ✅ Automatic unloading of models after timeout by setting a `ttl`
+- ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
+- ✅ Docker and Podman support
 
 ## How does llama-swap work?
 
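For reference, a minimal sketch of how the manual unload feature this patch documents might be exercised. Only the `/unload` path and issue [#58](https://github.com/mostlygeek/llama-swap/issues/58) come from the patch itself; the listen address, port, and use of a plain GET request are assumptions for illustration, not taken from the project docs.

```sh
# Hedged sketch: ask llama-swap to unload the currently loaded model.
# localhost:8080 and the GET method are assumptions; only the /unload
# path is documented by this patch (see issue #58 for the real interface).
curl http://localhost:8080/unload
```

The other new bullet, automatic unloading via `ttl`, refers to a per-model timeout in the llama-swap configuration after which an idle model is unloaded; it is a config setting rather than an API call.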