Adds an endpoint '/running' (#61)

* Adds an endpoint `/running` that returns either an empty JSON object, if no model has been loaded so far, or the last model loaded (`model` key) and its current state (`state` key). Possible state values are `stopped`, `starting`, `ready`, and `stopping`.
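A minimal sketch of that initial payload shape, assuming Go structs modeled on the description above (the struct and the model id are illustrative, not the project's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shape of the initial /running payload; the field names
// come from the description above, the struct itself is an assumption.
type runningStatus struct {
	Model string `json:"model"` // last model loaded
	State string `json:"state"` // stopped | starting | ready | stopping
}

func main() {
	// No model loaded yet: the endpoint returns an empty JSON object, {}.
	// After a model has been loaded:
	b, _ := json.Marshal(runningStatus{Model: "llama-3-8b", State: "ready"}) // hypothetical model id
	fmt.Println(string(b)) // {"model":"llama-3-8b","state":"ready"}
}
```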

* Improves the `/running` endpoint by allowing multiple entries under the `running` key within the JSON response.
Renames the `/running` handler method to `listRunningProcessesHandler`.
Removes the unlisted filter implementation.
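Extending the sketch above, the improved payload nests one entry per model under a top-level `running` key (again an illustrative shape, not the actual structs):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative shape of the improved /running payload; it carries any
// number of entries rather than a single model/state pair.
type runningEntry struct {
	Model string `json:"model"`
	State string `json:"state"`
}

type runningResponse struct {
	Running []runningEntry `json:"running"`
}

func main() {
	resp := runningResponse{Running: []runningEntry{
		{Model: "llama-3-8b", State: "ready"},       // hypothetical
		{Model: "qwen2.5-coder", State: "starting"}, // model ids
	}}
	b, _ := json.Marshal(resp)
	fmt.Println(string(b))
	// {"running":[{"model":"llama-3-8b","state":"ready"},{"model":"qwen2.5-coder","state":"starting"}]}
}
```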

* Adds tests (a sketch follows the list) for:
- no model loaded
- one model loaded
- multiple models loaded
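A hedged sketch of the first case using `net/http/httptest`; the stub handler stands in for the real `listRunningProcessesHandler`, whose actual signature and router wiring are not shown in this commit message:

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

// Stub standing in for the real listRunningProcessesHandler; it exists
// only so this sketch compiles on its own.
func runningStub(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string][]any{"running": {}})
}

// "No model loaded" case; the other two cases would load one or several
// models first and then assert on the returned entries.
func TestRunningNoModelLoaded(t *testing.T) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/running", nil)
	runningStub(rec, req)

	var body struct {
		Running []struct {
			Model string `json:"model"`
			State string `json:"state"`
		} `json:"running"`
	}
	if err := json.NewDecoder(rec.Body).Decode(&body); err != nil {
		t.Fatalf("decode response: %v", err)
	}
	if len(body.Running) != 0 {
		t.Errorf("expected no running models, got %d", len(body.Running))
	}
}
```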

* Adds simple comments.

* Simplifies the code structure as per the 2025-03-13 review comments on PR #65.

---------

Co-authored-by: FGDumitru|B <xelotx@gmail.com>

@@ -22,6 +22,7 @@ Written in golang, it is very easy to install (single binary with no dependancie
 - ✅ Remote log monitoring at `/log`
 - ✅ Direct access to upstream HTTP server via `/upstream/:model_id` ([demo](https://github.com/mostlygeek/llama-swap/pull/31))
 - ✅ Manually unload models via `/unload` endpoint ([#58](https://github.com/mostlygeek/llama-swap/issues/58))
+- ✅ Check state of running models via `/running` endpoint ([#61](https://github.com/mostlygeek/llama-swap/issues/61))
 - ✅ Automatic unloading of models after timeout by setting a `ttl`
 - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
 - ✅ Docker and Podman support