diff --git a/README.md b/README.md
index 679caad..870a23e 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ In the most basic configuration llama-swap handles one model at a time. For more
 
 ## config.yaml
 
-llama-swap's configuration is purposefully simple.
+llama-swap's configuration is purposefully simple:
 
 ```yaml
 models:
@@ -57,19 +57,19 @@
       --port ${PORT}
 ```
 
-But also very powerful:
+... but also supports many advanced features:
 
-- ⚡ `groups` to run multiple models at once
-- ⚡ `macros` for reusable snippets
-- ⚡ `ttl` to automatically unload models
-- ⚡ `aliases` to use familiar model names (e.g., "gpt-4o-mini")
-- ⚡ `env` variables to pass custom environment to inference servers
-- ⚡ `useModelName` to override model names sent to upstream servers
-- ⚡ `healthCheckTimeout` to control model startup wait times
-- ⚡ `${PORT}` automatic port variables for dynamic port assignment
-- ⚡ Docker/podman compatible
+- `groups` to run multiple models at once
+- `macros` for reusable snippets
+- `ttl` to automatically unload models
+- `aliases` to use familiar model names (e.g., "gpt-4o-mini")
+- `env` variables to pass custom environment to inference servers
+- `useModelName` to override model names sent to upstream servers
+- `healthCheckTimeout` to control model startup wait times
+- `${PORT}` automatic port variables for dynamic port assignment
+- `cmdStop` to gracefully stop Docker/Podman containers
 
-Check the [wiki](https://github.com/mostlygeek/llama-swap/wiki/Configuration) full documentation.
+Check the [configuration documentation](https://github.com/mostlygeek/llama-swap/wiki/Configuration) in the wiki for all options.
 
 ## Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
 
@@ -79,7 +79,6 @@ Docker is the quickest way to try out llama-swap:
 # use CPU inference
 $ docker run -it --rm -p 9292:8080 ghcr.io/mostlygeek/llama-swap:cpu
 
-
 # qwen2.5 0.5B
 $ curl -s http://localhost:9292/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -87,7 +86,6 @@ $ curl -s http://localhost:9292/v1/chat/completions \
   -d '{"model":"qwen2.5","messages": [{"role": "user","content": "tell me a joke"}]}' | \
   jq -r '.choices[0].message.content'
 
-
 # SmolLM2 135M
 $ curl -s http://localhost:9292/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -97,7 +95,7 @@ $ curl -s http://localhost:9292/v1/chat/completions \
 ```
 
-Docker images are nightly ...
+Docker images are built nightly ... They include:
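
As a minimal sketch of how the advanced options added in this diff can combine in a single `config.yaml` (field names follow the configuration wiki linked above; the model files, the `smollm2` container name, and all timeout values are illustrative assumptions, not part of this change):

```yaml
# Sketch only: combines the options from the feature list above.
healthCheckTimeout: 120          # seconds to wait for a model to come up

macros:
  "server-base": >
    /app/llama-server --host 127.0.0.1 --port ${PORT}

models:
  "qwen2.5":
    cmd: |
      ${server-base}
      -m /models/qwen2.5-0.5b-instruct-q4_k_m.gguf
    aliases:
      - "gpt-4o-mini"            # requests for this name route here
    ttl: 300                     # unload after 300s of inactivity
    env:
      - "CUDA_VISIBLE_DEVICES=0"

  "smollm2":
    cmd: >
      docker run --rm --name smollm2 -p ${PORT}:8080
      ghcr.io/ggml-org/llama.cpp:server
      -m /models/smollm2-135m.gguf
    cmdStop: docker stop smollm2 # graceful container shutdown
    useModelName: "smollm2:135m" # name sent to the upstream server

groups:
  "small-models":
    swap: false                  # keep both members loaded at once
    members:
      - "qwen2.5"
      - "smollm2"
```

If this matches the documented schema, a request for `gpt-4o-mini` would start the `qwen2.5` server on an automatically assigned port, and both group members could stay resident at the same time.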