Add barebones but working implementation of model preload

* add config test for Preload hook
* improve TestProxyManager_StartupHooks
* docs for new hook configuration
* add a .dev to .gitignore
Changed file: README.md (15 changed lines)
````diff
@@ -31,8 +31,9 @@ Written in golang, it is very easy to install (single binary with no dependencie
 - ✅ Run multiple models at once with `Groups` ([#107](https://github.com/mostlygeek/llama-swap/issues/107))
 - ✅ Automatic unloading of models after timeout by setting a `ttl`
 - ✅ Use any local OpenAI compatible server (llama.cpp, vllm, tabbyAPI, etc)
-- ✅ Docker and Podman support
+- ✅ Reliable Docker and Podman support with `cmdStart` and `cmdStop`
 - ✅ Full control over server settings per model
+- ✅ Preload models on startup with `hooks` ([#235](https://github.com/mostlygeek/llama-swap/pull/235))
 
 ## How does llama-swap work?
````
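The preload feature added in this commit is configured through the new `hooks` section referenced in the hunk above. A minimal sketch of what a preload entry could look like in `config.yaml` (the `on_startup`/`preload` key names are an assumption based on this commit's description, not copied from the shipped docs):

```yaml
# Sketch only: the hooks/on_startup/preload key names below are assumed,
# not confirmed against the project's configuration reference.
# Models listed under preload are started as soon as llama-swap boots,
# so the first request does not pay the model load time.
hooks:
  on_startup:
    preload:
      - "llama"

models:
  "llama":
    cmd: |
      llama-server -m /path/to/llama.gguf --port ${PORT}
```

Preloading trades idle memory for first-request latency; with a `ttl` set, a preloaded model can still be unloaded later if it sits unused.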
````diff
@@ -42,9 +43,9 @@ In the most basic configuration llama-swap handles one model at a time. For more
 ## config.yaml
 
 llama-swap is managed entirely through a yaml configuration file.
 
 It can be very minimal to start:
 
 ```yaml
 models:
@@ -55,7 +56,7 @@ models:
       --port ${PORT}
 ```
 
 However, there are many more capabilities that llama-swap supports:
 
 - `groups` to run multiple models at once
 - `ttl` to automatically unload models
````
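The `groups` and `ttl` options listed in the hunk above combine naturally: `ttl` reclaims memory from idle models while a group keeps several models resident at once. A sketch of a config using both (the model paths and the `swap`/`members` group fields are illustrative assumptions here, not copied from the project's reference config):

```yaml
models:
  "llama":
    cmd: |
      llama-server -m /models/llama.gguf --port ${PORT}
    ttl: 300          # unload after 300 seconds without a request

  "qwen":
    cmd: |
      llama-server -m /models/qwen.gguf --port ${PORT}
    ttl: 300

# Assumed group shape: members of the group can stay loaded at the
# same time instead of swapping each other out.
groups:
  "pair":
    swap: false
    members:
      - "llama"
      - "qwen"
```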
````diff
@@ -90,7 +91,7 @@ llama-swap can be installed in multiple ways
 
 ### Docker Install ([download images](https://github.com/mostlygeek/llama-swap/pkgs/container/llama-swap))
 
 Docker images with llama-swap and llama-server are built nightly.
 
 ```shell
 # use CPU inference, comes with the example config above
````
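The command that follows the `# use CPU inference` comment is cut off by the hunk boundary. Based on the nightly images mentioned above, the CPU variant would be run along these lines (the image tag is an assumption; verify it on the package page linked in the heading):

```shell
# Assumed invocation of the nightly CPU image; check the container
# registry page for the exact tag before relying on it.
docker run -it --rm -p 9292:8080 ghcr.io/mostlygeek/llama-swap:cpu
```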
````diff
@@ -137,10 +138,10 @@ $ docker run -it --rm --runtime nvidia -p 9292:8080 \
 
 ### Homebrew Install (macOS/Linux)
 
 The latest release of `llama-swap` can be installed via [Homebrew](https://brew.sh).
 
 ```shell
 # Set up tap and install formula
 brew tap mostlygeek/llama-swap
 brew install llama-swap
 
 # Run llama-swap
````
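The command under `# Run llama-swap` is truncated in this view. A plausible completion, with the flag name as an assumption (check `llama-swap --help` for the exact spelling and defaults):

```shell
# Assumed flag name; the binary may use a different spelling.
llama-swap --config config.yaml
```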