Improve Activity event handling in the UI
- fixes#252 found that the Activity page showed activity inconsistent
with /api/metrics
- Change data structure for event metrics to array.
- Add Event stream connections status indicator
Add barebones but working implementation of model preload
* add config test for Preload hook
* improve TestProxyManager_StartupHooks
* docs for new hook configuration
* add a .dev to .gitignore
* Decouple MetricsMiddleware from downstream handlers
Remove ls-real-model-name optimization. Within proxyOAIHandler the
request body's bytes are required for various rewriting features
anyways. This negated any benefits from trying not to parse it twice.
* Add endpoint aliases for reranking models
* Add MetricsMiddleware to the previous reranking endpoint
* Fix the embeddings endpoint not having model set
Fix#198
- use llama-server's `timings` info if available in response body
- send "-1" for token/sec when not able to accurately calculate
performance
- optimize streaming body search for metrics information
A bug fix that ensures comments don't interfere with macro expansion by
removing them first. This prevents unwanted comment text from appearing
in the final expanded command.
Co-authored-by: Yathiraj Bollimbala G <yathi@yStudio.localdomain>
Major internal refactor to use an event bus to pass event/messages along. These changes are largely invisible user facing but sets up internal design for real time stats and information.
- `--watch-config` logic refactored for events
- remove multiple SSE api endpoints, replaced with /api/events
- keep all functionality essentially the same
- UI/backend sync is in near real time now
llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.
Append custom env vars instead of replace in Process (#168, #169)
PR #162 refactored the default configuration code. This
introduced a subtle bug where `env` became `[]string{}` instead of the
default of `nil`.
In golang, `exec.Cmd.Env == nil` means to use the "current process's
environment". By setting it to `[]string{}` as a default the Process's
environment was emptied out which caused an array of strange and
difficult to troubleshoot behaviour. See issues #168 and #169
This commit changes the behaviour to append model configured environment
variables to the default list rather than replace them.
Refactor internal upstream process life cycle management to recover better from unexpected situations. With this change llama-swap should never need to be restarted due to a crashed upstream child process. The `StateFailed` state was removed in favour of always trying to start/restart a process.
Allow configuration of how a model is stopped before swapping. Setting `cmdStop` in the configuration will override the default behaviour and enables better integration with other process/container managers like docker or podman.
Sometimes upstreams can accept HTTP but never respond causing requests
to build up waiting for a response. This can block Process.Stop() as
that waits for inflight requests to finish. This change refactors the
code to not wait when attempting to shutdown the process.
Add automatic port numbers assignment in configuration file. The string `${PORT}` will be substituted in model.cmd and model.proxy for an actual port number. This also allows model.proxy to be omitted from the configuration.