* Decouple MetricsMiddleware from downstream handlers
Remove ls-real-model-name optimization. Within proxyOAIHandler the
request body's bytes are required for various rewriting features
anyways. This negated any benefits from trying not to parse it twice.
* Add endpoint aliases for reranking models
* Add MetricsMiddleware to the previous reranking endpoint
* Fix the embeddings endpoint not having model set
Fix#198
- use llama-server's `timings` info if available in response body
- send "-1" for token/sec when not able to accurately calculate
performance
- optimize streaming body search for metrics information
- use new metrics data instead of log parsing
- auto-start events connection to server, improves responsiveness
- remove unnecessary libraries and code
A bug fix that ensures comments don't interfere with macro expansion by
removing them first. This prevents unwanted comment text from appearing
in the final expanded command.
Co-authored-by: Yathiraj Bollimbala G <yathi@yStudio.localdomain>
Major internal refactor to use an event bus to pass event/messages along. These changes are largely invisible user facing but sets up internal design for real time stats and information.
- `--watch-config` logic refactored for events
- remove multiple SSE api endpoints, replaced with /api/events
- keep all functionality essentially the same
- UI/backend sync is in near real time now
llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.
Append custom env vars instead of replace in Process (#168, #169)
PR #162 refactored the default configuration code. This
introduced a subtle bug where `env` became `[]string{}` instead of the
default of `nil`.
In golang, `exec.Cmd.Env == nil` means to use the "current process's
environment". By setting it to `[]string{}` as a default the Process's
environment was emptied out which caused an array of strange and
difficult to troubleshoot behaviour. See issues #168 and #169
This commit changes the behaviour to append model configured environment
variables to the default list rather than replace them.
Refactor internal upstream process life cycle management to recover better from unexpected situations. With this change llama-swap should never need to be restarted due to a crashed upstream child process. The `StateFailed` state was removed in favour of always trying to start/restart a process.
* Use `python3` instead of `curl` and `jq`
* Use quote to word splitting
* Remove undefined `local` in POSIX sh
* Added `LLAMA_SWAP_DEFAULT_ADDRESS` to customize the server address
* Added `mktemp` to `NEEDS`