llama-swap

Author	SHA1	Message	Date
g2mt	87dce5f8f6	Add metrics logging for chat completion requests (#195) - Add token and performance metrics for v1/chat/completions - Add Activity Page in UI - Add /api/metrics endpoint Contributed by @g2mt	2025-07-21 22:19:55 -07:00
Yathi	a906cd459b	Strip comments before macro expansion in config (#193) A bug fix that ensures comments don't interfere with macro expansion by removing them first. This prevents unwanted comment text from appearing in the final expanded command. Co-authored-by: Yathiraj Bollimbala G <yathi@yStudio.localdomain>	2025-07-15 10:14:16 -07:00
Benson Wong	c867a6c9a2	Add name and description to v1/models list (#179) * Add support for name and description in v1/models list * add configuration example for name and description	2025-06-30 23:02:44 -07:00
Benson Wong	4236cec03a	Add Filters to Model Configuration (#174) llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.	2025-06-23 10:52:29 -07:00
Benson Wong	75015f82ea	fix bug caused by macro replacement order (#166) User defined macros should be applied before checking for ${PORT} constraint in model.cmd and model.proxy.	2025-06-16 15:32:09 -07:00
Benson Wong	4fa12a429c	Refactor all default config values into config.go (#162) - Move all default values into one place. - Update tests to be more cross platform	2025-06-15 12:32:00 -07:00
Benson Wong	1ac6499c08	Add macros to Configuration schema (#149) * Add macros to Configuration schema * update docs	2025-05-29 21:51:25 -07:00
Benson Wong	02ee29d881	increase default healthCheckTimeout to 120s	2025-05-26 09:57:53 -07:00
Benson Wong	a8b81f2799	Add stopCmd for custom stopping instructions (#136) Allow configuration of how a model is stopped before swapping. Setting `cmdStop` in the configuration will override the default behaviour and enables better integration with other process/container managers like docker or podman.	2025-05-16 13:48:42 -07:00
Benson Wong	afc9aef058	Fix #133 SanitizeCommand removes comments (#134)	2025-05-15 15:28:50 -07:00
Benson Wong	d7b390df74	Add GH Action for Testing on Windows (#132) * Add windows specific test changes * Change the command line parsing library - Possible breaking changes for windows users!	2025-05-14 21:51:53 -07:00
Benson Wong	9dc4bcb46c	Add a concurrency limit to Process.ProxyRequest (#123)	2025-05-12 18:12:52 -07:00
Benson Wong	09e52c0500	Automatic Port Numbers (#105) Add automatic port numbers assignment in configuration file. The string `${PORT}` will be substituted in model.cmd and model.proxy for an actual port number. This also allows model.proxy to be omitted from the configuration.	2025-05-05 17:07:43 -07:00
Benson Wong	ca9063ffbe	ensure aliases are unique (#116)	2025-05-05 15:34:18 -07:00
Benson Wong	448ccae959	Introduce Groups Feature (#107) Groups allows more control over swapping behaviour when a model is requested. The new groups feature provides three ways to control swapping: within the group, swapping out other groups or keep the models in the group loaded persistently (never swapped out). Closes #96, #99 and #106.	2025-05-02 22:35:38 -07:00
Benson Wong	b8f888f864	Logging Improvements (#88) This change revamps the internal logging architecture to be more flexible and descriptive. Previously all logs from both llama-swap and upstream services were mixed together. This makes it harder to troubleshoot and identify problems. This PR adds these new endpoints: - `/logs/stream/proxy` - just llama-swap's logs - `/logs/stream/upstream` - stdout output from the upstream server	2025-04-04 21:01:33 -07:00
Benson Wong	5c97299e7b	Add support for sending a custom model name to upstream (#69) (#71) * add test for splitRequestedModel() * Add `useModelName` parameter to model configuration * add docs to README	2025-03-14 21:07:52 -07:00
Benson Wong	314d2f2212	remove cmd_stop configuration and functionality from PR #40 (#44) * remove cmd_stop functionality from #40	2025-01-31 12:42:44 -08:00
Benson Wong	baeb0c4e7f	Add cmd_stop configuration to better support docker (#35) Add `cmd_stop` to model configuration to run a command instead of sending a SIGTERM to shutdown a process before swapping.	2025-01-30 16:59:57 -08:00
Benson Wong	84b667ca7a	improve logging and error reporting for troubleshooting	2024-12-20 10:46:56 -08:00
Benson Wong	891f6a5b5a	Add /upstream endpoint (#30) * remove catch-all route to upstream proxy (it was broken anyways) * add /upstream/:model_id to swap and route to upstream path * add /upstream HTML endpoint and unlisted option * add /upstream endpoint to show a list of available models * add `unlisted` configuration option to omit a model from /v1/models and /upstream lists * add favicon.ico	2024-12-17 14:37:44 -08:00
Benson Wong	9fc5d5b5eb	improve cmd parsing (#22) Switch from using a naive strings.Fields() to shlex.Split() for parsing the model startup command into a string[]. This makes parsing much more reliable around newlines, quotes, etc.	2024-12-01 09:02:58 -08:00
Benson Wong	73ad85ea69	Implement Multi-Process Handling (#7) Refactor code to support starting of multiple back end llama.cpp servers. This functionality is exposed as `profiles` to create a simple configuration format. Changes: * refactor proxy tests to get ready for multi-process support * update proxy/ProxyManager to support multiple processes (#7) * Add support for Groups in configuration * improve handling of Model alias configs * implement multi-model swapping * improve code clarity for swapModel * improve docs, rename groups to profiles in config	2024-11-23 19:45:13 -08:00
Benson Wong	533162ce6a	add support for automatically unloading a model (#10) (#14) * Make starting upstream process on-demand (#10) * Add automatic unload of model after TTL is reached * add `ttl` configuration parameter to models in seconds, default is 0 (never unload)	2024-11-19 16:32:51 -08:00
Benson Wong	36a31f450f	add proxy.Process to manage upstream proxy logic	2024-11-17 16:41:15 -08:00
Benson Wong	be82d1a6a0	Support multiline cmds in YAML configuration Add support for multiline `cmd` configurations allowing for nicer looking configuration YAML files.	2024-10-19 20:06:59 -07:00
Benson Wong	8eb5b7b6c4	Add custom check endpoint Replace previously hardcoded value for `/health` to check when the server became ready to serve traffic. With this the server can support any server that provides an OpenAI compatible inference endpoint.	2024-10-11 21:59:21 -07:00
Benson Wong	d682589fb1	support environment variables	2024-10-04 11:55:27 -07:00
Benson Wong	bfdba43bd8	improve error handling	2024-10-04 10:55:02 -07:00
Benson Wong	d061819fb1	moved config into proxy package	2024-10-04 09:38:30 -07:00

30 Commits