llama-swap

Author	SHA1	Message	Date
Benson Wong	1f6179110c	proxy/config: add model level macros (#330 ) * proxy/config: add model level macros Add macros to model configuration. Model macros override macros that are defined at the global configuration level. They follow the same naming and value rules as the global macros. * proxy/config: fix bug with macro reserved name checking The PORT reserved name was not properly checked * proxy/config: add tests around model.filters.stripParams - add check that model.filters.stripParams has no invalid macros - renamed strip_params to stripParams for camel case consistency - add legacy code compatibility so model.filters.strip_params continues to work * proxy/config: add duplicate removal to model.filters.stripParams * clean up some doc nits	2025-09-28 23:32:52 -07:00
Benson Wong	216c40b951	proxy/config: create config package and migrate configuration (#329 ) * proxy/config: create config package and migrate configuration The configuration is becoming more complex as llama-swap adds more advanced features. This commit moves config to its own package so it can be developed independently of the proxy package. Additionally, enforcing a public API for the configuration will allow downstream usage to be more decoupled.	2025-09-28 16:50:06 -07:00
Benson Wong	9e3d491c85	proxyToUpstream: add redirect with trailing slash to upstream endpoint (#322 ) This adds a redirect to the upstream endpoint so it always ends with a trailing /. Fixes #321	2025-09-25 16:43:00 -07:00
Benson Wong	1a84926505	proxy: add unload of single model (#318 ) This adds a new API endpoint, /api/models/unload/*model, that unloads a single model. In the UI when a model is in a ReadyState it will have a new button to unload it. Fixes #312	2025-09-24 20:53:48 -07:00
Benson Wong	c36986fef6	upstream handler support for model names with forward slash (#298 ) The upstream handler would break on model IDs that contained a forward slash. Model IDs like "aaa/bbb" called at upstream/aaa/bbb would result in an error. This commit adds support for model IDs with a forward slash by iteratively searching the path for a match. Fixes: #229	2025-09-13 13:37:03 -07:00
Artur Podsiadły	558801db1a	Fix nginx proxy buffering for streaming endpoints (#295 ) * Fix nginx proxy buffering for streaming endpoints - Add X-Accel-Buffering: no header to SSE endpoints (/api/events, /logs/stream) - Add X-Accel-Buffering: no header to proxied text/event-stream responses - Add nginx reverse proxy configuration section to README - Add tests for X-Accel-Buffering header on streaming endpoints Fixes #236 * Fix goroutine cleanup in streaming endpoints test Add context cancellation to TestProxyManager_StreamingEndpointsReturnNoBufferingHeader to ensure the goroutine is properly cleaned up when the test completes.	2025-09-09 16:07:46 -07:00
Benson Wong	f58c8c8ec5	Support llama.cpp's cache_n in timings info (#287 ) Capture prompt cache metrics and surface them on Activities page in UI	2025-09-06 13:58:02 -07:00
Brett Profitt	97b17fc47d	Add ${MODEL_ID} macro (#226 ) The automatic ${MODEL_ID} macro includes the name of the model and can be used in Cmd and CmdStop.	2025-09-01 21:21:37 -07:00
Benson Wong	831a90d3b0	Add different timeout scenarios to Process.checkHealthEndpoint #276 (#278 ) - add a TCP connection timeout of 500ms - increase HTTP client timeout to 5000ms In this new behaviour the upstream has 500ms to accept a tcp connection and 5000ms to respond to the HTTP request.	2025-08-28 22:03:14 -07:00
Yandrik	977f1856bb	add /completion endpoint (#275 ) * feat: add /completion endpoint * chore: reformat using gofmt	2025-08-28 21:41:02 -07:00
Benson Wong	52b329f7bc	Fix #277 race condition in ProcessGroup.ProxyRequest when swap=true	2025-08-28 21:38:40 -07:00
Benson Wong	57803fd3aa	Support llama-server's /infill endpoint (#272 ) Add support for llama-server's /infill endpoint and metrics gathering on the Activities page.	2025-08-27 08:36:05 -07:00
Benson Wong	04fc67354a	Improve Activity event handling in the UI (#254 ) - Fixes #252, where the Activity page showed activity inconsistent with /api/metrics - Change data structure for event metrics to array - Add event stream connection status indicator	2025-08-15 21:44:08 -07:00
Benson Wong	5dc6b3e6d9	Add barebones but working implementation of model preload (#209 , #235 ) * add config test for Preload hook * improve TestProxyManager_StartupHooks * docs for new hook configuration * add a .dev to .gitignore	2025-08-14 10:27:28 -07:00
Benson Wong	74c69f39ef	Add prompt processing metrics (#250 ) - capture prompt processing metrics - display prompt processing metrics on UI Activity page	2025-08-14 10:02:16 -07:00
Benson Wong	10569ed546	Fix model alias usage in upstream path (#230 ) Model alias values are now properly resolved and work in the upstream/ path. Related to #229.	2025-08-07 20:16:56 -07:00
Ben Greene	5c63e0066c	return models sorted by id in /v1/models (#222 )	2025-08-06 10:04:52 -07:00
Benson Wong	0f583163f7	add /health (#211 )	2025-07-30 10:37:10 -07:00
Benson Wong	fd50932dbc	Decouple MetricsMiddleware from downstream handlers (#206 ) * Decouple MetricsMiddleware from downstream handlers Remove ls-real-model-name optimization. Within proxyOAIHandler the request body's bytes are required for various rewriting features anyways. This negated any benefits from trying not to parse it twice.	2025-07-27 10:36:06 -07:00
Gaël James	8c693e7fcf	Add endpoint aliases for reranking models (#201 ) * Add endpoint aliases for reranking models * Add MetricsMiddleware to the previous reranking endpoint * Fix the embeddings endpoint not having model set	2025-07-24 08:32:47 -07:00
Benson Wong	01d4838fb3	Fix token metrics parsing (#199 ) Fix #198 - use llama-server's `timings` info if available in response body - send "-1" for token/sec when not able to accurately calculate performance - optimize streaming body search for metrics information	2025-07-22 23:10:14 -07:00
Benson Wong	cce0bc6aa1	add guard to ensure ls-real-model-name is set in context	2025-07-21 22:59:41 -07:00
g2mt	87dce5f8f6	Add metrics logging for chat completion requests (#195 ) - Add token and performance metrics for v1/chat/completions - Add Activity Page in UI - Add /api/metrics endpoint Contributed by @g2mt	2025-07-21 22:19:55 -07:00
Benson Wong	6299c1b874	Fix High CPU (#189 ) * vendor in kelindar/event lib and refactor to remove time.Ticker	2025-07-15 18:04:30 -07:00
Yathi	a906cd459b	Strip comments before macro expansion in config (#193 ) A bug fix that ensures comments don't interfere with macro expansion by removing them first. This prevents unwanted comment text from appearing in the final expanded command. Co-authored-by: Yathiraj Bollimbala G <yathi@yStudio.localdomain>	2025-07-15 10:14:16 -07:00
Benson Wong	78b2bc3dbc	add toggle to hide/show unlisted models (#187 )	2025-07-02 16:14:20 -07:00
Benson Wong	1921e570d7	Add Event Bus (#184 ) Major internal refactor to use an event bus to pass events/messages along. These changes are largely invisible to users but set up the internal design for real-time stats and information. - `--watch-config` logic refactored for events - remove multiple SSE API endpoints, replaced with /api/events - keep all functionality essentially the same - UI/backend sync is in near real time now	2025-07-01 22:17:35 -07:00
Benson Wong	c867a6c9a2	Add name and description to v1/models list (#179 ) * Add support for name and description in v1/models list * add configuration example for name and description	2025-06-30 23:02:44 -07:00
Benson Wong	4236cec03a	Add Filters to Model Configuration (#174 ) llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.	2025-06-23 10:52:29 -07:00
Benson Wong	9e02c22ff8	stopCmd should use same environment as p.cmd.Env (#171 , #172 )	2025-06-18 11:36:59 -07:00
Benson Wong	49035e2e8e	Append custom env vars instead of replace in Process (#171 ) PR #162 refactored the default configuration code. This introduced a subtle bug where `env` became `[]string{}` instead of the default of `nil`. In Go, `exec.Cmd.Env == nil` means to use the "current process's environment". Setting it to `[]string{}` as a default emptied out the Process's environment, which caused an array of strange and difficult-to-troubleshoot behaviour. See issues #168 and #169. This commit changes the behaviour to append model-configured environment variables to the default list rather than replace them.	2025-06-18 11:09:13 -07:00
Benson Wong	2ae48c713b	add debug output for start command	2025-06-18 07:43:23 -07:00
Benson Wong	9a3c656738	New UI (#157 , #164 ) - Add a react UI to replace the plain HTML one. - Serve as a foundation for better GUI interactions	2025-06-16 16:45:19 -07:00
Benson Wong	75015f82ea	fix bug caused by macro replacement order (#166 ) User defined macros should be applied before checking for ${PORT} constraint in model.cmd and model.proxy.	2025-06-16 15:32:09 -07:00
Benson Wong	4fa12a429c	Refactor all default config values into config.go (#162 ) - Move all default values into one place. - Update tests to be more cross platform	2025-06-15 12:32:00 -07:00
Benson Wong	2dc0ca0663	improve llama-swap upstream process recovery and restarts (#155 ) Refactor internal upstream process life cycle management to recover better from unexpected situations. With this change llama-swap should never need to be restarted due to a crashed upstream child process. The `StateFailed` state was removed in favour of always trying to start/restart a process.	2025-06-05 16:24:55 -07:00
Daniel Hofer	a84098d3b4	Add missing object type to /v1/models endpoint (#154 )	2025-06-02 09:25:45 -07:00
Benson Wong	1ac6499c08	Add macros to Configuration schema (#149 ) * Add macros to Configuration schema * update docs	2025-05-29 21:51:25 -07:00
Benson Wong	02ee29d881	increase default healthCheckTimeout to 120s	2025-05-26 09:57:53 -07:00
Benson Wong	02aee4e86d	remove noisy debug print message	2025-05-20 10:43:10 -07:00
Benson Wong	f45896d395	add guard to avoid unnecessary logic in Process.Shutdown	2025-05-20 10:43:09 -07:00
choyuansu	f7e46a359f	Add link to unload endpoint in upstream list (#140 ) * Add link to open /unload	2025-05-20 08:31:44 -07:00
Benson Wong	b83a5fa291	make Failed state recoverable (#137 ) A process in the failed state can transition to stopped either by calling /unload or swapping to another model.	2025-05-16 19:54:44 -07:00
Benson Wong	a8b81f2799	Add stopCmd for custom stopping instructions (#136 ) Allow configuration of how a model is stopped before swapping. Setting `cmdStop` in the configuration will override the default behaviour and enables better integration with other process/container managers like docker or podman.	2025-05-16 13:48:42 -07:00
fakezeta	2d00120781	Update proxymanager.go (#135 )	2025-05-16 06:45:09 -07:00
Benson Wong	afc9aef058	Fix #133 SanitizeCommand removes comments (#134 )	2025-05-15 15:28:50 -07:00
Benson Wong	d7b390df74	Add GH Action for Testing on Windows (#132 ) * Add windows specific test changes * Change the command line parsing library - Possible breaking changes for windows users!	2025-05-14 21:51:53 -07:00
Benson Wong	e3a0b013c1	add content length test for #131	2025-05-14 19:50:01 -07:00
Fadenfire	f5763a94a0	Fix content length being incorrect when useModelName is used (#131 ) * Fix content length being incorrect when useModelName is used * Update c.Request.ContentLength as well	2025-05-14 19:37:54 -07:00
Benson Wong	2441b383d3	Make checking for process killed status more robust	2025-05-14 16:26:56 -07:00
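
The environment bug described in commit 49035e2e8e above comes down to how Go's `os/exec` treats `Cmd.Env`: a `nil` slice means the child inherits the current process's environment, while an empty non-nil slice gives the child no environment at all. The sketch below illustrates the "append instead of replace" approach. `mergeEnv` is a hypothetical helper written for this illustration and is not taken from the llama-swap source; the `llama-server` command name and the `CUDA_VISIBLE_DEVICES` value are placeholders.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// mergeEnv is a hypothetical helper (not llama-swap's actual code).
// It returns the base environment followed by the custom entries.
// When a key appears more than once in Cmd.Env, os/exec uses the last
// value, so custom entries placed at the end override inherited ones.
func mergeEnv(base, custom []string) []string {
	merged := make([]string, 0, len(base)+len(custom))
	merged = append(merged, base...)
	merged = append(merged, custom...)
	return merged
}

func main() {
	// The command is only constructed here, never started.
	cmd := exec.Command("llama-server")

	// Leaving cmd.Env as nil would inherit the parent's environment.
	// Assigning []string{} would leave the child with no environment.
	// Appending to os.Environ() keeps the inherited values and adds ours.
	cmd.Env = mergeEnv(os.Environ(), []string{"CUDA_VISIBLE_DEVICES=0"})

	fmt.Println("last entry:", cmd.Env[len(cmd.Env)-1])
}
```

Placing the custom entries last is a deliberate choice in this sketch: it lets a model's configuration override an inherited variable without having to filter the inherited list first.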

141 Commits
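
Commit c36986fef6 in the log above mentions "iteratively searching the path for a match" so that model IDs such as "aaa/bbb" work under the upstream/ path. One way such a search could look is sketched below. This is an illustration under stated assumptions, not the project's implementation: `findModelInPath` and the `known` set are hypothetical names, and trying the shortest prefix first is an assumption made here for simplicity.

```go
package main

import (
	"fmt"
	"strings"
)

// findModelInPath is a hypothetical helper (not llama-swap's actual code).
// It grows a candidate model ID one path segment at a time and returns
// the first candidate found in the known set, plus the remaining path.
func findModelInPath(path string, known map[string]bool) (model, rest string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	for i := 1; i <= len(parts); i++ {
		candidate := strings.Join(parts[:i], "/")
		if known[candidate] {
			// Whatever follows the model ID is forwarded upstream.
			return candidate, "/" + strings.Join(parts[i:], "/"), true
		}
	}
	return "", "", false
}

func main() {
	known := map[string]bool{"aaa/bbb": true, "plain": true}

	model, rest, ok := findModelInPath("aaa/bbb/v1/chat/completions", known)
	fmt.Println(model, rest, ok)
}
```

With shortest-prefix-first matching, a configuration containing both "aaa" and "aaa/bbb" would always resolve to "aaa"; a longest-match variant would iterate from `len(parts)` downward instead.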