llama-swap

Author	SHA1	Message	Date
Yandrik	977f1856bb	add /completion endpoint (#275) * feat: add /completion endpoint * chore: reformat using gofmt	2025-08-28 21:41:02 -07:00
Benson Wong	52b329f7bc	Fix #277 race condition in ProcessGroup.ProxyRequest when swap=true	2025-08-28 21:38:40 -07:00
Benson Wong	57803fd3aa	Support llama-server's /infill endpoint (#272) Add support for llama-server's /infill endpoint and metrics gathering on the Activities page.	2025-08-27 08:36:05 -07:00
Benson Wong	04fc67354a	Improve Activity event handling in the UI (#254) Improve Activity event handling in the UI - fixes #252 found that the Activity page showed activity inconsistent with /api/metrics - Change data structure for event metrics to array. - Add Event stream connections status indicator	2025-08-15 21:44:08 -07:00
Benson Wong	5dc6b3e6d9	Add barebones but working implementation of model preload (#209, #235) Add barebones but working implementation of model preload * add config test for Preload hook * improve TestProxyManager_StartupHooks * docs for new hook configuration * add a .dev to .gitignore	2025-08-14 10:27:28 -07:00
Benson Wong	74c69f39ef	Add prompt processing metrics (#250) - capture prompt processing metrics - display prompt processing metrics on UI Activity page	2025-08-14 10:02:16 -07:00
Benson Wong	10569ed546	Fix model alias usage in upstream path (#230) Model alias values are not properly resolved and work in upstream/ path. Related to #229.	2025-08-07 20:16:56 -07:00
Ben Greene	5c63e0066c	return models sorted by id in /v1/models (#222)	2025-08-06 10:04:52 -07:00
Benson Wong	0f583163f7	add /health (#211)	2025-07-30 10:37:10 -07:00
Benson Wong	fd50932dbc	Decouple MetricsMiddleware from downstream handlers (#206) * Decouple MetricsMiddleware from downstream handlers Remove ls-real-model-name optimization. Within proxyOAIHandler the request body's bytes are required for various rewriting features anyways. This negated any benefits from trying not to parse it twice.	2025-07-27 10:36:06 -07:00
Gaël James	8c693e7fcf	Add endpoint aliases for reranking models (#201) * Add endpoint aliases for reranking models * Add MetricsMiddleware to the previous reranking endpoint * Fix the embeddings endpoint not having model set	2025-07-24 08:32:47 -07:00
Benson Wong	01d4838fb3	Fix token metrics parsing (#199) Fix #198 - use llama-server's `timings` info if available in response body - send "-1" for token/sec when not able to accurately calculate performance - optimize streaming body search for metrics information	2025-07-22 23:10:14 -07:00
Benson Wong	cce0bc6aa1	add guard to ensure ls-real-model-name is set in context	2025-07-21 22:59:41 -07:00
g2mt	87dce5f8f6	Add metrics logging for chat completion requests (#195) - Add token and performance metrics for v1/chat/completions - Add Activity Page in UI - Add /api/metrics endpoint Contributed by @g2mt	2025-07-21 22:19:55 -07:00
Benson Wong	6299c1b874	Fix High CPU (#189) * vendor in kelindar/event lib and refactor to remove time.Ticker	2025-07-15 18:04:30 -07:00
Yathi	a906cd459b	Strip comments before macro expansion in config (#193) A bug fix that ensures comments don't interfere with macro expansion by removing them first. This prevents unwanted comment text from appearing in the final expanded command. Co-authored-by: Yathiraj Bollimbala G <yathi@yStudio.localdomain>	2025-07-15 10:14:16 -07:00
Benson Wong	78b2bc3dbc	add toggle to hide/show unlisted models (#187)	2025-07-02 16:14:20 -07:00
Benson Wong	1921e570d7	Add Event Bus (#184) Major internal refactor to use an event bus to pass event/messages along. These changes are largely invisible user facing but sets up internal design for real time stats and information. - `--watch-config` logic refactored for events - remove multiple SSE api endpoints, replaced with /api/events - keep all functionality essentially the same - UI/backend sync is in near real time now	2025-07-01 22:17:35 -07:00
Benson Wong	c867a6c9a2	Add name and description to v1/models list (#179) * Add support for name and description in v1/models list * add configuration example for name and description	2025-06-30 23:02:44 -07:00
Benson Wong	4236cec03a	Add Filters to Model Configuration (#174) llama-swap can strip specific keys in JSON requests. This is useful for removing the ability for clients to set sampling parameters like temperature, top_k, top_p, etc.	2025-06-23 10:52:29 -07:00
Benson Wong	9e02c22ff8	stopCmd should use same environment as p.cmd.Env (#171, #172)	2025-06-18 11:36:59 -07:00
Benson Wong	49035e2e8e	Append custom env vars instead of replace in Process (#171) Append custom env vars instead of replace in Process (#168, #169) PR #162 refactored the default configuration code. This introduced a subtle bug where `env` became `[]string{}` instead of the default of `nil`. In golang, `exec.Cmd.Env == nil` means to use the "current process's environment". By setting it to `[]string{}` as a default the Process's environment was emptied out which caused an array of strange and difficult to troubleshoot behaviour. See issues #168 and #169 This commit changes the behaviour to append model configured environment variables to the default list rather than replace them.	2025-06-18 11:09:13 -07:00
Benson Wong	2ae48c713b	add debug output for start command	2025-06-18 07:43:23 -07:00
Benson Wong	9a3c656738	New UI (#157, #164) - Add a react UI to replace the plain HTML one. - Serve as a foundation for better GUI interactions	2025-06-16 16:45:19 -07:00
Benson Wong	75015f82ea	fix bug caused by macro replacement order (#166) User defined macros should be applied before checking for ${PORT} constraint in model.cmd and model.proxy.	2025-06-16 15:32:09 -07:00
Benson Wong	4fa12a429c	Refactor all default config values into config.go (#162) - Move all default values into one place. - Update tests to be more cross platform	2025-06-15 12:32:00 -07:00
Benson Wong	2dc0ca0663	improve llama-swap upstream process recovery and restarts (#155) Refactor internal upstream process life cycle management to recover better from unexpected situations. With this change llama-swap should never need to be restarted due to a crashed upstream child process. The `StateFailed` state was removed in favour of always trying to start/restart a process.	2025-06-05 16:24:55 -07:00
Daniel Hofer	a84098d3b4	Add missing object type to /v1/models endpoint (#154)	2025-06-02 09:25:45 -07:00
Benson Wong	1ac6499c08	Add macros to Configuration schema (#149) * Add macros to Configuration schema * update docs	2025-05-29 21:51:25 -07:00
Benson Wong	02ee29d881	increase default healthCheckTimeout to 120s	2025-05-26 09:57:53 -07:00
Benson Wong	02aee4e86d	remove noisy debug print message	2025-05-20 10:43:10 -07:00
Benson Wong	f45896d395	add guard to avoid unnecessary logic in Process.Shutdown	2025-05-20 10:43:09 -07:00
choyuansu	f7e46a359f	Add link to unload endpoint in upstream list (#140) * Add link to open /unload	2025-05-20 08:31:44 -07:00
Benson Wong	b83a5fa291	make Failed stated recoverable (#137) A process in the failed state can transition to stopped either by calling /unload or swapping to another model.	2025-05-16 19:54:44 -07:00
Benson Wong	a8b81f2799	Add stopCmd for custom stopping instructions (#136) Allow configuration of how a model is stopped before swapping. Setting `cmdStop` in the configuration will override the default behaviour and enables better integration with other process/container managers like docker or podman.	2025-05-16 13:48:42 -07:00
fakezeta	2d00120781	Update proxymanager.go (#135)	2025-05-16 06:45:09 -07:00
Benson Wong	afc9aef058	Fix #133 SanitizeCommand removes comments (#134)	2025-05-15 15:28:50 -07:00
Benson Wong	d7b390df74	Add GH Action for Testing on Windows (#132) * Add windows specific test changes * Change the command line parsing library - Possible breaking changes for windows users!	2025-05-14 21:51:53 -07:00
Benson Wong	e3a0b013c1	add content length test for #131	2025-05-14 19:50:01 -07:00
Fadenfire	f5763a94a0	Fix content length being incorrect when useModelName is used (#131) * Fix content length being incorrect when useModelName is used * Update c.Request.ContentLength as well	2025-05-14 19:37:54 -07:00
Benson Wong	2441b383d3	Make checking for process killed status more robust	2025-05-14 16:26:56 -07:00
Benson Wong	25f251699c	Prevent StateFailed after SIGKILL (#129) Closes #125	2025-05-14 10:47:35 -07:00
Benson Wong	7f37bcc6eb	Improve testing around using SIGKILL (#127) * Add test for SIGKILL of process * silent TestProxyManager_RunningEndpoint debug output * Ref #125	2025-05-13 21:21:52 -07:00
Benson Wong	519c3a4d22	Change /unload to not wait for inflight requests (#125) Sometimes upstreams can accept HTTP but never respond causing requests to build up waiting for a response. This can block Process.Stop() as that waits for inflight requests to finish. This change refactors the code to not wait when attempting to shutdown the process.	2025-05-13 11:39:19 -07:00
Benson Wong	9dc4bcb46c	Add a concurrency limit to Process.ProxyRequest (#123)	2025-05-12 18:12:52 -07:00
Sam	bc652709a5	Add config hot-reload (#106) introduce --watch-config command line option to reload ProxyManager when configuration changes.	2025-05-11 17:37:00 -07:00
Benson Wong	09e52c0500	Automatic Port Numbers (#105) Add automatic port numbers assignment in configuration file. The string `${PORT}` will be substituted in model.cmd and model.proxy for an actual port number. This also allows model.proxy to be omitted from the configuration.	2025-05-05 17:07:43 -07:00
Benson Wong	ca9063ffbe	ensure aliases are unique (#116)	2025-05-05 15:34:18 -07:00
Benson Wong	21d7973d11	Improve content-length handling (#115) ref: See #114 * Improve content-length handling - Content length was not always being sent - Add tests for content-length	2025-05-05 10:46:26 -07:00
Yi Hong Ang	cc450e9c5f	fix issue where proxy is still proxying with chunked transfer-encoding (#114)	2025-05-05 10:00:03 -07:00

132 Commits