llama-swap

Author	SHA1	Message	Date
Benson Wong	1a84926505	proxy: add unload of single model (#318) This adds a new API endpoint, /api/models/unload/*model, that unloads a single model. In the UI when a model is in a ReadyState it will have a new button to unload it. Fixes #312	2025-09-24 20:53:48 -07:00
Oleg Shulyakov	fc3bb716df	UI styling / code improvements (#307) Clean up and improve UI styling * fix: UI - dependency cleanup * chore: UI - start script * refactor: UI - Extract Header * fix: UI - Header styling * fix: UI - LogViewer styling * fix: UI - Models styling * fix: UI - Activity styling * fix: UI - ConnectionStatus colors * review: UI - table border colors	2025-09-19 10:47:17 -07:00
Benson Wong	c36986fef6	upstream handler support for model names with forward slash (#298) The upstream handler would break on model IDs that contained a forward slash. Model IDs like "aaa/bbb" called at upstream/aaa/bbb would result in an error. This commit adds support for model IDs with a forward slash by iteratively searching the path for a match. Fixes: #229	2025-09-13 13:37:03 -07:00
Artur Podsiadły	558801db1a	Fix nginx proxy buffering for streaming endpoints (#295) * Fix nginx proxy buffering for streaming endpoints - Add X-Accel-Buffering: no header to SSE endpoints (/api/events, /logs/stream) - Add X-Accel-Buffering: no header to proxied text/event-stream responses - Add nginx reverse proxy configuration section to README - Add tests for X-Accel-Buffering header on streaming endpoints Fixes #236 * Fix goroutine cleanup in streaming endpoints test Add context cancellation to TestProxyManager_StreamingEndpointsReturnNoBufferingHeader to ensure the goroutine is properly cleaned up when the test completes.	2025-09-09 16:07:46 -07:00
Benson Wong	b21dee27c1	Fix #288 Vite hot module reloading creating multiple SSE connections (#290) - move SSE (EventSource) connection to module level - manage EventSource as a singleton, closing open connection before reopening a new one	2025-09-07 21:48:58 -07:00
Benson Wong	f58c8c8ec5	Support llama.cpp's cache_n in timings info (#287) Capture prompt cache metrics and surface them on Activities page in UI	2025-09-06 13:58:02 -07:00
Benson Wong	954e2dee73	Remove `cmdStart` from README [skip ci] cmdStart was in the README but it doesn't exist. Fixed the typo. Oops.	2025-09-04 11:57:28 -07:00
Benson Wong	a533aec736	small tweak to example config	2025-09-01 21:26:58 -07:00
Brett Profitt	97b17fc47d	Add ${MODEL_ID} macro (#226) The automatic ${MODEL_ID} macro includes the name of the model and can be used in Cmd and CmdStop.	2025-09-01 21:21:37 -07:00
Benson Wong	2457840698	Update README.md [skip ci]	2025-08-28 23:44:37 -07:00
Benson Wong	7f55494151	Update README.md [skip ci]	2025-08-28 22:47:28 -07:00
Benson Wong	831a90d3b0	Add different timeout scenarios to Process.checkHealthEndpoint #276 (#278) - add a TCP connection timeout of 500ms - increase HTTP client timeout to 5000ms In this new behaviour the upstream has 500ms to accept a tcp connection and 5000ms to respond to the HTTP request.	2025-08-28 22:03:14 -07:00
Yandrik	977f1856bb	add /completion endpoint (#275) * feat: add /completion endpoint * chore: reformat using gofmt	2025-08-28 21:41:02 -07:00
Benson Wong	52b329f7bc	Fix #277 race condition in ProcessGroup.ProxyRequest when swap=true	2025-08-28 21:38:40 -07:00
Benson Wong	57803fd3aa	Support llama-server's /infill endpoint (#272) Add support for llama-server's /infill endpoint and metrics gathering on the Activities page.	2025-08-27 08:36:05 -07:00
Benson Wong	c55d0cc842	Add docs for model.concurrencyLimit #263 [skip ci]	2025-08-22 16:08:37 -07:00
Benson Wong	7acbaf4712	Add connection status indicator in UI (#260) * show connection status as icon in UI title * make connection status event driven	2025-08-20 13:58:24 -07:00
Benson Wong	fcc5ad135a	UI: Allow editing of title (#246) - make <h1> title contentEditable - title setting persists across reloads in localStorage	2025-08-17 09:42:06 -07:00
Benson Wong	305e5a0031	improve example config [skip ci]	2025-08-17 09:19:04 -07:00
Benson Wong	04fc67354a	Improve Activity event handling in the UI (#254) Improve Activity event handling in the UI - fixes #252 found that the Activity page showed activity inconsistent with /api/metrics - Change data structure for event metrics to array. - Add Event stream connections status indicator	2025-08-15 21:44:08 -07:00
Benson Wong	4662cf7699	add 'unconfirmed bug' as default label in bug-report.md	2025-08-15 15:38:12 -07:00
Benson Wong	5dc6b3e6d9	Add barebones but working implementation of model preload (#209, #235) Add barebones but working implementation of model preload * add config test for Preload hook * improve TestProxyManager_StartupHooks * docs for new hook configuration * add a .dev to .gitignore	2025-08-14 10:27:28 -07:00
Benson Wong	74c69f39ef	Add prompt processing metrics (#250) - capture prompt processing metrics - display prompt processing metrics on UI Activity page	2025-08-14 10:02:16 -07:00
Benson Wong	a186318892	Update Readme, Add screenshot for Activities page [skip ci]	2025-08-08 13:39:46 -07:00
Benson Wong	c4e4d5e1e9	Update Readme UI Screenshot [skip ci]	2025-08-08 13:33:47 -07:00
Benson Wong	7985e94ba4	add tokens processed to ui models page	2025-08-08 13:28:39 -07:00
Benson Wong	74556c3a36	Update bug-report.md [skip ci]	2025-08-08 09:52:05 -07:00
Benson Wong	5c381e4b30	Add gofmt linting to ci	2025-08-07 20:29:18 -07:00
Benson Wong	10569ed546	Fix model alias usage in upstream path (#230) Model alias values are now properly resolved and work in upstream/ path. Related to #229.	2025-08-07 20:16:56 -07:00
Benson Wong	5b10b3c23f	UI Tweaks (#228) * sort model names in UI * add toggle to show model id/name on UI model page	2025-08-07 11:07:03 -07:00
Benson Wong	45ea792a3a	Fix UI panel not saving position correctly	2025-08-06 14:02:22 -07:00
Benson Wong	1bc2802353	fix panels not saving sizing state	2025-08-06 14:00:21 -07:00
Benson Wong	701476c0c4	Update README.md - remove contributor block [skip ci] Contributor information available on the Github page's sidebar. Redundant.	2025-08-06 11:11:47 -07:00
Ben Greene	5c63e0066c	return models sorted by id in /v1/models (#222)	2025-08-06 10:04:52 -07:00
Martin Garton	8be5073c51	Fix typo (#223) [skip ci] Fix typo `lama-swap` -> `llama-swap`	2025-08-06 10:02:38 -07:00
Aaron Ang	6307bd3205	Add support for building Linux ARM64 binary in Makefile (#221)	2025-08-05 16:26:06 -07:00
Benson Wong	558a72de17	UI Improvements (#219) - use react-resizable-panels for UI - improve icons for buttons - improve mobile layout with drag/resize panels	2025-08-03 17:49:13 -07:00
Leoyzen	dc42cf366d	Add config monitor support for k8s configmap. (#217)	2025-08-03 08:05:48 -07:00
Ryein Goddard	ba0a81937a	Update README.md (#216) Update git clone protocol to https	2025-08-01 19:48:09 -07:00
Benson Wong	574fdfabb4	UI improvements (#213) * use two column for logs view on wider screens * hide log controls when panel is minimized	2025-07-31 11:59:21 -07:00
Benson Wong	5172cb2e12	Update docs in Readme [skip ci]	2025-07-30 11:51:14 -07:00
Benson Wong	5672cb03fd	Update github actions for notifying homebrew build (#212) Combine homebrew-llama-swap event with the release action	2025-07-30 11:29:03 -07:00
Benson Wong	0f583163f7	add /health (#211)	2025-07-30 10:37:10 -07:00
Benson Wong	7905fa9ea3	Update trigger-homebrew-update.yml [skip ci]	2025-07-30 10:13:49 -07:00
Ian Sebastian Mathew	bbaf172956	add trigger to rebuild homebrew formula (#210)	2025-07-30 10:12:21 -07:00
Benson Wong	fd50932dbc	Decouple MetricsMiddleware from downstream handlers (#206) * Decouple MetricsMiddleware from downstream handlers Remove ls-real-model-name optimization. Within proxyOAIHandler the request body's bytes are required for various rewriting features anyways. This negated any benefits from trying not to parse it twice.	2025-07-27 10:36:06 -07:00
Gaël James	8c693e7fcf	Add endpoint aliases for reranking models (#201) * Add endpoint aliases for reranking models * Add MetricsMiddleware to the previous reranking endpoint * Fix the embeddings endpoint not having model set	2025-07-24 08:32:47 -07:00
Benson Wong	8f2af26a41	fix stats on model page	2025-07-23 13:57:33 -07:00
Benson Wong	01d4838fb3	Fix token metrics parsing (#199) Fix #198 - use llama-server's `timings` info if available in response body - send "-1" for token/sec when not able to accurately calculate performance - optimize streaming body search for metrics information	2025-07-22 23:10:14 -07:00
Benson Wong	accd65294b	add contributors to README [skip ci]	2025-07-21 23:16:48 -07:00
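
Commit c36986fef6 above describes resolving model IDs that contain a forward slash "by iteratively searching the path for a match". The log does not show the implementation, so the following Go sketch is only an illustration of that idea. The function name `matchModelID`, the `known` set, and the shortest-prefix-first search order are all assumptions made for this example and are not taken from the llama-swap source.

```go
package main

import (
	"fmt"
	"strings"
)

// matchModelID is a hypothetical helper, not llama-swap's actual code.
// It grows a candidate model ID one path segment at a time and returns the
// first candidate present in the known set, plus the remainder of the path.
func matchModelID(path string, known map[string]bool) (id, rest string, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	for i := 1; i <= len(parts); i++ {
		candidate := strings.Join(parts[:i], "/")
		if known[candidate] {
			// Everything after the matched ID is what would be forwarded upstream.
			return candidate, "/" + strings.Join(parts[i:], "/"), true
		}
	}
	return "", "", false
}

func main() {
	// Assumed set of configured model IDs, using the "aaa/bbb" example from the commit.
	known := map[string]bool{"aaa/bbb": true, "ccc": true}
	id, rest, ok := matchModelID("/aaa/bbb/v1/chat/completions", known)
	fmt.Println(id, rest, ok) // aaa/bbb /v1/chat/completions true
}
```

Because this sketch tries shorter prefixes first, an ID such as `aaa` would shadow `aaa/bbb` if both were configured; whether the real handler prefers the shortest or the longest match is not stated in the commit message.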


303 Commits