Oleg Shulyakov
fc3bb716df
UI styling / code improvements ( #307 )
...
Clean up and improve UI styling
* fix: UI - dependency cleanup
* chore: UI - start script
* refactor: UI - Extract Header
* fix: UI - Header styling
* fix: UI - LogViewer styling
* fix: UI - Models styling
* fix: UI - Activity styling
* fix: UI - ConnectionStatus colors
* review: UI - table border colors
2025-09-19 10:47:17 -07:00
Benson Wong
c36986fef6
upstream handler support for model names with forward slash ( #298 )
...
The upstream handler would break on model IDs that contained a forward
slash. Model IDs like "aaa/bbb" called at upstream/aaa/bbb would result
in an error. This commit adds support for model IDs with a forward slash
by iteratively searching the path for a match.
Fixes: #229
2025-09-13 13:37:03 -07:00
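The iterative path search described in c36986fef6 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `findModelInPath` and the `known` map are hypothetical names, and the sketch assumes the first (shortest) matching prefix wins.

```go
package main

import (
	"fmt"
	"strings"
)

// findModelInPath is a hypothetical helper, not llama-swap's implementation.
// It grows a candidate model ID one path segment at a time and returns the
// first candidate found in known, plus the remaining path to forward upstream.
func findModelInPath(path string, known map[string]bool) (modelID, rest string, ok bool) {
	segments := strings.Split(strings.Trim(path, "/"), "/")
	candidate := ""
	for i, seg := range segments {
		if seg == "" {
			continue // ignore empty segments such as those from "//"
		}
		if candidate == "" {
			candidate = seg
		} else {
			candidate += "/" + seg
		}
		if known[candidate] {
			return candidate, "/" + strings.Join(segments[i+1:], "/"), true
		}
	}
	return "", "", false
}

func main() {
	known := map[string]bool{"aaa/bbb": true, "ccc": true}
	id, rest, ok := findModelInPath("/aaa/bbb/v1/chat/completions", known)
	fmt.Println(id, rest, ok)
}
```

Here the path after the `upstream/` prefix is assumed to have been stripped already; a model ID without a slash, such as "ccc", matches on the first iteration.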
Artur Podsiadły
558801db1a
Fix nginx proxy buffering for streaming endpoints ( #295 )
...
* Fix nginx proxy buffering for streaming endpoints
- Add X-Accel-Buffering: no header to SSE endpoints (/api/events, /logs/stream)
- Add X-Accel-Buffering: no header to proxied text/event-stream responses
- Add nginx reverse proxy configuration section to README
- Add tests for X-Accel-Buffering header on streaming endpoints
Fixes #236
* Fix goroutine cleanup in streaming endpoints test
Add context cancellation to TestProxyManager_StreamingEndpointsReturnNoBufferingHeader
to ensure the goroutine is properly cleaned up when the test completes.
2025-09-09 16:07:46 -07:00
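As an illustration of the header added in 558801db1a, a server-sent events handler in Go might set `X-Accel-Buffering: no` before writing any data. This is a sketch under assumptions: `sseHandler` and `headerFor` are hypothetical names, and the real handlers may differ.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sseHandler is a hypothetical SSE handler, not the project's code.
func sseHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	// Ask an nginx reverse proxy not to buffer this response.
	w.Header().Set("X-Accel-Buffering", "no")
	fmt.Fprint(w, "data: hello\n\n")
	if f, ok := w.(http.Flusher); ok {
		f.Flush() // push the event to the client immediately
	}
}

// headerFor runs h against an in-memory recorder and returns one response header.
func headerFor(h http.HandlerFunc, path, name string) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Header().Get(name)
}

func main() {
	fmt.Println(headerFor(sseHandler, "/api/events", "X-Accel-Buffering"))
}
```

The header has to be set before the first write, since response headers are sent along with the first chunk of the body.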
Benson Wong
b21dee27c1
Fix #288 Vite hot module reloading creating multiple SSE connections ( #290 )
...
- move SSE (EventSource) connection to module level
- manage EventSource as a singleton, closing any open connection before
opening a new one
2025-09-07 21:48:58 -07:00
Benson Wong
f58c8c8ec5
Support llama.cpp's cache_n in timings info ( #287 )
...
Capture prompt cache metrics and surface them on Activities page in UI
2025-09-06 13:58:02 -07:00
Benson Wong
954e2dee73
Remove cmdStart from README [skip ci]
...
cmdStart was in the README but it doesn't exist. Fixed the typo. Oops.
2025-09-04 11:57:28 -07:00
Benson Wong
a533aec736
small tweak to example config
2025-09-01 21:26:58 -07:00
Brett Profitt
97b17fc47d
Add ${MODEL_ID} macro ( #226 )
...
The automatic ${MODEL_ID} macro includes the name of the model and can be used in Cmd and CmdStop.
2025-09-01 21:21:37 -07:00
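The macro from 97b17fc47d amounts to a string substitution in the configured commands. A minimal sketch, assuming plain text replacement; `expandModelID`, the model name, and the image name are hypothetical, and the real expansion logic may handle more cases:

```go
package main

import (
	"fmt"
	"strings"
)

// expandModelID is a hypothetical helper: it replaces every ${MODEL_ID}
// occurrence in a command string with the model's ID.
func expandModelID(cmd, modelID string) string {
	return strings.ReplaceAll(cmd, "${MODEL_ID}", modelID)
}

func main() {
	// Example commands only; the image name is made up.
	cmd := "docker run --name ${MODEL_ID} example/image"
	cmdStop := "docker stop ${MODEL_ID}"
	fmt.Println(expandModelID(cmd, "my-model"))
	fmt.Println(expandModelID(cmdStop, "my-model"))
}
```

Using the model ID as the container name lets CmdStop refer to the same container that Cmd started without repeating the name in the config.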
Benson Wong
2457840698
Update README.md [skip ci]
2025-08-28 23:44:37 -07:00
Benson Wong
7f55494151
Update README.md [skip ci]
2025-08-28 22:47:28 -07:00
Benson Wong
831a90d3b0
Add different timeout scenarios to Process.checkHealthEndpoint #276 ( #278 )
...
- add a TCP connection timeout of 500ms
- increase HTTP client timeout to 5000ms
In this new behaviour the upstream has 500ms to accept a TCP connection
and 5000ms to respond to the HTTP request.
2025-08-28 22:03:14 -07:00
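The two timeouts in 831a90d3b0 map naturally onto Go's `net.Dialer` and `http.Client`. The following is a sketch under assumptions, not the code in `Process.checkHealthEndpoint`: `defaultHealthClient` and the URL are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// defaultHealthClient is a hypothetical constructor. The dialer limits how
// long the TCP connect may take; the client timeout bounds the whole request.
func defaultHealthClient() *http.Client {
	return &http.Client{
		Timeout: 5000 * time.Millisecond,
		Transport: &http.Transport{
			DialContext: (&net.Dialer{Timeout: 500 * time.Millisecond}).DialContext,
		},
	}
}

func main() {
	client := defaultHealthClient()
	// Hypothetical upstream address and health path.
	resp, err := client.Get("http://127.0.0.1:9999/health")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)
}
```

Note that in Go `http.Client.Timeout` covers the entire exchange, including connection time, so in this sketch the 5000ms is an overall budget rather than a limit that starts after the connection is accepted.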
Yandrik
977f1856bb
add /completion endpoint ( #275 )
...
* feat: add /completion endpoint
* chore: reformat using gofmt
2025-08-28 21:41:02 -07:00
Benson Wong
52b329f7bc
Fix #277 race condition in ProcessGroup.ProxyRequest when swap=true
2025-08-28 21:38:40 -07:00
Benson Wong
57803fd3aa
Support llama-server's /infill endpoint ( #272 )
...
Add support for llama-server's /infill endpoint and metrics gathering on the Activities page.
2025-08-27 08:36:05 -07:00
Benson Wong
c55d0cc842
Add docs for model.concurrencyLimit #263 [skip ci]
2025-08-22 16:08:37 -07:00
Benson Wong
7acbaf4712
Add connection status indicator in UI ( #260 )
...
* show connection status as icon in UI title
* make connection status event driven
2025-08-20 13:58:24 -07:00
Benson Wong
fcc5ad135a
UI: Allow editing of title ( #246 )
...
- make <h1> title contentEditable
- title setting persists across reloads in localStorage
2025-08-17 09:42:06 -07:00
Benson Wong
305e5a0031
improve example config [skip ci]
2025-08-17 09:19:04 -07:00
Benson Wong
04fc67354a
Improve Activity event handling in the UI ( #254 )
...
Improve Activity event handling in the UI
- fixes #252, where the Activity page showed activity inconsistent
with /api/metrics
- Change data structure for event metrics to array.
- Add event stream connection status indicator
2025-08-15 21:44:08 -07:00
Benson Wong
4662cf7699
add 'unconfirmed bug' as default label in bug-report.md
2025-08-15 15:38:12 -07:00
Benson Wong
5dc6b3e6d9
Add barebones but working implementation of model preload ( #209 , #235 )
...
Add barebones but working implementation of model preload
* add config test for Preload hook
* improve TestProxyManager_StartupHooks
* docs for new hook configuration
* add a .dev to .gitignore
2025-08-14 10:27:28 -07:00
Benson Wong
74c69f39ef
Add prompt processing metrics ( #250 )
...
- capture prompt processing metrics
- display prompt processing metrics on UI Activity page
2025-08-14 10:02:16 -07:00
Benson Wong
a186318892
Update Readme, Add screenshot for Activities page [skip ci]
2025-08-08 13:39:46 -07:00
Benson Wong
c4e4d5e1e9
Update Readme UI Screenshot [skip ci]
2025-08-08 13:33:47 -07:00
Benson Wong
7985e94ba4
add tokens processed to ui models page
2025-08-08 13:28:39 -07:00
Benson Wong
74556c3a36
Update bug-report.md [skip ci]
2025-08-08 09:52:05 -07:00
Benson Wong
5c381e4b30
Add gofmt linting to ci
2025-08-07 20:29:18 -07:00
Benson Wong
10569ed546
Fix model alias usage in upstream path ( #230 )
...
Model alias values are now properly resolved and work in the upstream/ path.
Related to #229.
2025-08-07 20:16:56 -07:00
Benson Wong
5b10b3c23f
UI Tweaks ( #228 )
...
* sort model names in UI
* add toggle to show model id/name on UI model page
2025-08-07 11:07:03 -07:00
Benson Wong
45ea792a3a
Fix UI panel not saving position correctly
2025-08-06 14:02:22 -07:00
Benson Wong
1bc2802353
fix panels not saving sizing state
2025-08-06 14:00:21 -07:00
Benson Wong
701476c0c4
Update README.md - remove contributor block [skip ci]
...
Contributor information is available on the GitHub page's sidebar. Redundant.
2025-08-06 11:11:47 -07:00
Ben Greene
5c63e0066c
return models sorted by id in /v1/models ( #222 )
2025-08-06 10:04:52 -07:00
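Returning models in a stable order, as in 5c63e0066c, can be done with a sort on the ID before the list is serialized. A minimal sketch; the `model` type and `sortedByID` are hypothetical and only stand in for whatever the handler actually returns:

```go
package main

import (
	"fmt"
	"sort"
)

// model is a hypothetical, reduced stand-in for an entry in a model list.
type model struct {
	ID string `json:"id"`
}

// sortedByID returns a copy of models ordered by ID, leaving the input untouched.
func sortedByID(models []model) []model {
	out := append([]model(nil), models...)
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	for _, m := range sortedByID([]model{{ID: "qwen"}, {ID: "gemma"}, {ID: "llama"}}) {
		fmt.Println(m.ID)
	}
}
```

Sorting matters here because Go map iteration order is randomized, so a list built from a map of models would otherwise change order between requests.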
Martin Garton
8be5073c51
Fix typo ( #223 ) [skip ci]
...
Fix typo `lama-swap` -> `llama-swap`
2025-08-06 10:02:38 -07:00
Aaron Ang
6307bd3205
Add support for building Linux ARM64 binary in Makefile ( #221 )
2025-08-05 16:26:06 -07:00
Benson Wong
558a72de17
UI Improvements ( #219 )
...
- use react-resizable-panels for UI
- improve icons for buttons
- improve mobile layout with drag/resize panels
2025-08-03 17:49:13 -07:00
Leoyzen
dc42cf366d
Add config monitor support for k8s configmap. ( #217 )
2025-08-03 08:05:48 -07:00
Ryein Goddard
ba0a81937a
Update README.md ( #216 )
...
Update git clone protocol to https
2025-08-01 19:48:09 -07:00
Benson Wong
574fdfabb4
UI improvements ( #213 )
...
* use two columns for logs view on wider screens
* hide log controls when panel is minimized
2025-07-31 11:59:21 -07:00
Benson Wong
5172cb2e12
Update docs in Readme [skip ci]
2025-07-30 11:51:14 -07:00
Benson Wong
5672cb03fd
Update GitHub Actions for notifying homebrew build ( #212 )
...
Combine homebrew-llama-swap event with the release action
2025-07-30 11:29:03 -07:00
Benson Wong
0f583163f7
add /health ( #211 )
2025-07-30 10:37:10 -07:00
Benson Wong
7905fa9ea3
Update trigger-homebrew-update.yml [skip ci]
2025-07-30 10:13:49 -07:00
Ian Sebastian Mathew
bbaf172956
add trigger to rebuild homebrew formula ( #210 )
2025-07-30 10:12:21 -07:00
Benson Wong
fd50932dbc
Decouple MetricsMiddleware from downstream handlers ( #206 )
...
* Decouple MetricsMiddleware from downstream handlers
Remove ls-real-model-name optimization. Within proxyOAIHandler the
request body's bytes are required for various rewriting features
anyways. This negated any benefits from trying not to parse it twice.
2025-07-27 10:36:06 -07:00
Gaël James
8c693e7fcf
Add endpoint aliases for reranking models ( #201 )
...
* Add endpoint aliases for reranking models
* Add MetricsMiddleware to the previous reranking endpoint
* Fix the embeddings endpoint not having model set
2025-07-24 08:32:47 -07:00
Benson Wong
8f2af26a41
fix stats on model page
2025-07-23 13:57:33 -07:00
Benson Wong
01d4838fb3
Fix token metrics parsing ( #199 )
...
Fix #198
- use llama-server's `timings` info if available in response body
- send "-1" for token/sec when not able to accurately calculate
performance
- optimize streaming body search for metrics information
2025-07-22 23:10:14 -07:00
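The fallback behaviour in 01d4838fb3 can be sketched as: read the `timings` object when the response body has one, otherwise report -1. This is an illustration only; `tokensPerSecond` and the struct types are hypothetical, and the JSON field names are assumed to match llama-server's `timings` output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// timings mirrors a few fields assumed to be present in llama-server's
// timings object; only the ones needed here are listed.
type timings struct {
	PromptN            int     `json:"prompt_n"`
	PredictedN         int     `json:"predicted_n"`
	PredictedPerSecond float64 `json:"predicted_per_second"`
}

type responseBody struct {
	Timings *timings `json:"timings"`
}

// tokensPerSecond returns the reported generation speed, or -1 when the body
// cannot be parsed or carries no timings to calculate performance from.
func tokensPerSecond(body []byte) float64 {
	var r responseBody
	if err := json.Unmarshal(body, &r); err != nil || r.Timings == nil {
		return -1
	}
	return r.Timings.PredictedPerSecond
}

func main() {
	fmt.Println(tokensPerSecond([]byte(`{"timings":{"predicted_n":10,"predicted_per_second":42.5}}`)))
	fmt.Println(tokensPerSecond([]byte(`{"usage":{"completion_tokens":10}}`)))
}
```

Using a pointer for `Timings` distinguishes a missing object from one whose values are all zero.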
Benson Wong
accd65294b
add contributors to README [skip ci]
2025-07-21 23:16:48 -07:00
Benson Wong
7472a25864
Update README.md [skip ci]
...
update screenshot for web UI
2025-07-21 23:08:19 -07:00