- Add /api/version endpoint to ProxyManager that returns build date, commit hash, and version
- Implement SetVersion method to configure version info in ProxyManager
- Add version info fetching to APIProvider and display in ConnectionStatus component
- Include version info in UI context and update dependencies
- Add tests for version endpoint functionality
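A minimal sketch of how the endpoint and SetVersion could fit together; only ProxyManager, SetVersion, and /api/version come from the change itself, while the struct fields, JSON keys, and handler wiring below are assumptions.

```go
package proxy

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// versionInfo is a hypothetical container for the build metadata.
type versionInfo struct {
	Version   string `json:"version"`
	Commit    string `json:"commit"`
	BuildDate string `json:"buildDate"`
}

// ProxyManager is reduced here to the fields relevant to this sketch.
type ProxyManager struct {
	version versionInfo
}

// SetVersion stores the build-time version info served by /api/version.
func (pm *ProxyManager) SetVersion(version, commit, buildDate string) {
	pm.version = versionInfo{Version: version, Commit: commit, BuildDate: buildDate}
}

// versionHandler returns the configured version info as JSON.
func (pm *ProxyManager) versionHandler(c *gin.Context) {
	c.JSON(http.StatusOK, pm.version)
}
```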
Introduces a new configuration option, logTimeFormat, that allows customizing the timestamp in log messages using Go's built-in time format constants. The default remains no timestamp.
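A small sketch of the idea, assuming logTimeFormat maps directly onto a Go time layout; the wiring into the actual logger is illustrative.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// logTimeFormat from the config selects one of Go's time layouts
	// (e.g. RFC3339); an empty value keeps the default of no timestamp.
	logTimeFormat := time.RFC3339

	prefix := ""
	if logTimeFormat != "" {
		prefix = time.Now().Format(logTimeFormat) + " "
	}
	fmt.Println(prefix + "llama-server started")
}
```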
Refactor the container build script to resolve the llama.cpp base image for CPU builds and tag those builds accordingly.
- For CPU containers, fetch the latest 'server'-tagged llama.cpp image instead of relying on the generic 'server' tag
- Clean up the docker build command to use a dynamic BASE_TAG variable
- Maintain existing push functionality for built images
When requesting /, wol-proxy will show a loading page that polls /status
every second. When the upstream server is ready, the loading page
refreshes so the actual root page is displayed.
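A rough sketch of that flow with a plain net/http handler pair; only the / and /status paths come from the description, while the HTML, JavaScript, and upstream check are illustrative.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"time"
)

// loadingPage polls /status every second and reloads once the upstream is up.
const loadingPage = `<html><body>Waking up the server...
<script>
setInterval(async () => {
  const r = await fetch("/status");
  if ((await r.text()) === "ready") location.reload();
}, 1000);
</script></body></html>`

func main() {
	upstream := "192.168.1.50:8080" // placeholder llama-swap address

	http.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		conn, err := net.DialTimeout("tcp", upstream, 500*time.Millisecond)
		if err != nil {
			fmt.Fprint(w, "waking")
			return
		}
		conn.Close()
		fmt.Fprint(w, "ready")
	})

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// The real proxy forwards to the upstream once it is reachable;
		// this sketch only shows the loading page.
		w.Header().Set("Content-Type", "text/html")
		fmt.Fprint(w, loadingPage)
	})

	log.Fatal(http.ListenAndServe(":8081", nil))
}
```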
- Replace `addgroup` with `groupadd` for system group creation
- Replace `adduser` with `useradd` for system user creation
- Maintain the same functionality while using the more standard, widely available commands
Set default container user/group to lower privilege app user
* refactor: update Containerfile to support non-root user execution and improve security
- Updated LS_VER argument from 89 to 170 to use the latest version
- Added UID/GID arguments with default values of 0 (root) for backward compatibility
- Added USER_HOME environment variable set to /root
- Implemented conditional user/group creation logic that only runs when UID/GID are not 0
- Created necessary directory structure with proper ownership using mkdir and chown commands
- Switched to non-root user execution for improved security posture
- Updated COPY instruction to use --chown flag for proper file ownership
* chore: update containerfile to use non-root user with proper UID/GID
- Changed default UID and GID from 0 (root) to 10001 for security best practices
- Updated USER_HOME from /root to /app to avoid running as root user
Swapping models can take a long time and leave a lot of silence while the model is loading. Rather than silently load the model in the background, this PR allows llama-swap to send status updates in the reasoning_content of a streaming chat response.
Fixes: #366
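A minimal sketch of what one such status chunk could look like on the wire, using an OpenAI-style streaming delta; only the reasoning_content field comes from the change, the JSON shape and wording are illustrative.

```go
package proxy

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// writeStatusChunk emits a loading-status update as a streaming chat chunk
// whose delta carries reasoning_content instead of normal content.
func writeStatusChunk(w http.ResponseWriter, flusher http.Flusher, status string) {
	chunk := map[string]any{
		"object": "chat.completion.chunk",
		"choices": []map[string]any{
			{"index": 0, "delta": map[string]any{"reasoning_content": status}},
		},
	}
	data, _ := json.Marshal(chunk)
	fmt.Fprintf(w, "data: %s\n\n", data)
	flusher.Flush()
}
```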
Switching to use httputil.ReverseProxy in #342 introduced a possible
panic if a client disconnects while streaming the body. Since llama-swap
does not use http.Server, the recover() is not automatically there.
- introduce a recover() in Process.ProxyRequest to recover and log the
event
- add TestProcess_ReverseProxyPanicIsHandled to reproduce and test the
fix
fixes: #362
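Roughly what the recover guard looks like; only the ProxyRequest name and the recover() itself come from the change, while the Process type and reverseProxy field below are stand-ins for llama-swap's own.

```go
package proxy

import (
	"log"
	"net/http"
	"net/http/httputil"
)

// Process is a reduced stand-in for llama-swap's Process type.
type Process struct {
	reverseProxy *httputil.ReverseProxy
}

func (p *Process) ProxyRequest(w http.ResponseWriter, r *http.Request) {
	defer func() {
		if rec := recover(); rec != nil {
			// A client disconnecting mid-stream can surface as a panic
			// (e.g. http.ErrAbortHandler); log it instead of crashing.
			log.Printf("recovered panic while proxying request: %v", rec)
		}
	}()
	p.reverseProxy.ServeHTTP(w, r)
}
```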
* proxy: refactor metrics recording
- remove metrics_middleware.go as this wrapper is no longer needed. This
also eliminates double body parsing for the modelID
- move metrics parsing to be part of MetricsMonitor
- refactor how metrics are recorded in ProxyManager
- add MetricsMonitor tests
- improve memory efficiency of processStreamingResponse
- add benchmarks for MetricsMonitor.addMetrics
- proxy: refactor MetricsMonitor to handle errors more safely
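The benchmark shape, sketched against a hypothetical MetricsMonitor stand-in; the real type and addMetrics signature live in the proxy package, so everything below is illustrative.

```go
package proxy

import (
	"sync"
	"testing"
)

// tokenMetrics and metricsMonitor are stand-ins for the real types.
type tokenMetrics struct {
	Model        string
	OutputTokens int
}

type metricsMonitor struct {
	mu      sync.Mutex
	metrics []tokenMetrics
}

func (m *metricsMonitor) addMetrics(tm tokenMetrics) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.metrics = append(m.metrics, tm)
}

func BenchmarkMetricsMonitorAddMetrics(b *testing.B) {
	mm := &metricsMonitor{}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mm.addMetrics(tokenMetrics{Model: "llama", OutputTokens: 128})
	}
}
```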
fix the extra wake-ups being caused by wol-proxy
* cmd/wol-proxy: tweak logs to show what is causing wake-ups
* cmd/wol-proxy: add skip wakeup
* cmd/wol-proxy: replace ticker with SSE connection
* cmd/wol-proxy: increase scanner buffer size
* cmd/wol-proxy: improve failure tracking
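A sketch of the ticker-to-SSE switch and the larger scanner buffer; the /api/events path is the stream mentioned elsewhere in these notes, while the buffer sizes and event handling are illustrative.

```go
package wolproxy

import (
	"bufio"
	"net/http"
	"strings"
)

// watchUpstream subscribes to llama-swap's SSE stream instead of polling on
// a ticker, so wol-proxy only reacts to real events.
func watchUpstream(upstreamURL string) error {
	resp, err := http.Get(upstreamURL + "/api/events")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	// Raise the limit above bufio.Scanner's default 64KB so large SSE
	// payloads do not abort the stream.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		if line := scanner.Text(); strings.HasPrefix(line, "data: ") {
			// handle the event here
		}
	}
	return scanner.Err()
}
```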
add a wake-on-lan proxy for llama-swap. When the target llama-swap server is unreachable it will hold the request, send a WoL packet, and proxy the request once llama-swap is available.
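For reference, a minimal sketch of sending the magic packet itself: 6 bytes of 0xFF followed by the target MAC repeated 16 times, sent over UDP broadcast. The addresses are placeholders.

```go
package wolproxy

import (
	"bytes"
	"net"
)

// sendMagicPacket broadcasts a Wake-on-LAN magic packet for the given MAC.
func sendMagicPacket(mac net.HardwareAddr, broadcastAddr string) error {
	var pkt bytes.Buffer
	pkt.Write(bytes.Repeat([]byte{0xFF}, 6))
	for i := 0; i < 16; i++ {
		pkt.Write(mac)
	}

	conn, err := net.Dial("udp", broadcastAddr) // e.g. "192.168.1.255:9"
	if err != nil {
		return err
	}
	defer conn.Close()

	_, err = conn.Write(pkt.Bytes())
	return err
}
```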
* Add optional TLS support
Introduce HTTPS support with net/http Server.ListenAndServeTLS.
This should enable the option of serving via HTTPS without a reverse
proxy.
Add two flags:
- tls-cert-file (path to the TLS certificate file)
- tls-key-file (path to the TLS private key file)
Both flags must be supplied together; otherwise, exit with an error.
If both flags are present, call srv.ListenAndServeTLS.
If not, fall back to the existing srv.ListenAndServe (HTTP); no changes
to existing non-TLS behavior.
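A condensed sketch of that flag handling; the flag names and fallback behavior follow the description above, the rest of the server setup is illustrative.

```go
package main

import (
	"flag"
	"log"
	"net/http"
)

func main() {
	certFile := flag.String("tls-cert-file", "", "path to the TLS certificate file")
	keyFile := flag.String("tls-key-file", "", "path to the TLS private key file")
	flag.Parse()

	srv := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

	var err error
	switch {
	case *certFile != "" && *keyFile != "":
		err = srv.ListenAndServeTLS(*certFile, *keyFile)
	case *certFile != "" || *keyFile != "":
		log.Fatal("tls-cert-file and tls-key-file must be supplied together")
	default:
		err = srv.ListenAndServe() // existing non-TLS behavior
	}
	if err != nil && err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```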
* Refactor to use httputil.ReverseProxy
Refactor manual HTTP proxying logic in Process.ProxyRequest to use the standard
library's httputil.ReverseProxy.
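The core of the refactor, sketched with a single-host proxy; the upstream address stands in for the spawned llama.cpp server and error handling is trimmed.

```go
package proxy

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo forwards the request to the upstream using the standard library's
// ReverseProxy instead of hand-rolled copying of headers and bodies.
func proxyTo(upstreamAddr string, w http.ResponseWriter, r *http.Request) error {
	upstream, err := url.Parse(upstreamAddr) // e.g. "http://127.0.0.1:9090"
	if err != nil {
		return err
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.ServeHTTP(w, r)
	return nil
}
```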
* Refactor TestProcess_ForceStopWithKill test
Update to handle behavior with httputil.ReverseProxy.
* Fix gin interface conversion panic
Add full macro-in-macro support so any user-defined macro can contain another one, as long as it was declared earlier in the configuration file.
Fixes #336
Supersedes #335
Changes:
- add Metadata key to ModelConfig
- include metadata in /v1/models under meta.llamaswap key
- add recursive macro substitution into Metadata
- change macros at global and model level to be any scalar type
Note:
This is the first mostly AI generated change to llama-swap. See #333 for notes about the workflow and approach to AI going forward.
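To illustrate the ordering rule, a toy sketch of macro-in-macro expansion: macros are expanded in declaration order, so a macro may reference any macro declared before it. The ${name} syntax and data here are illustrative, not llama-swap's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// (name, value) pairs in the order they are declared in the config
	macros := [][2]string{
		{"models_dir", "/models"},
		{"llama_cmd", "llama-server -m ${models_dir}/model.gguf"},
	}

	expanded := map[string]string{}
	for _, m := range macros {
		value := m[1]
		// substitute every previously declared macro into this value
		for name, v := range expanded {
			value = strings.ReplaceAll(value, "${"+name+"}", v)
		}
		expanded[m[0]] = value
	}

	fmt.Println(expanded["llama_cmd"]) // llama-server -m /models/model.gguf
}
```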
* proxy/config: add model level macros
Add macros to model configuration. Model macros override macros that are
defined at the global configuration level. They follow the same naming
and value rules as the global macros.
* proxy/config: fix bug with macro reserved name checking
The PORT reserved name was not properly checked
* proxy/config: add tests around model.filters.stripParams
- add check that model.filters.stripParams has no invalid macros
- rename strip_params to stripParams for camelCase consistency
- add legacy code compatibility so model.filters.strip_params continues to work
* proxy/config: add duplicate removal to model.filters.stripParams
* clean up some doc nits
* proxy/config: create config package and migrate configuration
The configuration is becoming more complex as llama-swap adds more
advanced features. This commit moves config to its own package so it can
be developed independently of the proxy package.
Additionally, enforcing a public API for configuration will allow
downstream usage to be more decoupled.
This adds a new API endpoint, /api/models/unload/*model, that unloads a single model. In the UI, when a model is in a ReadyState, it will have a new button to unload it.
Fixes #312
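A sketch of registering the wildcard route with gin; only the /api/models/unload/*model path comes from the change, while the HTTP method, handler body, and response shape are illustrative.

```go
package main

import (
	"net/http"
	"strings"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()

	// gin's *model parameter captures everything after /unload/, including
	// any embedded slashes in the model ID.
	r.POST("/api/models/unload/*model", func(c *gin.Context) {
		modelID := strings.TrimPrefix(c.Param("model"), "/")
		// look up modelID and stop its process here
		c.JSON(http.StatusOK, gin.H{"unloaded": modelID})
	})

	r.Run(":8080")
}
```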
The upstream handler would break on model IDs that contained a forward
slash. Model IDs like "aaa/bbb" called at upstream/aaa/bbb would result
in an error. This commit adds support for model IDs with a forward slash
by iteratively searching the path for a match.
Fixes: #229
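The iterative search, sketched: try the longest possible prefix of the path as a model ID first, then shorten it until something matches. The models set and return values are illustrative.

```go
package proxy

import "strings"

// findModel resolves model IDs that may contain slashes (e.g. "aaa/bbb") by
// matching the longest path prefix against the known models.
func findModel(path string, models map[string]bool) (modelID, rest string, ok bool) {
	parts := strings.Split(strings.TrimPrefix(path, "/"), "/")
	for i := len(parts); i > 0; i-- {
		candidate := strings.Join(parts[:i], "/")
		if models[candidate] {
			return candidate, "/" + strings.Join(parts[i:], "/"), true
		}
	}
	return "", "", false
}
```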
* Fix nginx proxy buffering for streaming endpoints
- Add X-Accel-Buffering: no header to SSE endpoints (/api/events, /logs/stream)
- Add X-Accel-Buffering: no header to proxied text/event-stream responses
- Add nginx reverse proxy configuration section to README
- Add tests for X-Accel-Buffering header on streaming endpoints
Fixes #236
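The header itself is a one-liner on each streaming response; a sketch with gin, where the surrounding handler wiring is illustrative.

```go
package proxy

import "github.com/gin-gonic/gin"

// sseHeaders marks the response as an event stream and tells nginx not to
// buffer it (X-Accel-Buffering: no), so events reach the client immediately.
func sseHeaders(c *gin.Context) {
	c.Header("Content-Type", "text/event-stream")
	c.Header("Cache-Control", "no-cache")
	c.Header("X-Accel-Buffering", "no")
}
```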
* Fix goroutine cleanup in streaming endpoints test
Add context cancellation to TestProxyManager_StreamingEndpointsReturnNoBufferingHeader
to ensure the goroutine is properly cleaned up when the test completes.
- add a TCP connection timeout of 500ms
- increase HTTP client timeout to 5000ms
With this new behaviour, the upstream has 500ms to accept a TCP connection
and 5000ms to respond to the HTTP request.
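A sketch of how those two budgets can be split in Go's http.Client; the timeout values match the text, the rest is illustrative.

```go
package wolproxy

import (
	"net"
	"net/http"
	"time"
)

// newUpstreamClient gives the upstream 500ms to accept the TCP connection and
// 5000ms to complete the whole HTTP request.
func newUpstreamClient() *http.Client {
	return &http.Client{
		Timeout: 5000 * time.Millisecond,
		Transport: &http.Transport{
			DialContext: (&net.Dialer{
				Timeout: 500 * time.Millisecond,
			}).DialContext,
		},
	}
}
```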