Benson Wong
ed2a50d9a6
fix bug in build-container.sh
2025-02-14 15:27:56 -08:00
Benson Wong
0acfdb9f78
update workflow to build cpu and disable musa
2025-02-14 15:26:59 -08:00
Benson Wong
96a8ea0241
add cpu docker container build
2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a
add docs and container build improvements #43
2025-02-14 12:20:07 -08:00
Benson Wong
7a97c38828
enable parallel container builds #46
2025-02-14 11:04:33 -08:00
Benson Wong
4885132565
more permissions futzing
2025-02-14 11:02:15 -08:00
Benson Wong
8b46a0b7f1
grant package:write to container workflow #46
2025-02-14 10:55:30 -08:00
Benson Wong
1b6736ec6f
rename workflow for containers
2025-02-14 10:50:15 -08:00
Benson Wong
ddc1ce031e
fix container file name #46
2025-02-14 10:49:44 -08:00
Benson Wong
11d024bbaa
just build cuda while debugging
2025-02-14 10:48:06 -08:00
Benson Wong
43e23c16dc
add check for GITHUB_TOKEN #46
2025-02-14 10:47:25 -08:00
Benson Wong
f9c8e763ba
add execute bit on build-container.sh
2025-02-14 10:44:53 -08:00
Benson Wong
d7e1bb9f7c
add GITHUB_TOKEN to container build env
2025-02-14 10:43:44 -08:00
Benson Wong
ab93460a8b
first container code (#52)
2025-02-14 10:39:25 -08:00
Benson Wong
13d4552edc
Add FreeBSD/amd64 to auto-built releases (#51)
2025-02-13 16:44:31 -08:00
Benson Wong
6667e307a2
Update README.md
2025-02-08 10:28:35 -08:00
Benson Wong
7ac446e6a9
Update README.md
2025-02-08 10:26:11 -08:00
Benson Wong
eab9795bcc
remove panic() when cmd or process is nil
2025-02-07 14:00:32 -08:00
Benson Wong
09bdd86b54
Improve shutdown behaviour (#47) (#49)
Introduce `Process.Shutdown()` and `ProxyManager.Shutdown()`. These two functions required a lot of internal process state management refactoring. A key benefit is that `Process.start()` is now interruptible: when `Shutdown()` is called, it breaks out of the long health check loop.
State management within Process is also improved. Added `starting`, `stopping` and `shutdown` states. Additionally, introduced a simple finite state machine to manage transitions.
2025-02-05 17:19:59 -08:00
Benson Wong
85cd74a51c
Improve process start and stop reliability (#38)
Refactor Process.start()/Stop() logic (#38)
- remove cmd.Wait() call in start(). This seems to conflict with the one
in .Stop(). Removing it eliminated the "no child" errors
- eliminate goroutines in .start() as they are no longer required
2025-02-03 11:50:38 -08:00
Benson Wong
314d2f2212
remove cmd_stop configuration and functionality from PR #40 (#44)
* remove cmd_stop functionality from #40
2025-01-31 12:42:44 -08:00
Benson Wong
fad25f3e11
Use client request context in proxy request (#43)
Canceled or closed HTTP requests from clients will also stop the proxied
HTTP requests to upstream servers.
2025-01-31 10:21:49 -08:00
Benson Wong
2c3e3e27f7
Support OPTIONS requests (#42)
Add middleware that responds with permissive OPTIONS headers
for all request paths.
2025-01-31 10:09:07 -08:00
Benson Wong
baeb0c4e7f
Add cmd_stop configuration to better support docker (#35)
Add `cmd_stop` to model configuration to run a command instead of sending a SIGTERM to shut down a process before swapping.
2025-01-30 16:59:57 -08:00
Benson Wong
2833517eef
Improve handling of processes that do not handle SIGTERM (#38)
- Process TTL goroutine did not have a return after .Stop()
- Improve logging
- Add test TestProcess_LowTTLValue to measure SIGTERM error rate
2025-01-20 14:39:52 -08:00
Benson Wong
abdc2bfdb3
Fix panic when requesting non-members of profiles
A panic occurs when a request for an invalid profile:model pair is made.
The edge case is that the profile exists and the model exists but they're
not configured as a pair.
This adds an additional check to make sure the profile:model pair is
valid before attempting to swap the model.
2025-01-16 12:06:38 -08:00
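A sketch of the extra check, assuming a config with a model map and a profile map listing member model IDs. `Config` and `resolveModel` are hypothetical, and the sketch assumes model IDs themselves contain no `:`.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical, trimmed-down configuration.
type Config struct {
	Models   map[string]string   // model ID -> command
	Profiles map[string][]string // profile name -> member model IDs
}

// resolveModel validates "model" or "profile:model" before any swap.
// The profile and the model existing separately is not enough: the
// model must also be listed as a member of that profile.
func (c Config) resolveModel(requested string) (string, string, error) {
	profile, model, isPair := strings.Cut(requested, ":")
	if !isPair {
		model, profile = profile, ""
	}
	if _, ok := c.Models[model]; !ok {
		return "", "", fmt.Errorf("unknown model %q", model)
	}
	if !isPair {
		return "", model, nil
	}
	members, ok := c.Profiles[profile]
	if !ok {
		return "", "", fmt.Errorf("unknown profile %q", profile)
	}
	for _, m := range members {
		if m == model {
			return profile, model, nil
		}
	}
	return "", "", fmt.Errorf("model %q is not a member of profile %q", model, profile)
}
```

Returning an error here lets the handler answer with a 4xx response instead of reaching code that assumes the pair exists.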
Benson Wong
c3b834737f
Update README.md
2025-01-13 22:37:30 -08:00
Benson Wong
3c8e727b73
Update README.md
2025-01-12 19:48:35 -08:00
Benson Wong
3a1e9f81f1
support TTS /v1/audio/speech (#36)
2025-01-12 16:27:01 -08:00
Benson Wong
72c883f36c
Update README.md
2025-01-02 09:01:51 -08:00
Benson Wong
1b04d034cf
Update README.md
2025-01-02 08:59:11 -08:00
Benson Wong
2e45f5692a
Update README.md
Improve README documentation.
2025-01-01 12:51:24 -08:00
Benson Wong
c97b80bdfe
Update README.md
2025-01-01 12:25:45 -08:00
Benson Wong
ae3ef9bc39
Refactor UI (#33)
- serve an HTML page at / instead of a 404
- add client side regex to /logs
2024-12-23 19:48:59 -08:00
Benson Wong
db6715bec3
update golang.org/x/net -> v0.33.0 for dependabot
2024-12-20 11:28:32 -08:00
Benson Wong
da5d9e8a6a
fix HTTP logging so true path is printed
2024-12-20 11:25:01 -08:00
Benson Wong
84b667ca7a
improve logging and error reporting for troubleshooting
2024-12-20 10:46:56 -08:00
Benson Wong
29657106fc
add more supported OpenAI API endpoints to README
2024-12-20 10:08:20 -08:00
Benson Wong
9c8860471e
support v1/rerank endpoint
2024-12-17 21:22:25 -08:00
Benson Wong
9b4e3f307e
rename proxy handler
2024-12-17 17:25:10 -08:00
Benson Wong
6fe37c3abf
support /v1/embeddings (#4)
2024-12-17 17:25:10 -08:00
Benson Wong
7f45493a37
Update README.md
2024-12-17 14:45:41 -08:00
Benson Wong
891f6a5b5a
Add /upstream endpoint (#30)
* remove catch-all route to upstream proxy (it was broken anyways)
* add /upstream/:model_id to swap and route to upstream path
* add /upstream HTML endpoint and unlisted option
* add /upstream endpoint to show a list of available models
* add `unlisted` configuration option to omit a model from /v1/models and /upstream lists
* add favicon.ico
2024-12-17 14:37:44 -08:00
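A sketch of the routing and listing logic these bullets describe, assuming model IDs contain no `/`. `splitUpstreamPath`, `ModelConfig` and `listedModels` are hypothetical helpers, not llama-swap's code.

```go
package main

import (
	"sort"
	"strings"
)

// splitUpstreamPath turns "/upstream/<model_id>/rest/of/path" into the
// model ID to swap in and the path to forward to that model's server.
func splitUpstreamPath(p string) (modelID, rest string, ok bool) {
	if !strings.HasPrefix(p, "/upstream/") {
		return "", "", false
	}
	trimmed := strings.TrimPrefix(p, "/upstream/")
	modelID, rest, _ = strings.Cut(trimmed, "/")
	if modelID == "" {
		return "", "", false
	}
	return modelID, "/" + rest, true
}

// ModelConfig carries only the `unlisted` option for this sketch.
type ModelConfig struct {
	Unlisted bool
}

// listedModels returns the sorted model IDs shown in /v1/models and
// /upstream, omitting any model marked unlisted.
func listedModels(models map[string]ModelConfig) []string {
	ids := []string{}
	for id, m := range models {
		if !m.Unlisted {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random; keep output stable
	return ids
}
```

An unlisted model stays reachable through `/upstream/<model_id>`; it is only hidden from the lists.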
Benson Wong
7183f6b43d
fix bad logging due to wrong []byte used #28
2024-12-16 16:22:14 -08:00
Benson Wong
d89bfeb441
add .DS_Store to .gitignore
2024-12-16 12:30:31 -08:00
Benson Wong
9a0c6bed40
Improve stop exceptions (#28) (#29)
Stop Process TTL goroutine when process is not ready (#28)
- fix issue where the goroutine will continue even though the child
process is no longer running and the Process' state is not Ready
- fix issue where some logs were going to stdout instead of p.logMonitor
causing them to not show up in the /logs
- add units to unloading model message
2024-12-16 12:29:25 -08:00
Benson Wong
d6ca535939
tweak release tagging so it is not based on number of commits
2024-12-14 15:46:10 -08:00
Benson Wong
27302c0c02
change llama-swap to use goreleaser default ldflag values
2024-12-14 10:30:06 -08:00
Benson Wong
d4e22cceaa
Fix security vulnerability with golang.org/x/crypto
- does not affect the project as llama-swap does not use the crypto
libraries
- good practice to keep security deps updated!
2024-12-14 10:20:22 -08:00
Benson Wong
4c94927658
Move release to Makefile out of goreleaser
- less complexity
- easier
- goreleaser, github, pipelines: 1... mostlygeek: 0
2024-12-14 10:16:46 -08:00