Commit Graph

289 Commits

Author SHA1 Message Date
Benson Wong
53338938bd increase health check to a minimum of 5 seconds 2025-03-03 10:04:08 -08:00
Benson Wong
af653347ae Update README.md w/ starhistory graph 2025-02-27 16:43:34 -08:00
Benson Wong
1e25b44a06 add workflow_dispatch to release action 2025-02-18 17:27:43 -08:00
Benson Wong
0815bb4cc3 Add windows to goreleaser #54 2025-02-18 17:26:43 -08:00
daschiller
7187cfe52e add Windows build support to Makefile (#54) 2025-02-18 17:24:31 -08:00
Benson Wong
24089d2d9c remove "no musa container" note from README 2025-02-18 16:38:48 -08:00
Benson Wong
ebabe55ff3 Delete untagged packages after build and push (#55) 2025-02-18 10:32:32 -08:00
Benson Wong
41a338297c deletion of untagged containers happens after build-and-push 2025-02-18 10:11:59 -08:00
Benson Wong
7e3353efeb add action step to remove untagged containers 2025-02-18 10:08:41 -08:00
Benson Wong
4ed58fb173 update container build action 2025-02-18 09:59:06 -08:00
Benson Wong
f5a2be698d revert package src until new ggml-org has them 2025-02-15 18:23:58 -08:00
Benson Wong
f5e6ec3b7a fix package src in containerfile 2025-02-15 18:20:35 -08:00
Benson Wong
3f462da146 switch package source from ggerganov to ggml-org 2025-02-15 18:18:49 -08:00
Benson Wong
48bd766536 Update README.md 2025-02-14 22:05:52 -08:00
Benson Wong
8d319da4dd improve README organization (i think...) 2025-02-14 15:59:12 -08:00
Benson Wong
be7c502448 improve docs 2025-02-14 15:47:31 -08:00
Benson Wong
92336f00bf more container build fixes 2025-02-14 15:34:38 -08:00
Benson Wong
ed2a50d9a6 fix bug in build-container.sh 2025-02-14 15:27:56 -08:00
Benson Wong
0acfdb9f78 update workflow to build cpu and disable musa 2025-02-14 15:26:59 -08:00
Benson Wong
96a8ea0241 add cpu docker container build 2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a add docs and container build improvements #43 2025-02-14 12:20:07 -08:00
Benson Wong
7a97c38828 enable parallel container build #46 2025-02-14 11:04:33 -08:00
Benson Wong
4885132565 more permissions futzing 2025-02-14 11:02:15 -08:00
Benson Wong
8b46a0b7f1 grant package:write to container workflow #46 2025-02-14 10:55:30 -08:00
Benson Wong
1b6736ec6f rename workflow for containers 2025-02-14 10:50:15 -08:00
Benson Wong
ddc1ce031e fix container file name #46 2025-02-14 10:49:44 -08:00
Benson Wong
11d024bbaa just build cuda while debugging 2025-02-14 10:48:06 -08:00
Benson Wong
43e23c16dc add check for GITHUB_TOKEN #46 2025-02-14 10:47:25 -08:00
Benson Wong
f9c8e763ba add execute bit on build-container.sh 2025-02-14 10:44:53 -08:00
Benson Wong
d7e1bb9f7c add GITHUB_TOKEN to container build env 2025-02-14 10:43:44 -08:00
Benson Wong
ab93460a8b first container code (#52) 2025-02-14 10:39:25 -08:00
Benson Wong
13d4552edc Add FreeBSD/amd64 to auto built releases (#51) 2025-02-13 16:44:31 -08:00
Benson Wong
6667e307a2 Update README.md 2025-02-08 10:28:35 -08:00
Benson Wong
7ac446e6a9 Update README.md 2025-02-08 10:26:11 -08:00
Benson Wong
eab9795bcc remove panic() when cmd or process is nil 2025-02-07 14:00:32 -08:00
Benson Wong
09bdd86b54 Improve shutdown behaviour (#47) (#49)
Introduce `Process.Shutdown()` and `ProxyManager.Shutdown()`. These two functions required a lot of refactoring of the internal process state management. A key benefit is that `Process.start()` is now interruptible: calling `Shutdown()` breaks out of the long health check loop.

State management within Process is also improved. Added `starting`, `stopping` and `shutdown` states. Additionally, introduced a simple finite state machine to manage transitions.
2025-02-05 17:19:59 -08:00
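The state handling described in 09bdd86b54 can be illustrated with a small sketch. This is not the code from the commit. The `starting`, `stopping` and `shutdown` states come from the message; the `stopped` and `ready` states, the transition table, and the names `swapState` and `NewProcess` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessState is a hypothetical type for the states of a managed process.
type ProcessState string

const (
	StateStopped  ProcessState = "stopped"  // assumed, not named in the commit
	StateStarting ProcessState = "starting" // named in the commit
	StateReady    ProcessState = "ready"    // assumed, not named in the commit
	StateStopping ProcessState = "stopping" // named in the commit
	StateShutdown ProcessState = "shutdown" // named in the commit
)

// validTransitions is an assumed transition table. Shutdown is terminal.
var validTransitions = map[ProcessState][]ProcessState{
	StateStopped:  {StateStarting, StateShutdown},
	StateStarting: {StateReady, StateStopping, StateShutdown},
	StateReady:    {StateStopping, StateShutdown},
	StateStopping: {StateStopped, StateShutdown},
	StateShutdown: {},
}

// Process holds only the state for this sketch.
type Process struct {
	mu    sync.Mutex
	state ProcessState
}

// NewProcess returns a process in the stopped state.
func NewProcess() *Process { return &Process{state: StateStopped} }

// State returns the current state.
func (p *Process) State() ProcessState {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.state
}

// swapState moves from expected to next. It fails if the current state is
// not expected or if the table does not allow the transition.
func (p *Process) swapState(expected, next ProcessState) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.state != expected {
		return fmt.Errorf("expected state %q, got %q", expected, p.state)
	}
	for _, allowed := range validTransitions[expected] {
		if allowed == next {
			p.state = next
			return nil
		}
	}
	return fmt.Errorf("invalid transition %q -> %q", expected, next)
}
```

Guarding every change behind one compare-and-swap style function keeps concurrent callers (for example a start in progress and a shutdown request) from leaving the process in an inconsistent state.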
Benson Wong
85cd74a51c Improve process start and stop reliability (#38)
Refactor Process.start()/Stop() logic (#38)
- remove cmd.Wait() call in start(). This seems to conflict with the one
  in .Stop(). Removing it eliminated no child errors
- eliminate goroutines in .start() as they are no longer required
2025-02-03 11:50:38 -08:00
Benson Wong
314d2f2212 remove cmd_stop configuration and functionality from PR #40 (#44)
* remove cmd_stop functionality from #40
2025-01-31 12:42:44 -08:00
Benson Wong
fad25f3e11 Use client request context in proxy request (#43)
Canceled or closed HTTP requests from clients will also stop the proxied
HTTP requests to upstream servers.
2025-01-31 10:21:49 -08:00
Benson Wong
2c3e3e27f7 Support OPTIONS requests (#42)
Add middleware that responds with permissive OPTIONS headers
for all request paths.
2025-01-31 10:09:07 -08:00
Benson Wong
baeb0c4e7f Add cmd_stop configuration to better support docker (#35)
Add `cmd_stop` to model configuration to run a command instead of sending a SIGTERM to shutdown a process before swapping.
2025-01-30 16:59:57 -08:00
Benson Wong
2833517eef Improve handling of processes that do not handle SIGTERM (#38)
- Process TTL goroutine did not have a return after .Stop()
- Improve logging
- Add test TestProcess_LowTTLValue to measure SIGTERM error rate
2025-01-20 14:39:52 -08:00
Benson Wong
abdc2bfdb3 Fix panic when requesting non-members of profiles
A panic occurs when a request for an invalid profile:model pair is made.
The edge case is that the profile exists and the model exists but they're
not configured as a pair.

This adds an additional check to make sure the profile:model pair is
valid before attempting to swap the model.
2025-01-16 12:06:38 -08:00
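The extra check from abdc2bfdb3 can be sketched like this. The `Config` shape and the name `ValidateProfileModel` are assumptions for illustration and do not reflect the project's actual configuration types. The point is the last step: a known profile and a known model are still rejected when the model is not a member of that profile.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical, simplified configuration.
type Config struct {
	Models   map[string]bool     // set of configured model names
	Profiles map[string][]string // profile name -> member model names
}

// ValidateProfileModel checks a "profile:model" string before any swap
// is attempted, returning an error instead of panicking later.
func (c Config) ValidateProfileModel(requested string) error {
	profile, model, ok := strings.Cut(requested, ":")
	if !ok {
		return fmt.Errorf("%q is not a profile:model pair", requested)
	}
	members, found := c.Profiles[profile]
	if !found {
		return fmt.Errorf("unknown profile %q", profile)
	}
	if !c.Models[model] {
		return fmt.Errorf("unknown model %q", model)
	}
	// Both exist; confirm they are actually configured together.
	for _, m := range members {
		if m == model {
			return nil
		}
	}
	return fmt.Errorf("model %q is not a member of profile %q", model, profile)
}
```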
Benson Wong
c3b834737f Update README.md 2025-01-13 22:37:30 -08:00
Benson Wong
3c8e727b73 Update README.md 2025-01-12 19:48:35 -08:00
Benson Wong
3a1e9f81f1 support TTS /v1/audio/speech (#36) 2025-01-12 16:27:01 -08:00
Benson Wong
72c883f36c Update README.md 2025-01-02 09:01:51 -08:00
Benson Wong
1b04d034cf Update README.md 2025-01-02 08:59:11 -08:00
Benson Wong
2e45f5692a Update README.md
Improve README documentation.
2025-01-01 12:51:24 -08:00
Benson Wong
c97b80bdfe Update README.md 2025-01-01 12:25:45 -08:00