Commit Graph

131 Commits

Author SHA1 Message Date
Benson Wong
7e3353efeb add action step to remove untagged containers 2025-02-18 10:08:41 -08:00
Benson Wong
4ed58fb173 update container build action 2025-02-18 09:59:06 -08:00
Benson Wong
f5a2be698d revert package src until new ggml-org has them 2025-02-15 18:23:58 -08:00
Benson Wong
f5e6ec3b7a fix package src in containerfile 2025-02-15 18:20:35 -08:00
Benson Wong
3f462da146 switch package source from ggerganov to ggml-org 2025-02-15 18:18:49 -08:00
Benson Wong
48bd766536 Update README.md 2025-02-14 22:05:52 -08:00
Benson Wong
8d319da4dd improve README organization (i think...) 2025-02-14 15:59:12 -08:00
Benson Wong
be7c502448 improve docs 2025-02-14 15:47:31 -08:00
Benson Wong
92336f00bf more container build fixes 2025-02-14 15:34:38 -08:00
Benson Wong
ed2a50d9a6 fix bug in build-container.sh 2025-02-14 15:27:56 -08:00
Benson Wong
0acfdb9f78 update workflow to build cpu and disable musa 2025-02-14 15:26:59 -08:00
Benson Wong
96a8ea0241 add cpu docker container build 2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a add docs and container build improvements #43 2025-02-14 12:20:07 -08:00
Benson Wong
7a97c38828 enable parallel container built #46 2025-02-14 11:04:33 -08:00
Benson Wong
4885132565 more permissions futzing 2025-02-14 11:02:15 -08:00
Benson Wong
8b46a0b7f1 grant package:write to container workflow #46 2025-02-14 10:55:30 -08:00
Benson Wong
1b6736ec6f rename workflow for containers 2025-02-14 10:50:15 -08:00
Benson Wong
ddc1ce031e fix container file name #46 2025-02-14 10:49:44 -08:00
Benson Wong
11d024bbaa just build cuda while debugging 2025-02-14 10:48:06 -08:00
Benson Wong
43e23c16dc add check for GITHUB_TOKEN #46 2025-02-14 10:47:25 -08:00
Benson Wong
f9c8e763ba add execute bit on build-container.sh 2025-02-14 10:44:53 -08:00
Benson Wong
d7e1bb9f7c add GITHUB_TOKEN to container build env 2025-02-14 10:43:44 -08:00
Benson Wong
ab93460a8b first container code (#52) 2025-02-14 10:39:25 -08:00
Benson Wong
13d4552edc Add FreeBSD/amd64 to auto built releases (#51) 2025-02-13 16:44:31 -08:00
Benson Wong
6667e307a2 Update README.md 2025-02-08 10:28:35 -08:00
Benson Wong
7ac446e6a9 Update README.md 2025-02-08 10:26:11 -08:00
Benson Wong
eab9795bcc remove panic() when cmd or process is nil 2025-02-07 14:00:32 -08:00
Benson Wong
09bdd86b54 Improve shutdown behaviour (#47) (#49)
Introduce `Process.Shutdown()` and `ProxyManager.Shutdown()`. These two function required a lot of internal process state management refactoring. A key benefit is that `Process.start()` is now interruptable. When `Shutdown()` is called it will break the long health check loop. 

State management within Process is also improved. Added `starting`, `stopping` and `shutdown` states. Additionally, introduced a simple finite state machine to manage transitions.
2025-02-05 17:19:59 -08:00
Benson Wong
85cd74a51c Improve process start and stop reliability (#38)
Refactor Process.start()/Stop() logic (#38)
- remove cmd.Wait() call in start(). This seems to conflict with the one
  in .Stop(). Removing it eliminated no child errors
- eliminate goroutines in .start() as it no longer required
2025-02-03 11:50:38 -08:00
Benson Wong
314d2f2212 remove cmd_stop configuration and functionality from PR #40 (#44)
* remove cmd_stop functionality from #40
2025-01-31 12:42:44 -08:00
Benson Wong
fad25f3e11 Use client request context in proxy request (#43)
Canceled or closed HTTP requests from clients will also stop the proxied
HTTP requests to upstreamed servers.
2025-01-31 10:21:49 -08:00
Benson Wong
2c3e3e27f7 Support OPTIONS requests (#42)
Add middleware that responds with permissive OPTIONS headers
for all request paths.
2025-01-31 10:09:07 -08:00
Benson Wong
baeb0c4e7f Add cmd_stop configuration to better support docker (#35)
Add `cmd_stop` to model configuration to run a command instead of sending a SIGTERM to shutdown a process before swapping.
2025-01-30 16:59:57 -08:00
Benson Wong
2833517eef Improve handling of process that do not handle SIGTERM (#38)
- Process TTL goroutine did not have a return after .Stop()
- Improve logging
- Add test TestProcess_LowTTLValue to measure SIGTERM error rate
2025-01-20 14:39:52 -08:00
Benson Wong
abdc2bfdb3 Fix panic when requesting non-members of profiles
A panic occurs when a request for an invalid profile:model pair is made.
The edge case is that the profile exists and the model exists but they're
not configured as a pair.

This adds an additional check to make sure the profile:model pair is
valid before attempting to swap the model.
2025-01-16 12:06:38 -08:00
Benson Wong
c3b834737f Update README.md 2025-01-13 22:37:30 -08:00
Benson Wong
3c8e727b73 Update README.md 2025-01-12 19:48:35 -08:00
Benson Wong
3a1e9f81f1 support TTS /v1/audio/speech (#36) 2025-01-12 16:27:01 -08:00
Benson Wong
72c883f36c Update README.md 2025-01-02 09:01:51 -08:00
Benson Wong
1b04d034cf Update README.md 2025-01-02 08:59:11 -08:00
Benson Wong
2e45f5692a Update README.md
Improve README documentation.
2025-01-01 12:51:24 -08:00
Benson Wong
c97b80bdfe Update README.md 2025-01-01 12:25:45 -08:00
Benson Wong
ae3ef9bc39 Refactor UI (#33)
- add html to / instead of 404
- add client side regex to /logs
2024-12-23 19:48:59 -08:00
Benson Wong
db6715bec3 update golang.org/x/net -> v0.33.0 for dependabot 2024-12-20 11:28:32 -08:00
Benson Wong
da5d9e8a6a fix HTTP logging so true path is printed 2024-12-20 11:25:01 -08:00
Benson Wong
84b667ca7a improve logging and error reporting for troubleshooting 2024-12-20 10:46:56 -08:00
Benson Wong
29657106fc add more OpenAI API supported in README 2024-12-20 10:08:20 -08:00
Benson Wong
9c8860471e support v1/rerank endpoint 2024-12-17 21:22:25 -08:00
Benson Wong
9b4e3f307e rename proxy handler 2024-12-17 17:25:10 -08:00
Benson Wong
6fe37c3abf support /v1/embeddings (#4) 2024-12-17 17:25:10 -08:00