Benson Wong
3201a68a04
Add /v1/audio/transcriptions support ( #41 )
...
* add support for /v1/audio/transcriptions
2025-03-13 13:49:39 -07:00
Florin-Gabriel Dumitru
3ac94ad20e
Adds an endpoint '/running' ( #61 )
...
* Adds an endpoint '/running' that returns either an empty JSON object if no model has been loaded so far, or the last model loaded (model key) and its current state (state key). Possible state values are: stopped, starting, ready and stopping.
* Improves the `/running` endpoint by allowing multiple entries under the `running` key within the JSON response.
Refactors the `/running` method name (listRunningProcessesHandler).
Removes the unlisted filter implementation.
* Adds tests for:
- no model loaded
- one model loaded
- multiple models loaded
* Adds simple comments.
* Simplified code structure as per 250313 comments on PR #65 .
---------
Co-authored-by: FGDumitru|B <xelotx@gmail.com>
2025-03-13 13:42:59 -07:00
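Note on the `/running` commit above: a minimal Go sketch of a handler with the response shape the message describes (a `running` key holding entries with `model` and `state` keys). The names `runningEntry`, `runningHandler` and `runningBody` are hypothetical, and returning an empty list when nothing is loaded is an assumption; the actual `listRunningProcessesHandler` may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// runningEntry is a hypothetical shape for one item under the "running" key.
type runningEntry struct {
	Model string `json:"model"`
	State string `json:"state"` // stopped, starting, ready or stopping
}

// runningHandler builds a handler that reports whatever list() returns.
// list is a stand-in for however the proxy tracks its processes.
func runningHandler(list func() []runningEntry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		entries := list()
		if entries == nil {
			entries = []runningEntry{} // assumption: encode as [] rather than null
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string][]runningEntry{"running": entries})
	}
}

// runningBody invokes the handler in-process and returns the response body.
func runningBody(entries []runningEntry) string {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/running", nil)
	runningHandler(func() []runningEntry { return entries })(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(runningBody(nil))
	fmt.Print(runningBody([]runningEntry{
		{Model: "model1", State: "ready"},
		{Model: "model2", State: "starting"},
	}))
}
```

Passing the process list in as a function keeps the sketch testable for the three cases the commit mentions: no model, one model, and multiple models loaded.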
Benson Wong
60355bf74a
fix some potentially confusing Process.start() comment
2025-03-11 11:00:45 -07:00
Benson Wong
9b2ed244e2
Improve Continuous integration and fix concurrency bugs ( #66 )
...
- improvements to the continuous GH actions
- fix edge case concurrency bugs with Process.start() and state transitions discovered setting up CI.
2025-03-11 10:39:14 -07:00
Benson Wong
eeb72297f7
add first version of CI for go
2025-03-11 08:45:56 -07:00
Benson Wong
eabfe70cc6
add GH action to close inactive issues
2025-03-09 19:51:48 -07:00
Benson Wong
29cd98878d
better container build logic when upstream containers do not exist
2025-03-09 13:02:06 -07:00
Benson Wong
b3d331da0d
Properly strip profile name slug from models fixes ( #62 )
...
The profile slug in a model name, `profile:model`, is specific to
llama-swap. This strips `profile:` out of the model name request so
upstreams that expect just `model` work and do not require knowing about
the profile slug.
2025-03-09 12:41:52 -07:00
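Note on the profile slug commit above: a minimal Go sketch of stripping `profile:` from a `profile:model` name before forwarding it upstream. The function name `splitProfileModel` is hypothetical, and splitting on the first colon is an assumption; the project's actual parsing may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitProfileModel splits "profile:model" at the first colon.
// A name without a colon is treated as a bare model with no profile.
func splitProfileModel(name string) (profile, model string) {
	if before, after, found := strings.Cut(name, ":"); found {
		return before, after
	}
	return "", name
}

func main() {
	for _, name := range []string{"coding:qwen-coder", "llama"} {
		profile, model := splitProfileModel(name)
		fmt.Printf("requested=%q profile=%q upstream model=%q\n", name, profile, model)
	}
}
```

Only the `model` part would be written back into the request sent upstream, so the upstream never needs to know about profiles.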
Benson Wong
62275e078d
add examples to restart on config change #59
2025-03-06 10:50:29 -08:00
Benson Wong
88916059e1
add /unload to docs
2025-03-03 10:44:16 -08:00
Benson Wong
082d5d0fc5
Add /unload endpoint ( #58 ) to unload all currently running models
2025-03-03 10:33:36 -08:00
Benson Wong
53338938bd
increase health check to a minimum of 5 seconds
2025-03-03 10:04:08 -08:00
Benson Wong
af653347ae
Update README.md w/ starhistory graph
2025-02-27 16:43:34 -08:00
Benson Wong
1e25b44a06
add workflow_dispatch to release action
2025-02-18 17:27:43 -08:00
Benson Wong
0815bb4cc3
Add windows to goreleaser #54
2025-02-18 17:26:43 -08:00
daschiller
7187cfe52e
add Windows build support to Makefile ( #54 )
2025-02-18 17:24:31 -08:00
Benson Wong
24089d2d9c
remove "no musa container" note from README
2025-02-18 16:38:48 -08:00
Benson Wong
ebabe55ff3
Delete untagged packages after build and push ( #55 )
2025-02-18 10:32:32 -08:00
Benson Wong
41a338297c
deletion of untagged containers happens after build-and-push
2025-02-18 10:11:59 -08:00
Benson Wong
7e3353efeb
add action step to remove untagged containers
2025-02-18 10:08:41 -08:00
Benson Wong
4ed58fb173
update container build action
2025-02-18 09:59:06 -08:00
Benson Wong
f5a2be698d
revert package src until new ggml-org has them
2025-02-15 18:23:58 -08:00
Benson Wong
f5e6ec3b7a
fix package src in containerfile
2025-02-15 18:20:35 -08:00
Benson Wong
3f462da146
switch package source from ggerganov to ggml-org
2025-02-15 18:18:49 -08:00
Benson Wong
48bd766536
Update README.md
2025-02-14 22:05:52 -08:00
Benson Wong
8d319da4dd
improve README organization (i think...)
2025-02-14 15:59:12 -08:00
Benson Wong
be7c502448
improve docs
2025-02-14 15:47:31 -08:00
Benson Wong
92336f00bf
more container build fixes
2025-02-14 15:34:38 -08:00
Benson Wong
ed2a50d9a6
fix bug in build-container.sh
2025-02-14 15:27:56 -08:00
Benson Wong
0acfdb9f78
update workflow to build cpu and disable musa
2025-02-14 15:26:59 -08:00
Benson Wong
96a8ea0241
add cpu docker container build
2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a
add docs and container build improvements #43
2025-02-14 12:20:07 -08:00
Benson Wong
7a97c38828
enable parallel container builds #46
2025-02-14 11:04:33 -08:00
Benson Wong
4885132565
more permissions futzing
2025-02-14 11:02:15 -08:00
Benson Wong
8b46a0b7f1
grant package:write to container workflow #46
2025-02-14 10:55:30 -08:00
Benson Wong
1b6736ec6f
rename workflow for containers
2025-02-14 10:50:15 -08:00
Benson Wong
ddc1ce031e
fix container file name #46
2025-02-14 10:49:44 -08:00
Benson Wong
11d024bbaa
just build cuda while debugging
2025-02-14 10:48:06 -08:00
Benson Wong
43e23c16dc
add check for GITHUB_TOKEN #46
2025-02-14 10:47:25 -08:00
Benson Wong
f9c8e763ba
add execute bit on build-container.sh
2025-02-14 10:44:53 -08:00
Benson Wong
d7e1bb9f7c
add GITHUB_TOKEN to container build env
2025-02-14 10:43:44 -08:00
Benson Wong
ab93460a8b
first container code ( #52 )
2025-02-14 10:39:25 -08:00
Benson Wong
13d4552edc
Add FreeBSD/amd64 to auto built releases ( #51 )
2025-02-13 16:44:31 -08:00
Benson Wong
6667e307a2
Update README.md
2025-02-08 10:28:35 -08:00
Benson Wong
7ac446e6a9
Update README.md
2025-02-08 10:26:11 -08:00
Benson Wong
eab9795bcc
remove panic() when cmd or process is nil
2025-02-07 14:00:32 -08:00
Benson Wong
09bdd86b54
Improve shutdown behaviour ( #47 ) ( #49 )
...
Introduce `Process.Shutdown()` and `ProxyManager.Shutdown()`. These two functions required a lot of internal process state management refactoring. A key benefit is that `Process.start()` is now interruptible. When `Shutdown()` is called it will break the long health check loop.
State management within Process is also improved. Added `starting`, `stopping` and `shutdown` states. Additionally, introduced a simple finite state machine to manage transitions.
2025-02-05 17:19:59 -08:00
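Note on the shutdown commit above: a minimal Go sketch of a small finite state machine over the states named in this log (stopped, starting, ready, stopping, shutdown). The transition table, the type `stateMachine` and the method `swap` are assumptions for illustration only; the commit does not list which transitions the real implementation allows.

```go
package main

import (
	"fmt"
	"sync"
)

type ProcessState string

const (
	StateStopped  ProcessState = "stopped"
	StateStarting ProcessState = "starting"
	StateReady    ProcessState = "ready"
	StateStopping ProcessState = "stopping"
	StateShutdown ProcessState = "shutdown"
)

// validTransitions is an assumed table, not taken from the project.
var validTransitions = map[ProcessState][]ProcessState{
	StateStopped:  {StateStarting, StateShutdown},
	StateStarting: {StateReady, StateStopping, StateShutdown},
	StateReady:    {StateStopping},
	StateStopping: {StateStopped, StateShutdown},
	StateShutdown: {}, // terminal
}

// stateMachine guards the current state with a mutex so concurrent
// callers cannot both win the same transition.
type stateMachine struct {
	mu    sync.Mutex
	state ProcessState
}

// swap moves from expected to next only if the machine is currently in
// expected and the table allows the transition.
func (m *stateMachine) swap(expected, next ProcessState) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.state != expected {
		return fmt.Errorf("state is %q, expected %q", m.state, expected)
	}
	for _, allowed := range validTransitions[expected] {
		if allowed == next {
			m.state = next
			return nil
		}
	}
	return fmt.Errorf("invalid transition %q -> %q", expected, next)
}

func main() {
	m := &stateMachine{state: StateStopped}
	fmt.Println(m.swap(StateStopped, StateStarting)) // allowed by the table
	fmt.Println(m.swap(StateStarting, StateReady))   // allowed by the table
	fmt.Println(m.swap(StateReady, StateStopped))    // rejected: must pass through stopping
}
```

The compare-and-swap style (caller states the expected current state) is one way to keep concurrent start and stop calls from racing; it is a design choice of this sketch.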
Benson Wong
85cd74a51c
Improve process start and stop reliability ( #38 )
...
Refactor Process.start()/Stop() logic (#38 )
- remove cmd.Wait() call in start(). This seems to conflict with the one
in .Stop(). Removing it eliminated no child errors
- eliminate goroutines in .start() as they are no longer required
2025-02-03 11:50:38 -08:00
Benson Wong
314d2f2212
remove cmd_stop configuration and functionality from PR #40 ( #44 )
...
* remove cmd_stop functionality from #40
2025-01-31 12:42:44 -08:00
Benson Wong
fad25f3e11
Use client request context in proxy request ( #43 )
...
Canceled or closed HTTP requests from clients will also stop the proxied
HTTP requests to upstream servers.
2025-01-31 10:21:49 -08:00
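Note on the request context commit above: a minimal Go sketch of passing the client's request context to the upstream request, so that cancelling one cancels the other. The names `proxyTo` and `demoCancel` are hypothetical, and the sketch omits response header copying and streaming details; it is not the project's actual proxy code.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// proxyTo forwards each request to upstreamURL, reusing r.Context() so a
// cancelled or closed client request also aborts the upstream request.
func proxyTo(upstreamURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		req, err := http.NewRequestWithContext(r.Context(), r.Method, upstreamURL+r.URL.Path, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		req.Header = r.Header.Clone()
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
	}
}

// demoCancel reports whether a slow upstream observed the cancellation of
// a client request that gives up after 100ms.
func demoCancel() bool {
	cancelled := make(chan bool, 1)
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done():
			cancelled <- true
		case <-time.After(2 * time.Second):
			cancelled <- false
		}
	}))
	defer upstream.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	req := httptest.NewRequest(http.MethodGet, "/v1/models", nil).WithContext(ctx)
	proxyTo(upstream.URL)(httptest.NewRecorder(), req)
	return <-cancelled
}

func main() {
	fmt.Println("upstream saw cancellation:", demoCancel())
}
```

The key line is `http.NewRequestWithContext(r.Context(), ...)`: without it the upstream request would keep running after the client has gone away.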