308 Commits

Author SHA1 Message Date
Benson Wong
d94db42ffe fix bug checking incorrect error 2025-03-20 15:49:36 -07:00
Benson Wong
93cd83c55c add override for windows (#76) 2025-03-20 13:23:04 -07:00
Benson Wong
5565fca3ac add some badges to README 2025-03-19 11:25:06 -07:00
Benson Wong
d625ab8d92 Refactor process state management (#70) (#73)
* add isValidStateTransition helper function
* Replace Process.setState() with Process.swapState()
* Refactor locking logic in Process
2025-03-15 17:14:03 -07:00
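
The refactor above names two pieces, `isValidStateTransition` and `Process.swapState()`, without showing them. What follows is only a minimal sketch of how such a guarded state swap could look, not the project's actual code. The state names are borrowed from the `/running` commit further down this log; the transition table, the `ProcessState` type, the field names, and the function signatures are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessState is a hypothetical type; the state names come from the
// '/running' commit message (stopped, starting, ready, stopping).
type ProcessState string

const (
	StateStopped  ProcessState = "stopped"
	StateStarting ProcessState = "starting"
	StateReady    ProcessState = "ready"
	StateStopping ProcessState = "stopping"
)

// isValidStateTransition reports whether moving from -> to is allowed.
// The table below is an assumed one, chosen only to illustrate the idea.
func isValidStateTransition(from, to ProcessState) bool {
	switch from {
	case StateStopped:
		return to == StateStarting
	case StateStarting:
		return to == StateReady || to == StateStopping || to == StateStopped
	case StateReady:
		return to == StateStopping
	case StateStopping:
		return to == StateStopped
	}
	return false
}

// Process holds the state behind a mutex (assumed layout).
type Process struct {
	mu    sync.Mutex
	state ProcessState
}

// swapState changes the state to next only when the current state equals
// expected and the transition is valid. It returns the state after the call.
func (p *Process) swapState(expected, next ProcessState) (ProcessState, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	if p.state != expected {
		return p.state, fmt.Errorf("expected state %q, found %q", expected, p.state)
	}
	if !isValidStateTransition(p.state, next) {
		return p.state, fmt.Errorf("invalid transition %q -> %q", p.state, next)
	}
	p.state = next
	return p.state, nil
}
```

Compared with an unconditional `setState()`, a compare-and-swap style call makes the caller state what it believes the current state is, so two goroutines racing to start or stop the same process cannot both succeed.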
Benson Wong
a3f82c140b tidy up config examples in README 2025-03-15 10:36:45 -07:00
Benson Wong
5c97299e7b Add support for sending a custom model name to upstream (#69) (#71)
* add test for splitRequestedModel()
* Add `useModelName` parameter to model configuration
* add docs to README
2025-03-14 21:07:52 -07:00
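
One way a `useModelName` setting could take effect is by rewriting the `model` field of the JSON request body before it is forwarded upstream. The sketch below illustrates that idea only; the function name `overrideModelName` and its signature are hypothetical, and the commit does not say whether the project rewrites the body this way.

```go
package main

import "encoding/json"

// overrideModelName replaces the "model" field of a JSON request body with
// useModelName and leaves every other field untouched. An empty
// useModelName means "no override" and returns the body unchanged.
// Hypothetical helper for illustration; not taken from the project.
func overrideModelName(body []byte, useModelName string) ([]byte, error) {
	if useModelName == "" {
		return body, nil
	}

	// Decode only the top level so nested values pass through verbatim.
	var payload map[string]json.RawMessage
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, err
	}

	name, err := json.Marshal(useModelName)
	if err != nil {
		return nil, err
	}
	payload["model"] = name

	// json.Marshal writes map keys in sorted order.
	return json.Marshal(payload)
}
```

For example, a body of `{"model":"alias","stream":true}` with `useModelName` set to `real-name` would be forwarded with `"model":"real-name"` and the `stream` field intact.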
Benson Wong
671c1a5a7b update deps 2025-03-13 14:00:15 -07:00
Benson Wong
52c0196e0f clean up feature list in readme 2025-03-13 13:55:20 -07:00
Benson Wong
3201a68a04 Add /v1/audio/transcriptions support (#41)
* add support for /v1/audio/transcriptions
2025-03-13 13:49:39 -07:00
Florin-Gabriel Dumitru
3ac94ad20e Adds an endpoint '/running' (#61)
* Adds an endpoint '/running' that returns either an empty JSON object if no model has been loaded so far, or the last model loaded (model key) and its current state (state key). Possible state values are: stopped, starting, ready, and stopping.

* Improves the `/running` endpoint by allowing multiple entries under the `running` key within the JSON response.
Refactors the `/running` method name (listRunningProcessesHandler).
Removes the unlisted filter implementation.

* Adds tests for:
- no model loaded
- one model loaded
- multiple models loaded

* Adds simple comments.

* Simplified code structure as per 250313 comments on PR #65.

---------

Co-authored-by: FGDumitru|B <xelotx@gmail.com>
2025-03-13 13:42:59 -07:00
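
The final shape described above, a `running` key holding one entry per process with `model` and `state` keys, could be served roughly as follows. This is a sketch under assumptions: the type names, the `list` callback, and the choice to return an empty array when nothing is loaded are illustrative and not confirmed by the commit.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// runningEntry mirrors the "model" and "state" keys named in the commit.
type runningEntry struct {
	Model string `json:"model"`
	State string `json:"state"`
}

// runningResponse wraps the entries under the "running" key.
type runningResponse struct {
	Running []runningEntry `json:"running"`
}

// runningJSON encodes the entries. A nil slice is encoded as an empty
// array (an assumption) so clients always receive a list.
func runningJSON(entries []runningEntry) ([]byte, error) {
	if entries == nil {
		entries = []runningEntry{}
	}
	return json.Marshal(runningResponse{Running: entries})
}

// listRunningHandler returns a handler that reports whatever the supplied
// list function returns. The callback is a hypothetical stand-in for the
// code that tracks processes.
func listRunningHandler(list func() []runningEntry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := runningJSON(list())
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}
}
```

Such a handler could be registered with `http.Handle("/running", listRunningHandler(list))`, where `list` is whatever function reports the current processes.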
Benson Wong
60355bf74a fix some potentially confusing Process.start() comment 2025-03-11 11:00:45 -07:00
Benson Wong
9b2ed244e2 Improve Continuous integration and fix concurrency bugs (#66)
- improvements to the continuous GH actions
- fix edge case concurrency bugs with Process.start() and state transitions discovered setting up CI.
2025-03-11 10:39:14 -07:00
Benson Wong
eeb72297f7 add first version of CI for go 2025-03-11 08:45:56 -07:00
Benson Wong
eabfe70cc6 add GH action to close inactive issues 2025-03-09 19:51:48 -07:00
Benson Wong
29cd98878d better container build logic when upstream containers do not exist 2025-03-09 13:02:06 -07:00
Benson Wong
b3d331da0d Properly strip profile name slug from models (fixes #62)
The profile slug in a model name, `profile:model`, is specific to
llama-swap. This strips `profile:` out of the model name request so
upstreams that expect just `model` work and do not require knowing about
the profile slug.
2025-03-09 12:41:52 -07:00
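
The stripping described above can be sketched as a split on the first colon. This is an illustration only: `stripProfileSlug` is a hypothetical name, and treating everything before the first `:` as the profile is an assumption that would misread a bare model name that itself contains a colon.

```go
package main

import "strings"

// stripProfileSlug splits a requested name of the form "profile:model"
// into its two parts. When there is no colon, the whole string is treated
// as the model name and the profile is empty.
// Hypothetical helper for illustration; not taken from the project.
func stripProfileSlug(requested string) (profile, model string) {
	before, after, found := strings.Cut(requested, ":")
	if !found {
		return "", requested
	}
	return before, after
}
```

With this helper, a request for `coding:qwen` would be forwarded upstream as `qwen`, while a plain `qwen` passes through unchanged.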
Benson Wong
62275e078d add examples to restart on config change #59 2025-03-06 10:50:29 -08:00
Benson Wong
88916059e1 add /unload to docs 2025-03-03 10:44:16 -08:00
Benson Wong
082d5d0fc5 Add /unload endpoint (#58) to unload all currently running models 2025-03-03 10:33:36 -08:00
Benson Wong
53338938bd increase health check to a minimum of 5 seconds 2025-03-03 10:04:08 -08:00
Benson Wong
af653347ae Update README.md w/ starhistory graph 2025-02-27 16:43:34 -08:00
Benson Wong
1e25b44a06 add workflow_dispatch to release action 2025-02-18 17:27:43 -08:00
Benson Wong
0815bb4cc3 Add windows to goreleaser #54 2025-02-18 17:26:43 -08:00
daschiller
7187cfe52e add Windows build support to Makefile (#54) 2025-02-18 17:24:31 -08:00
Benson Wong
24089d2d9c remove "no musa container" note from README 2025-02-18 16:38:48 -08:00
Benson Wong
ebabe55ff3 Delete untagged packages after build and push (#55) 2025-02-18 10:32:32 -08:00
Benson Wong
41a338297c deletion of untagged containers happens after build-and-push 2025-02-18 10:11:59 -08:00
Benson Wong
7e3353efeb add action step to remove untagged containers 2025-02-18 10:08:41 -08:00
Benson Wong
4ed58fb173 update container build action 2025-02-18 09:59:06 -08:00
Benson Wong
f5a2be698d revert package src until new ggml-org has them 2025-02-15 18:23:58 -08:00
Benson Wong
f5e6ec3b7a fix package src in containerfile 2025-02-15 18:20:35 -08:00
Benson Wong
3f462da146 switch package source from ggerganov to ggml-org 2025-02-15 18:18:49 -08:00
Benson Wong
48bd766536 Update README.md 2025-02-14 22:05:52 -08:00
Benson Wong
8d319da4dd improve README organization (i think...) 2025-02-14 15:59:12 -08:00
Benson Wong
be7c502448 improve docs 2025-02-14 15:47:31 -08:00
Benson Wong
92336f00bf more container build fixes 2025-02-14 15:34:38 -08:00
Benson Wong
ed2a50d9a6 fix bug in build-container.sh 2025-02-14 15:27:56 -08:00
Benson Wong
0acfdb9f78 update workflow to build cpu and disable musa 2025-02-14 15:26:59 -08:00
Benson Wong
96a8ea0241 add cpu docker container build 2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a add docs and container build improvements #43 2025-02-14 12:20:07 -08:00
Benson Wong
7a97c38828 enable parallel container build #46 2025-02-14 11:04:33 -08:00
Benson Wong
4885132565 more permissions futzing 2025-02-14 11:02:15 -08:00
Benson Wong
8b46a0b7f1 grant package:write to container workflow #46 2025-02-14 10:55:30 -08:00
Benson Wong
1b6736ec6f rename workflow for containers 2025-02-14 10:50:15 -08:00
Benson Wong
ddc1ce031e fix container file name #46 2025-02-14 10:49:44 -08:00
Benson Wong
11d024bbaa just build cuda while debugging 2025-02-14 10:48:06 -08:00
Benson Wong
43e23c16dc add check for GITHUB_TOKEN #46 2025-02-14 10:47:25 -08:00
Benson Wong
f9c8e763ba add execute bit on build-container.sh 2025-02-14 10:44:53 -08:00
Benson Wong
d7e1bb9f7c add GITHUB_TOKEN to container build env 2025-02-14 10:43:44 -08:00
Benson Wong
ab93460a8b first container code (#52) 2025-02-14 10:39:25 -08:00