Benson Wong
62275e078d
add examples to restart on config change #59
2025-03-06 10:50:29 -08:00
Benson Wong
88916059e1
add /unload to docs
2025-03-03 10:44:16 -08:00
Benson Wong
af653347ae
Update README.md w/ starhistory graph
2025-02-27 16:43:34 -08:00
daschiller
7187cfe52e
add Windows build support to Makefile (#54)
2025-02-18 17:24:31 -08:00
Benson Wong
24089d2d9c
remove "no musa container" note from README
2025-02-18 16:38:48 -08:00
Benson Wong
48bd766536
Update README.md
2025-02-14 22:05:52 -08:00
Benson Wong
8d319da4dd
improve README organization (i think...)
2025-02-14 15:59:12 -08:00
Benson Wong
be7c502448
improve docs
2025-02-14 15:47:31 -08:00
Benson Wong
96a8ea0241
add cpu docker container build
2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a
add docs and container build improvements #43
2025-02-14 12:20:07 -08:00
Benson Wong
6667e307a2
Update README.md
2025-02-08 10:28:35 -08:00
Benson Wong
7ac446e6a9
Update README.md
2025-02-08 10:26:11 -08:00
Benson Wong
314d2f2212
remove cmd_stop configuration and functionality from PR #40 (#44)
* remove cmd_stop functionality from #40
2025-01-31 12:42:44 -08:00
Benson Wong
baeb0c4e7f
Add cmd_stop configuration to better support docker (#35)
Add `cmd_stop` to the model configuration to run a command, instead of sending a SIGTERM, to shut down a process before swapping.
2025-01-30 16:59:57 -08:00
Benson Wong
c3b834737f
Update README.md
2025-01-13 22:37:30 -08:00
Benson Wong
3c8e727b73
Update README.md
2025-01-12 19:48:35 -08:00
Benson Wong
3a1e9f81f1
support TTS /v1/audio/speech (#36)
2025-01-12 16:27:01 -08:00
Benson Wong
72c883f36c
Update README.md
2025-01-02 09:01:51 -08:00
Benson Wong
1b04d034cf
Update README.md
2025-01-02 08:59:11 -08:00
Benson Wong
2e45f5692a
Update README.md
Improve README documentation.
2025-01-01 12:51:24 -08:00
Benson Wong
c97b80bdfe
Update README.md
2025-01-01 12:25:45 -08:00
Benson Wong
84b667ca7a
improve logging and error reporting for troubleshooting
2024-12-20 10:46:56 -08:00
Benson Wong
29657106fc
add more supported OpenAI API endpoints to README
2024-12-20 10:08:20 -08:00
Benson Wong
9c8860471e
support v1/rerank endpoint
2024-12-17 21:22:25 -08:00
Benson Wong
7f45493a37
Update README.md
2024-12-17 14:45:41 -08:00
Benson Wong
891f6a5b5a
Add /upstream endpoint (#30)
* remove catch-all route to upstream proxy (it was broken anyways)
* add /upstream/:model_id to swap and route to upstream path
* add /upstream HTML endpoint and unlisted option
* add /upstream endpoint to show a list of available models
* add `unlisted` configuration option to omit a model from /v1/models and /upstream lists
* add favicon.ico
2024-12-17 14:37:44 -08:00
Benson Wong
e2443251ad
update readme
2024-12-09 19:14:49 -08:00
Benson Wong
97dae50dc4
update readme
2024-12-08 21:34:16 -08:00
Benson Wong
cb978f760f
add web interface to /logs
2024-12-08 21:26:22 -08:00
Benson Wong
da46545630
fix profile example in README
2024-12-01 10:13:31 -08:00
Benson Wong
50426935a4
.
2024-11-28 22:06:29 -08:00
Benson Wong
2fceb78e8d
Add examples
2024-11-28 22:05:41 -08:00
Benson Wong
716d37de82
Update README.md
fix grammar
2024-11-25 12:35:00 -08:00
Benson Wong
73ad85ea69
Implement Multi-Process Handling (#7)
Refactor code to support starting multiple backend llama.cpp servers. This functionality is exposed as `profiles` to provide a simple configuration format.
Changes:
* refactor proxy tests to get ready for multi-process support
* update proxy/ProxyManager to support multiple processes (#7)
* Add support for Groups in configuration
* improve handling of Model alias configs
* implement multi-model swapping
* improve code clarity for swapModel
* improve docs, rename groups to profiles in config
2024-11-23 19:45:13 -08:00
Benson Wong
533162ce6a
add support for automatically unloading a model (#10) (#14)
* Make starting upstream process on-demand (#10)
* Add automatic unload of model after TTL is reached
* add `ttl` configuration parameter to models in seconds, default is 0 (never unload)
2024-11-19 16:32:51 -08:00
Benson Wong
a33ac6f8fb
update README
2024-11-18 15:37:50 -08:00
Benson Wong
a8e5ee13b9
Add logging with pipes example to README
2024-11-15 09:10:43 -08:00
Benson Wong
0f133f5b74
Add /logs endpoint to monitor upstream processes
- outputs last 10KB of logs from upstream processes
- supports streaming
2024-10-30 21:02:30 -07:00
Benson Wong
1510b3fbd9
clean up README
2024-10-22 10:37:45 -07:00
Benson Wong
0f8a8e70f1
add header image
2024-10-22 10:30:30 -07:00
Benson Wong
8eb5b7b6c4
Add custom check endpoint
Replace the previously hardcoded `/health` value used to check when the
server is ready to serve traffic. With this change, any server that
provides an OpenAI-compatible inference endpoint can be supported.
2024-10-11 21:59:21 -07:00
Benson Wong
4fae7cf946
update docs
2024-10-04 21:11:08 -07:00
Benson Wong
cc944251df
update README
2024-10-04 20:43:48 -07:00
Benson Wong
d682589fb1
support environment variables
2024-10-04 11:55:27 -07:00
Benson Wong
43119e807f
add README
2024-10-04 11:37:51 -07:00