93 Commits

Author SHA1 Message Date
Benson Wong
af653347ae Update README.md w/ starhistory graph 2025-02-27 16:43:34 -08:00
daschiller
7187cfe52e add Windows build support to Makefile (#54) 2025-02-18 17:24:31 -08:00
Benson Wong
24089d2d9c remove "no musa container" note from README 2025-02-18 16:38:48 -08:00
Benson Wong
48bd766536 Update README.md 2025-02-14 22:05:52 -08:00
Benson Wong
8d319da4dd improve README organization (i think...) 2025-02-14 15:59:12 -08:00
Benson Wong
be7c502448 improve docs 2025-02-14 15:47:31 -08:00
Benson Wong
96a8ea0241 add cpu docker container build 2025-02-14 15:25:45 -08:00
Benson Wong
f20f2c9b7a add docs and container build improvements #43 2025-02-14 12:20:07 -08:00
Benson Wong
6667e307a2 Update README.md 2025-02-08 10:28:35 -08:00
Benson Wong
7ac446e6a9 Update README.md 2025-02-08 10:26:11 -08:00
Benson Wong
314d2f2212 remove cmd_stop configuration and functionality from PR #40 (#44)
* remove cmd_stop functionality from #40
2025-01-31 12:42:44 -08:00
Benson Wong
baeb0c4e7f Add cmd_stop configuration to better support docker (#35)
Add `cmd_stop` to model configuration to run a command instead of sending a SIGTERM to shut down a process before swapping.
2025-01-30 16:59:57 -08:00
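For illustration, a rough sketch of how the `cmd_stop` option described in this commit might sit in a model entry. Only `cmd_stop` is confirmed by the commit message; the surrounding keys and the docker command are assumptions, and the option is removed again in the later commit above (#44).

```yaml
# hypothetical model entry; only cmd_stop is confirmed by the commit message
models:
  "docker-llama":
    # start the backend inside a container (illustrative command)
    cmd: docker run --name docker-llama -p 9503:8080 my-llama-server-image
    proxy: http://127.0.0.1:9503        # assumed key
    # run this instead of sending SIGTERM before swapping to another model
    cmd_stop: docker stop docker-llama
```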
Benson Wong
c3b834737f Update README.md 2025-01-13 22:37:30 -08:00
Benson Wong
3c8e727b73 Update README.md 2025-01-12 19:48:35 -08:00
Benson Wong
3a1e9f81f1 support TTS /v1/audio/speech (#36) 2025-01-12 16:27:01 -08:00
Benson Wong
72c883f36c Update README.md 2025-01-02 09:01:51 -08:00
Benson Wong
1b04d034cf Update README.md 2025-01-02 08:59:11 -08:00
Benson Wong
2e45f5692a Update README.md
Improve README documentation.
2025-01-01 12:51:24 -08:00
Benson Wong
c97b80bdfe Update README.md 2025-01-01 12:25:45 -08:00
Benson Wong
84b667ca7a improve logging and error reporting for troubleshooting 2024-12-20 10:46:56 -08:00
Benson Wong
29657106fc add more OpenAI API supported in README 2024-12-20 10:08:20 -08:00
Benson Wong
9c8860471e support v1/rerank endpoint 2024-12-17 21:22:25 -08:00
Benson Wong
7f45493a37 Update README.md 2024-12-17 14:45:41 -08:00
Benson Wong
891f6a5b5a Add /upstream endpoint (#30)
* remove catch-all route to upstream proxy (it was broken anyways)
* add /upstream/:model_id to swap and route to upstream path
* add /upstream HTML endpoint and unlisted option
* add /upstream endpoint to show a list of available models
* add `unlisted` configuration option to omit a model from /v1/models and /upstream lists
* add favicon.ico
2024-12-17 14:37:44 -08:00
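A hedged sketch of the `unlisted` option this commit adds; keys other than `unlisted` are assumptions. An unlisted model is omitted from /v1/models and the /upstream listing, but per the commit it can still be swapped in and reached via /upstream/:model_id.

```yaml
models:
  "embedding-model":
    cmd: llama-server --port 9601 -m embeddings.gguf   # illustrative command
    proxy: http://127.0.0.1:9601                       # assumed key
    unlisted: true   # hide from /v1/models and the /upstream list
```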
Benson Wong
e2443251ad update readme 2024-12-09 19:14:49 -08:00
Benson Wong
97dae50dc4 update readme 2024-12-08 21:34:16 -08:00
Benson Wong
cb978f760f add web interface to /logs 2024-12-08 21:26:22 -08:00
Benson Wong
da46545630 fix profile example in README 2024-12-01 10:13:31 -08:00
Benson Wong
50426935a4 . 2024-11-28 22:06:29 -08:00
Benson Wong
2fceb78e8d Add examples 2024-11-28 22:05:41 -08:00
Benson Wong
716d37de82 Update README.md
fix grammar
2024-11-25 12:35:00 -08:00
Benson Wong
73ad85ea69 Implement Multi-Process Handling (#7)
Refactor code to support starting multiple backend llama.cpp servers. This functionality is exposed as `profiles` to keep the configuration format simple.

Changes: 

* refactor proxy tests to get ready for multi-process support
* update proxy/ProxyManager to support multiple processes (#7)
* Add support for Groups in configuration
* improve handling of Model alias configs
* implement multi-model swapping
* improve code clarity for swapModel
* improve docs, rename groups to profiles in config
2024-11-23 19:45:13 -08:00
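A minimal sketch of the `profiles` idea described above: a named profile groups several models so they can be run together. Only the `profiles` name is confirmed by the commit; the other keys, ports, and model names are assumptions.

```yaml
# hypothetical profile grouping two models that should run at the same time
profiles:
  coding:
    - "qwen-coder"
    - "embedding-model"

models:
  "qwen-coder":
    cmd: llama-server --port 9610 -m qwen-coder.gguf   # illustrative command
    proxy: http://127.0.0.1:9610                       # assumed key
  "embedding-model":
    cmd: llama-server --port 9611 -m embeddings.gguf   # illustrative command
    proxy: http://127.0.0.1:9611                       # assumed key
```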
Benson Wong
533162ce6a add support for automatically unloading a model (#10) (#14)
* Make starting the upstream process on-demand (#10)
* Add automatic unloading of a model after its TTL is reached
* Add `ttl` configuration parameter to models, in seconds; the default is 0 (never unload)
2024-11-19 16:32:51 -08:00
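A minimal sketch of the `ttl` parameter this commit adds: the value is in seconds, and 0 (the default) means the model is never unloaded. Keys other than `ttl` are assumptions.

```yaml
models:
  "llama":
    cmd: llama-server --port 9620 -m llama.gguf   # illustrative command
    proxy: http://127.0.0.1:9620                  # assumed key
    ttl: 300   # unload after 300 seconds of inactivity; 0 = never unload
```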
Benson Wong
a33ac6f8fb update README 2024-11-18 15:37:50 -08:00
Benson Wong
a8e5ee13b9 Add logging with pipes example to README 2024-11-15 09:10:43 -08:00
Benson Wong
0f133f5b74 Add /logs endpoint to monitor upstream processes
- outputs last 10KB of logs from upstream processes
- supports streaming
2024-10-30 21:02:30 -07:00
Benson Wong
1510b3fbd9 clean up README 2024-10-22 10:37:45 -07:00
Benson Wong
0f8a8e70f1 add header image 2024-10-22 10:30:30 -07:00
Benson Wong
8eb5b7b6c4 Add custom check endpoint
Replace the previously hardcoded `/health` path used to check when the
server becomes ready to serve traffic. With this, the proxy can support
any server that provides an OpenAI compatible inference endpoint.
2024-10-11 21:59:21 -07:00
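A sketch of the custom check endpoint described here, assuming a per-model key (named `checkEndpoint` below as an assumption) that replaces the previously hardcoded `/health` path for readiness checks.

```yaml
models:
  "custom-server":
    cmd: my-openai-compatible-server --port 9630   # illustrative command
    proxy: http://127.0.0.1:9630                   # assumed key
    # path polled to decide when the upstream is ready; previously hardcoded to /health
    checkEndpoint: /v1/models                      # key name is an assumption
```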
Benson Wong
4fae7cf946 update docs 2024-10-04 21:11:08 -07:00
Benson Wong
cc944251df update README 2024-10-04 20:43:48 -07:00
Benson Wong
d682589fb1 support environment variables 2024-10-04 11:55:27 -07:00
Benson Wong
43119e807f add README 2024-10-04 11:37:51 -07:00