23 Commits

Author SHA1 Message Date
Benson Wong
29657106fc add more OpenAI API supported in README 2024-12-20 10:08:20 -08:00
Benson Wong
9c8860471e support v1/rerank endpoint 2024-12-17 21:22:25 -08:00
Benson Wong
7f45493a37 Update README.md 2024-12-17 14:45:41 -08:00
Benson Wong
891f6a5b5a Add /upstream endpoint (#30)
* remove catch-all route to upstream proxy (it was broken anyway)
* add /upstream/:model_id to swap and route to upstream path
* add /upstream HTML endpoint and unlisted option
* add /upstream endpoint to show a list of available models
* add `unlisted` configuration option to omit a model from /v1/models and /upstream lists
* add favicon.ico
2024-12-17 14:37:44 -08:00
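
The `unlisted` option from 891f6a5b5a can be pictured as a filter applied before a model list is rendered. This is a minimal Python sketch, not the project's implementation: the `listed_models` helper, the dictionary layout, and the model IDs are hypothetical; only the `unlisted` key comes from the commit message.

```python
from typing import Dict, List


def listed_models(models: Dict[str, dict]) -> List[str]:
    """Return the model IDs that should appear in public listings.

    Assumption: each model's config is a dict, and a truthy `unlisted`
    value hides the model from lists such as /v1/models and /upstream.
    """
    return sorted(
        model_id
        for model_id, cfg in models.items()
        if not cfg.get("unlisted", False)
    )


# Hypothetical configuration: "model-b" is hidden from listings.
example_config = {
    "model-a": {},
    "model-b": {"unlisted": True},
    "model-c": {"unlisted": False},
}
visible = listed_models(example_config)
```

Whether an unlisted model can still be requested by name is not stated in the commit message; the sketch covers only the listing side.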
Benson Wong
e2443251ad update readme 2024-12-09 19:14:49 -08:00
Benson Wong
97dae50dc4 update readme 2024-12-08 21:34:16 -08:00
Benson Wong
cb978f760f add web interface to /logs 2024-12-08 21:26:22 -08:00
Benson Wong
da46545630 fix profile example in README 2024-12-01 10:13:31 -08:00
Benson Wong
50426935a4 . 2024-11-28 22:06:29 -08:00
Benson Wong
2fceb78e8d Add examples 2024-11-28 22:05:41 -08:00
Benson Wong
716d37de82 Update README.md
fix grammar
2024-11-25 12:35:00 -08:00
Benson Wong
73ad85ea69 Implement Multi-Process Handling (#7)
Refactor code to support starting multiple backend llama.cpp servers. This functionality is exposed as `profiles` to create a simple configuration format.

Changes:

* refactor proxy tests to get ready for multi-process support
* update proxy/ProxyManager to support multiple processes (#7)
* Add support for Groups in configuration
* improve handling of Model alias configs
* implement multi-model swapping
* improve code clarity for swapModel
* improve docs, rename groups to profiles in config
2024-11-23 19:45:13 -08:00
Benson Wong
533162ce6a add support for automatically unloading a model (#10) (#14)
* Make starting upstream process on-demand (#10)
* Add automatic unload of model after TTL is reached
* add `ttl` configuration parameter to models in seconds, default is 0 (never unload)
2024-11-19 16:32:51 -08:00
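
The TTL behaviour described in 533162ce6a (start on demand, unload after `ttl` seconds of inactivity, `0` meaning never unload) can be sketched as an idle tracker. This is an illustrative Python sketch under assumptions, not the project's code: the `IdleUnloader` class, its method names, and the injectable clock are hypothetical.

```python
import time
from typing import Callable


class IdleUnloader:
    """Tracks last use of a model and unloads it once it has been idle for `ttl` seconds."""

    def __init__(
        self,
        ttl: float,
        unload: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl          # seconds; 0 means "never unload"
        self.unload = unload    # hypothetical callback that stops the upstream process
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.loaded = False
        self.last_used = 0.0

    def on_request(self) -> None:
        """Mark the model as loaded (on-demand start) and refresh its idle timer."""
        self.loaded = True
        self.last_used = self.clock()

    def tick(self) -> bool:
        """Unload if the TTL has elapsed; return True when an unload happened."""
        if not self.loaded or self.ttl <= 0:
            return False
        if self.clock() - self.last_used >= self.ttl:
            self.unload()
            self.loaded = False
            return True
        return False
```

A caller would invoke `on_request()` for every proxied request and `tick()` from a periodic timer. Whether the real timer counts from the last request or from the last response is not stated in the commit message.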
Benson Wong
a33ac6f8fb update README 2024-11-18 15:37:50 -08:00
Benson Wong
a8e5ee13b9 Add logging with pipes example to README 2024-11-15 09:10:43 -08:00
Benson Wong
0f133f5b74 Add /logs endpoint to monitor upstream processes
- outputs last 10KB of logs from upstream processes
- supports streaming
2024-10-30 21:02:30 -07:00
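
Keeping only the last 10KB of upstream output, as 0f133f5b74 describes, amounts to a bounded tail buffer. This is a minimal Python sketch of that idea; the `TailBuffer` name and its methods are hypothetical, and reading "10KB" as 10 * 1024 bytes is an assumption.

```python
class TailBuffer:
    """Keeps only the most recent `limit` bytes written to it."""

    def __init__(self, limit: int = 10 * 1024) -> None:
        self.limit = limit          # assumed interpretation of "10KB"
        self._data = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append a chunk, then drop the oldest bytes beyond the limit."""
        self._data.extend(chunk)
        overflow = len(self._data) - self.limit
        if overflow > 0:
            del self._data[:overflow]

    def snapshot(self) -> bytes:
        """Return a copy of the retained tail, e.g. to serve a log request."""
        return bytes(self._data)
```

The streaming mode mentioned in the commit would additionally forward each new chunk to connected clients; that part is not shown here.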
Benson Wong
1510b3fbd9 clean up README 2024-10-22 10:37:45 -07:00
Benson Wong
0f8a8e70f1 add header image 2024-10-22 10:30:30 -07:00
Benson Wong
8eb5b7b6c4 Add custom check endpoint
Replace the previously hardcoded `/health` value used to check when the
server became ready to serve traffic. With this, the proxy can support
any upstream server that provides an OpenAI-compatible inference endpoint.
2024-10-11 21:59:21 -07:00
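
The readiness check from 8eb5b7b6c4 can be sketched as polling a configurable path until it responds successfully. This Python sketch is illustrative only: `wait_until_ready`, `http_ok`, the retry counts, and treating HTTP 200 as "ready" are all assumptions rather than the project's actual behaviour.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all mean "not ready yet".
        return False


def wait_until_ready(
    base_url: str,
    check_endpoint: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Poll `base_url + check_endpoint` until the probe succeeds or attempts run out."""
    url = base_url.rstrip("/") + "/" + check_endpoint.lstrip("/")
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Passing `/health` as `check_endpoint` corresponds to the previously hardcoded value; a different upstream server could supply whichever path signals that it is ready.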
Benson Wong
4fae7cf946 update docs 2024-10-04 21:11:08 -07:00
Benson Wong
cc944251df update README 2024-10-04 20:43:48 -07:00
Benson Wong
d682589fb1 support environment variables 2024-10-04 11:55:27 -07:00
Benson Wong
43119e807f add README 2024-10-04 11:37:51 -07:00