* Adds a `/running` endpoint that returns either an empty JSON object if no model has been loaded yet, or the last loaded model (`model` key) and its current state (`state` key). Possible state values are: stopped, starting, ready and stopping.
* Improves the `/running` endpoint by allowing multiple entries under the `running` key in the JSON response (see the response sketch after this list).
* Refactors the `/running` handler method name (`listRunningProcessesHandler`).
* Removes the unlisted filter implementation.
* Adds tests for:
- no model loaded
- one model loaded
- multiple models loaded
* Adds simple comments.
* Simplifies code structure as per the 250313 review comments on PR #65.
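
For illustration only, a `/running` response with two loaded models might look like the sketch below; the model names are placeholders, and any details beyond the `running`, `model`, and `state` keys described above are assumptions.

```json
{
  "running": [
    {"model": "llama-3-8b", "state": "ready"},
    {"model": "qwen-coder", "state": "starting"}
  ]
}
```

When no model has been loaded yet, the response is an empty JSON object, as described above.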
---------
Co-authored-by: FGDumitru|B <xelotx@gmail.com>
* remove catch-all route to upstream proxy (it was broken anyway)
* add /upstream/:model_id to swap and route to upstream path
* add /upstream HTML endpoint and unlisted option
* add /upstream endpoint to show a list of available models
* add `unlisted` configuration option to omit a model from /v1/models and /upstream lists
* add favicon.ico
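
As a rough sketch of how the `unlisted` option might be set, assuming a YAML configuration file with per-model entries; the `models`, `cmd`, and `proxy` keys and the model name are illustrative assumptions, and only `unlisted` comes from the notes above.

```yaml
# sketch only: key names other than `unlisted` are assumptions
models:
  "embedding-model":
    cmd: llama-server --port 9001 -m embeddings.gguf
    proxy: http://127.0.0.1:9001
    unlisted: true   # hide this model from the /v1/models and /upstream lists
```

An unlisted model is omitted from the listings but can presumably still be requested directly, e.g. via `/upstream/embedding-model`, which swaps to it and routes the request to its upstream path.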
Refactor code to support starting multiple backend llama.cpp servers. This functionality is exposed as `profiles`, keeping the configuration format simple.
Changes:
* refactor proxy tests to get ready for multi-process support
* update proxy/ProxyManager to support multiple processes (#7)
* Add support for Groups in configuration
* improve handling of Model alias configs
* implement multi-model swapping
* improve code clarity for swapModel
* improve docs, rename groups to profiles in config
* Make starting of upstream processes on-demand (#10)
* Add automatic unload of a model after its TTL is reached
* add a per-model `ttl` configuration parameter in seconds; default is 0 (never unload). See the configuration sketch after this list.
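
A minimal configuration sketch for `profiles` and `ttl`, assuming the same YAML layout as the earlier example; the model names, ports, and the `models`/`cmd`/`proxy` keys are illustrative assumptions.

```yaml
# sketch only: everything except `profiles` and `ttl` is assumed
models:
  "chat":
    cmd: llama-server --port 9101 -m chat-model.gguf
    proxy: http://127.0.0.1:9101
    ttl: 300          # automatically unload after 300 seconds; 0 (the default) never unloads
  "coder":
    cmd: llama-server --port 9102 -m coder-model.gguf
    proxy: http://127.0.0.1:9102

profiles:
  # a profile groups models whose backend servers are started together
  assistant:
    - chat
    - coder
```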
Replace the previously hardcoded `/health` path used to check when the
upstream server becomes ready to serve traffic. With this change, the proxy can
support any server that provides an OpenAI-compatible inference endpoint.
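
A possible way this could be configured, as a sketch: the `checkEndpoint` key name is an assumption (it is not named in the notes above), and the command and port are placeholders.

```yaml
# sketch only: the `checkEndpoint` key name is an assumption
models:
  "openai-compatible-server":
    cmd: some-inference-server --port 9200
    proxy: http://127.0.0.1:9200
    checkEndpoint: /health   # readiness check path, now configurable rather than hardcoded
```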