Refactor code to support starting multiple backend llama.cpp servers. This functionality is exposed as `profiles`, a simple configuration format.
Changes:
* refactor proxy tests to get ready for multi-process support
* update proxy/ProxyManager to support multiple processes (#7)
* Add support for Groups in configuration
* improve handling of Model alias configs
* implement multi-model swapping
* improve code clarity for swapModel
* improve docs, rename groups to profiles in config
* Make starting the upstream process on-demand (#10)
* Add automatic unload of model after TTL is reached
* add a `ttl` configuration parameter to models, in seconds; default is 0 (never unload). See the example config after this list.
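
For illustration, here is a minimal sketch of a profile-based config. The model names, ports, and file paths are hypothetical:

```yaml
models:
  "llama":
    # command that starts the upstream server (hypothetical path and port)
    cmd: llama-server --port 9001 -m /models/llama-8b.gguf
    proxy: http://127.0.0.1:9001
    # aliases let clients request this model under other names
    aliases:
      - "gpt-4o-mini"
    # unload after 300 seconds of inactivity; 0 (the default) never unloads
    ttl: 300
  "qwen-coder":
    cmd: llama-server --port 9002 -m /models/qwen-coder.gguf
    proxy: http://127.0.0.1:9002

# profiles (renamed from "groups") run several models at once
profiles:
  assist:
    - "llama"
    - "qwen-coder"
```

Requesting a model as `assist:llama` selects it through the profile, so both members can stay loaded instead of swapping each other out.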
- Add a check to return immediately if the write buffer is empty
- Create a copy of new history data to ensure it is immutable (see the Go sketch after this list)
- Update the `GetHistory` method to use the `any` type for the buffer interface
- Add a test case to verify that the buffer remains unchanged even if the original message is modified after writing
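
This is the defensive-copy pattern. A minimal Go sketch of the idea (the `Buffer` type and its method names are illustrative, not llama-swap's actual identifiers):

```go
// Package history sketches the defensive-copy pattern described above.
package history

import "sync"

// Buffer stores immutable copies of written messages.
type Buffer struct {
	mu      sync.Mutex
	entries [][]byte
}

// Write records msg. It returns immediately on an empty write and copies
// the data so later mutations by the caller cannot change stored history.
func (b *Buffer) Write(msg []byte) {
	if len(msg) == 0 {
		return // nothing to record
	}
	cp := make([]byte, len(msg))
	copy(cp, msg)

	b.mu.Lock()
	b.entries = append(b.entries, cp)
	b.mu.Unlock()
}

// GetHistory returns the stored entries as []any, mirroring the broadened
// `any`-typed buffer interface mentioned in the changelog.
func (b *Buffer) GetHistory() []any {
	b.mu.Lock()
	defer b.mu.Unlock()

	out := make([]any, 0, len(b.entries))
	for _, e := range b.entries {
		out = append(out, e)
	}
	return out
}
```

A test for the guarantee can write a message, mutate the original slice, and assert that `GetHistory` still returns the bytes as first written.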
Improve handling of the upstream process so that a request errors out as soon as the first of the following occurs:
* the health check timeout is reached while waiting for the upstream process to become ready
* the upstream process exits unexpectedly

With this change llama-swap is more compatible with use cases like containerized upstream services (#5), which may pull the container image before HTTP endpoints are ready.
Previously, llama-swap waited a maximum of 5 seconds for an upstream HTTP server to become available, and errored out the request if it took longer. Now it waits until either the configured `healthCheckTimeout` is reached or the upstream process unexpectedly exits.
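
As a config sketch (the image, port, and timeout value below are hypothetical), a containerized upstream can be given a longer readiness window through the top-level `healthCheckTimeout` setting:

```yaml
# wait up to 120 seconds for the upstream to become ready; a request
# fails sooner only if the upstream process exits unexpectedly
healthCheckTimeout: 120

models:
  "containerized":
    # hypothetical image and ports; the container may need to pull its
    # image before HTTP endpoints come up, hence the longer timeout
    cmd: docker run --rm -p 9101:8080 ghcr.io/example/llama-server:latest
    proxy: http://127.0.0.1:9101
```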
Replace the previously hardcoded `/health` endpoint used to check when the upstream became ready to serve traffic. With this change, llama-swap can support any server that provides an OpenAI-compatible inference endpoint.
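
Since readiness no longer depends on llama.cpp's `/health` route, other OpenAI-compatible servers can sit behind llama-swap. A sketch with a stand-in command (nothing below is a real invocation):

```yaml
models:
  "other-openai-server":
    # any command that eventually serves an OpenAI-compatible endpoint
    # can be an upstream; this command line is hypothetical
    cmd: my-openai-server --listen 127.0.0.1:9202
    proxy: http://127.0.0.1:9202
```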