llama-swap

Author	SHA1	Message	Date
Benson Wong	7f37bcc6eb	Improve testing around using SIGKILL (#127) * Add test for SIGKILL of process * silent TestProxyManager_RunningEndpoint debug output * Ref #125	2025-05-13 21:21:52 -07:00
Benson Wong	519c3a4d22	Change /unload to not wait for inflight requests (#125) Sometimes upstreams can accept HTTP but never respond causing requests to build up waiting for a response. This can block Process.Stop() as that waits for inflight requests to finish. This change refactors the code to not wait when attempting to shutdown the process.	2025-05-13 11:39:19 -07:00
Benson Wong	9dc4bcb46c	Add a concurrency limit to Process.ProxyRequest (#123)	2025-05-12 18:12:52 -07:00
Benson Wong	06eda7f591	tag all process logs with its ID (#103) Makes identifying Process of log messages easier	2025-04-25 12:58:25 -07:00
Benson Wong	5fad24c16f	Make checkHealthTimeout Interruptable during startup (#102) interrupt and exit Process.start() early if the upstream process exits prematurely or unexpectedly.	2025-04-24 14:39:33 -07:00
Benson Wong	b8f888f864	Logging Improvements (#88) This change revamps the internal logging architecture to be more flexible and descriptive. Previously all logs from both llama-swap and upstream services were mixed together. This makes it harder to troubleshoot and identify problems. This PR adds these new endpoints: - `/logs/stream/proxy` - just llama-swap's logs - `/logs/stream/upstream` - stdout output from the upstream server	2025-04-04 21:01:33 -07:00
Benson Wong	d625ab8d92	Refactor process state management (#70) (#73) * add isValidStateTransition helper function * Replace Process.setState() with Process.swapState() * Refactor locking logic in Process	2025-03-15 17:14:03 -07:00
Benson Wong	9b2ed244e2	Improve Continuous integration and fix concurrency bugs (#66) - improvements to the continuous GH actions - fix edge case concurrency bugs with Process.start() and state transitions discovered setting up CI.	2025-03-11 10:39:14 -07:00
Benson Wong	09bdd86b54	Improve shutdown behaviour (#47) (#49) Introduce `Process.Shutdown()` and `ProxyManager.Shutdown()`. These two function required a lot of internal process state management refactoring. A key benefit is that `Process.start()` is now interruptable. When `Shutdown()` is called it will break the long health check loop. State management within Process is also improved. Added `starting`, `stopping` and `shutdown` states. Additionally, introduced a simple finite state machine to manage transitions.	2025-02-05 17:19:59 -08:00
Benson Wong	2833517eef	Improve handling of process that do not handle SIGTERM (#38) - Process TTL goroutine did not have a return after .Stop() - Improve logging - Add test TestProcess_LowTTLValue to measure SIGTERM error rate	2025-01-20 14:39:52 -08:00
Benson Wong	5fbd53c616	delay TTL check until after all requests are complete (#25) - fixes #25 where requests that last longer than the TTL will cause the process to be unloaded before the next request. - new behavior, TTL waits until all requests are complete before checking timeout	2024-12-09 19:08:03 -08:00
Benson Wong	da46545630	fix profile example in README	2024-12-01 10:13:31 -08:00
Benson Wong	cf82b3c633	Improve Concurrency and Parallel Request Handling (#19) Rewrite the swap behaviour so that in-flight requests block process swapping until they are completed. Additionally: - add tests for parallel requests with proxy.ProxyManager and proxy.Process - improve Process startup behaviour and simplified the code - stopping of processes are sent SIGTERM and have 5 seconds to terminate, before they are killed	2024-11-30 15:24:42 -08:00
Ikko Eltociear Ashimine	9a81c53664	chore: update process_test.go (#17) nonexistant -> nonexistent	2024-11-26 10:20:16 -08:00
Benson Wong	73ad85ea69	Implement Multi-Process Handling (#7) Refactor code to support starting of multiple back end llama.cpp servers. This functionality is exposed as `profiles` to create a simple configuration format. Changes: * refactor proxy tests to get ready for multi-process support * update proxy/ProxyManager to support multiple processes (#7) * Add support for Groups in configuration * improve handling of Model alias configs * implement multi-model swapping * improve code clarity for swapModel * improve docs, rename groups to profiles in config	2024-11-23 19:45:13 -08:00
Benson Wong	533162ce6a	add support for automatically unloading a model (#10) (#14) * Make starting upstream process on-demand (#10) * Add automatic unload of model after TTL is reached * add `ttl` configuration parameter to models in seconds, default is 0 (never unload)	2024-11-19 16:32:51 -08:00
Benson Wong	a33ac6f8fb	update README	2024-11-18 15:37:50 -08:00
Benson Wong	e5c909ddf7	add tests for proxy.Process	2024-11-17 20:49:14 -08:00

18 Commits