Swapping models can take a long time, leaving the client with a silent connection while the new model loads. Rather than loading the model silently in the background, this PR lets llama-swap stream status updates in the reasoning_content field of a streaming chat completion response, as sketched below.
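A sketch of one streamed chunk carrying such a status update, shown as YAML for readability (the wire format is OpenAI-style SSE JSON; the status text is an illustrative assumption, not the exact string llama-swap emits):

```yaml
# One streaming chat completion chunk with a swap status update.
# Field names follow the OpenAI streaming schema; the status text
# here is hypothetical.
object: chat.completion.chunk
choices:
  - index: 0
    delta:
      reasoning_content: "loading model, please wait..."
    finish_reason: null
```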
Fixes: #366
Changes:
- add a Metadata key to ModelConfig (see the config sketch after this list)
- include metadata in /v1/models responses under the meta.llamaswap key
- add recursive macro substitution inside Metadata values
- allow macros at both the global and model level to be any scalar type
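A config sketch in llama-swap's config.yaml format; the model name, file path, and metadata fields are hypothetical:

```yaml
macros:
  ctx: 16384                   # scalar macro: numbers are now allowed, not just strings

models:
  "qwen-coder":
    cmd: llama-server --port ${PORT} -m /models/qwen.gguf -c ${ctx}
    metadata:                  # new Metadata key on ModelConfig
      context_window: ${ctx}   # macros substitute recursively into metadata values
      quantization: Q4_K_M
```

The metadata block is what /v1/models then reports under each model entry's meta.llamaswap key.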
Note:
This is the first mostly AI-generated change to llama-swap. See #333 for notes on the workflow and the approach to AI going forward.
* proxy/config: add model-level macros
Add macro definitions to individual model configurations. Model-level
macros override macros defined at the global configuration level, and
follow the same naming and value rules as global macros.
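A minimal sketch of the override behavior (model names, paths, and flags are hypothetical):

```yaml
macros:
  gpu_layers: 99               # global default

models:
  "small-model":
    # no model macros: the global ${gpu_layers} (99) applies
    cmd: llama-server --port ${PORT} -m /models/small.gguf -ngl ${gpu_layers}

  "big-model":
    macros:
      gpu_layers: 20           # model-level macro overrides the global value
    cmd: llama-server --port ${PORT} -m /models/big.gguf -ngl ${gpu_layers}
```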
* proxy/config: fix bug with macro reserved name checking
The reserved macro name PORT was not properly rejected when used as a user-defined macro.
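For example, a config like this should now fail validation (assuming the check applies to user-defined macro names, as the fix suggests):

```yaml
# Rejected: PORT is reserved, since llama-swap injects ${PORT} itself
# with the port it assigns to each model's upstream server.
macros:
  PORT: "8080"
```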
* proxy/config: add tests around model.filters.stripParams
- add a check that model.filters.stripParams contains no invalid macros
- rename strip_params to stripParams for camelCase consistency
- add backwards compatibility so the legacy model.filters.strip_params spelling continues to work
* proxy/config: add duplicate removal to model.filters.stripParams (example below)
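A sketch of the filter configuration after these changes (the model name and parameter list are illustrative):

```yaml
models:
  "qwen-coder":
    cmd: llama-server --port ${PORT} -m /models/qwen.gguf
    filters:
      # camelCase spelling; the legacy strip_params key is still accepted
      stripParams: "temperature, top_p, temperature"   # duplicate entries are removed
```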
* clean up some doc nits