add example: optimizing code generation

This commit is contained in:
Benson Wong
2024-12-03 10:25:43 -08:00
parent da46545630
commit da2326bdc7
3 changed files with 169 additions and 6 deletions

@@ -1,9 +1,6 @@
# Example Configurations
# Example Configs and Use Cases
Learning by example is best.
Here in the `examples/` folder are llama-swap configurations that can be used on your local LLM server.
## List
A collection of use cases and examples for getting the most out of llama-swap.
* [Speculative Decoding](speculative-decoding/README.md) - using a small draft model can increase inference speeds by 20% to 40%. This example includes configurations for Qwen2.5-Coder-32B (up to a 2.5x speedup) and Llama-3.1-70B (up to a 1.4x speedup) in the best cases.
* [Optimizing Code Generation](benchmark-snakegame/README.md) - find the optimal settings for your machine. This example demonstrates defining multiple configurations and testing which one is fastest.
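
To illustrate the "multiple configurations" idea from the examples above, here is a minimal sketch of what a llama-swap config comparing two profiles of the same model might look like. It assumes llama-swap's `models`/`cmd`/`proxy` config keys and a recent llama.cpp `llama-server`; all model names, file paths, ports, and the `--draft-max` value are placeholders you would adjust for your own machine.

```yaml
# Hypothetical sketch: two swappable profiles of the same model,
# one plain and one using a small draft model for speculative decoding.
models:
  "qwen-coder-plain":
    cmd: >
      llama-server --port 9001
      -m /models/Qwen2.5-Coder-32B-Q4_K_M.gguf
    proxy: http://127.0.0.1:9001

  "qwen-coder-draft":
    cmd: >
      llama-server --port 9002
      -m /models/Qwen2.5-Coder-32B-Q4_K_M.gguf
      -md /models/Qwen2.5-Coder-0.5B-Q8_0.gguf
      --draft-max 16
    proxy: http://127.0.0.1:9002
```

Pointing a benchmark (such as the snake-game code generation test above) at each profile in turn shows which configuration is fastest on your hardware.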