Add examples
@@ -72,6 +72,8 @@ profiles:
     - "llama"
 ```
 
+More complex [examples](examples/README.md) for different use cases.
+
 ## Installation
 
 1. Create a configuration file, see [config.example.yaml](config.example.yaml)
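The installation step above points at config.example.yaml. For orientation, here is a minimal sketch of a llama-swap config consistent with the `profiles:` snippet in this hunk; the model path, port, and profile name are illustrative assumptions, not taken from this commit:

```yaml
# Minimal llama-swap config sketch. The model path, port, and
# profile name are assumptions for illustration only.
models:
  "llama":
    # command llama-swap runs to start the upstream llama-server
    cmd: llama-server --port 9999 --model /path/to/model.gguf
    # where llama-swap forwards requests once the server is ready
    proxy: "http://127.0.0.1:9999"

profiles:
  # profiles group model names under one label, as in the snippet above
  coding:
    - "llama"
```

llama-swap starts the `cmd` process on demand when a request names that model, and proxies the request to the `proxy` address once the server is up.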
examples/README.md (new file, 9 lines added)
@@ -0,0 +1,9 @@
+# Example Configurations
+
+Learning by example is best.
+
+Here in the `examples/` folder are llama-swap configurations that can be used on your local LLM server.
+
+## List
+
+* [Speculative Decoding](speculative-decoding/README.md) - using a small draft model can increase inference speeds by 20% to 40%. This example includes configurations for Qwen2.5-Coder-32B (2.5x increase) and Llama-3.1-70B (1.4x increase) in the best cases.
examples/speculative-decoding/README.md (new file, 3 lines added)
@@ -0,0 +1,3 @@
+# Qwen 2.5 Coder with a Draft Model
+
+Using a small draft model like qwen-2.5-coder-0.5B can have a big impact on the performance of the larger 32 billion parameter model.
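To make the draft-model setup concrete, a llama-swap entry for this example might look roughly like the following; the GGUF paths, port, and draft parameters are assumptions for illustration, using llama-server's `--model-draft` flag:

```yaml
# Sketch of a speculative-decoding entry. The GGUF paths, port, and
# draft-* values are assumptions, not taken from this commit.
models:
  "qwen-coder-32b":
    cmd: >
      llama-server --port 9999
      --model /models/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf
      --model-draft /models/Qwen2.5-Coder-0.5B-Instruct-Q8_0.gguf
      --draft-max 16 --draft-min 4
    proxy: "http://127.0.0.1:9999"
```

The draft model cheaply proposes candidate tokens that the 32B model verifies in a single forward pass; the speedup depends on how often those candidates are accepted, which is why the gains vary by workload.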