Add examples
@@ -72,6 +72,8 @@ profiles:
     - "llama"
 ```
 
+More complex [examples](examples/README.md) for different use cases.
+
 ## Installation
 
 1. Create a configuration file, see [config.example.yaml](config.example.yaml)
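The installation step above points at config.example.yaml. For orientation, here is a minimal sketch of a llama-swap config consistent with the `profiles:` snippet in this hunk; the model path, port, and profile name are illustrative assumptions, not taken from this commit:

```yaml
# Minimal llama-swap config sketch. The model path, port, and
# profile name are assumptions for illustration only.
models:
  "llama":
    # command llama-swap runs to start the upstream llama-server
    cmd: llama-server --port 9999 --model /path/to/model.gguf
    # where llama-swap forwards requests once the server is ready
    proxy: "http://127.0.0.1:9999"

profiles:
  # profiles group model names under one label, as in the snippet above
  coding:
    - "llama"
```

llama-swap starts the `cmd` process on demand when a request names that model, and proxies the request to the `proxy` address once the server is up.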
examples/README.md (new file, 9 lines added)
@@ -0,0 +1,9 @@
+# Example Configurations
+
+Learning by example is best.
+
+Here in the `examples/` folder are llama-swap configurations that can be used on your local LLM server.
+
+## List
+
+* [Speculative Decoding](speculative-decoding/README.md) - using a small draft model can increase inference speeds by 20% to 40%. This example includes configurations for Qwen2.5-Coder-32B (2.5x increase) and Llama-3.1-70B (1.4x increase) in the best cases.
examples/speculative-decoding/README.md (new file, 3 lines added)
@@ -0,0 +1,3 @@
+# Qwen 2.5 Coder with a Draft Model
+
+Using a small draft model like qwen-2.5-coder-0.5B can have a big impact on the performance of the larger 32 billion parameter model.
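To make the draft-model setup concrete, a llama-swap entry for this example might look roughly like the following; the GGUF paths, port, and draft parameters are assumptions for illustration, using llama-server's `--model-draft` flag:

```yaml
# Sketch of a speculative-decoding entry. The GGUF paths, port, and
# draft-* values are assumptions, not taken from this commit.
models:
  "qwen-coder-32b":
    cmd: >
      llama-server --port 9999
      --model /models/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf
      --model-draft /models/Qwen2.5-Coder-0.5B-Instruct-Q8_0.gguf
      --draft-max 16 --draft-min 4
    proxy: "http://127.0.0.1:9999"
```

The draft model cheaply proposes candidate tokens that the 32B model verifies in a single forward pass; the speedup depends on how often those candidates are accepted, which is why the gains vary by workload.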