andreas/llama-swap

Benson Wong e363f8f498 clean up writing with AI :b

2024-11-28 22:12:44 -08:00

Example Configurations

Learning by example is best.

The examples/ folder contains llama-swap configurations that you can use on your local LLM server.

List

Speculative Decoding - using a small draft model can increase inference speeds by 20% to 40%. This example includes configurations for Qwen2.5-Coder-32B (2.5x increase) and Llama-3.1-70B (1.4x increase) in the best cases.
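
To give a rough idea of what such a configuration looks like, here is a minimal sketch of a llama-swap model entry that starts llama-server with a draft model. It is not the configuration shipped in the speculative-decoding folder. The model name, file paths, port, draft model choice, and draft settings are all assumptions for illustration, so adjust them to your own setup.

```yaml
# Hypothetical llama-swap config sketch. All paths, the port, and the
# model name below are placeholders, not values from this repository.
models:
  "qwen-coder-32b-draft":
    # llama-server loads the main model with -m and the small draft
    # model with -md; --draft-max and --draft-min bound how many
    # tokens are drafted per step (values here are assumptions).
    cmd: >
      /path/to/llama-server
      --port 9503
      -m /path/to/models/Qwen2.5-Coder-32B-Instruct-Q4_K_M.gguf
      -md /path/to/models/Qwen2.5-Coder-0.5B-Instruct-Q8_0.gguf
      -ngl 99
      -ngld 99
      --draft-max 16
      --draft-min 4
    # llama-swap forwards requests for this model to the upstream server.
    proxy: "http://127.0.0.1:9503"
```

The draft model has to share a tokenizer with the main model, which is why the sketch pairs two models from the same family. The actual speedup depends on the hardware and on how predictable the generated text is.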