llama-swap/examples/README.md
2024-12-03 10:25:43 -08:00

Example Configs and Use Cases

A collection of use cases and examples for getting the most out of llama-swap.

  • Speculative Decoding - using a small draft model can increase inference speed by 20% to 40%. This example includes configurations for Qwen2.5-Coder-32B (2.5x increase) and Llama-3.1-70B (1.4x increase) in the best cases.
  • Optimizing Code Generation - find the optimal settings for your machine. This example demonstrates defining multiple configurations and testing which one is fastest.
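
As a rough illustration of what both examples involve, the sketch below defines two llama-swap model entries for the same main model, one plain and one paired with a small draft model, so the two can be compared by requesting each model name in turn. It is a sketch under assumptions, not a copy of the linked examples: the model names, file paths, ports, and draft settings are placeholders, and the `llama-server` flags should be checked against the llama.cpp version you have installed.

```yaml
# Hypothetical llama-swap config; names, paths, ports, and draft values are placeholders.
models:
  # Baseline: the main model on its own.
  "qwen-coder-32b":
    cmd: >
      llama-server --port 9001
      -m /models/qwen2.5-coder-32b-instruct-q4_k_m.gguf
      -ngl 99
    proxy: http://127.0.0.1:9001

  # Same main model with a small draft model for speculative decoding.
  "qwen-coder-32b-draft":
    cmd: >
      llama-server --port 9002
      -m /models/qwen2.5-coder-32b-instruct-q4_k_m.gguf
      -md /models/qwen2.5-coder-0.5b-instruct-q8_0.gguf
      -ngl 99 -ngld 99
      --draft-max 16 --draft-min 4
    proxy: http://127.0.0.1:9002
```

Sending the same prompt to each model name and comparing the tokens per second reported by `llama-server` gives a first indication of which configuration is faster on your hardware.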