support v1/rerank endpoint

2024-12-17 21:22:25 -08:00
parent 9b4e3f307e
commit 9c8860471e
5 changed files with 42 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -92,7 +92,11 @@ profiles:
    - "llama"
 ```

-More [examples](examples/README.md) are available for different use cases.
+**Guides and examples**
+
+- [config.example.yaml](config.example.yaml) includes examples for supporting the `v1/embeddings` and `v1/rerank` endpoints
+- [Speculative Decoding](examples/speculative-decoding/README.md) - using a small draft model can increase inference speeds from 20% to 40%. This example includes configurations for Qwen2.5-Coder-32B (2.5x increase) and Llama-3.1-70B (1.4x increase) in the best cases.
+- [Optimizing Code Generation](examples/benchmark-snakegame/README.md) - find the optimal settings for your machine. This example demonstrates defining multiple configurations and testing which one is fastest.

 ## Installation