From 43119e807f0772df6416465c28b27ec2ff6bea73 Mon Sep 17 00:00:00 2001 From: Benson Wong Date: Fri, 4 Oct 2024 11:37:51 -0700 Subject: [PATCH] add README --- README.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..1e63f59 --- /dev/null +++ b/README.md @@ -0,0 +1,47 @@ +# LLaMAGate + +A golang gateway that automatically manages [llama-server](https://github.com/ggerganov/llama.cpp/tree/master/examples/server) to serve the requested `model` in the HTTP request. Serve all the models you have downloaded without manually swapping between them. + +Created because I wanted: + +- ✅ easy to deploy: single binary with no dependencies +- ✅ full control over llama-server's startup settings +- ✅ ❤️ for Nvidia P40 users who rely on llama.cpp row split mode for large models + +## YAML Configuration + +```yaml +# Seconds to wait for llama.cpp to be available to serve requests +# Default (and minimum): 15 seconds +healthCheckTimeout: 60 + +# define models +models: + "llama": + cmd: "llama-server --port 8999 -m Llama-3.2-1B-Instruct-Q4_K_M.gguf" + + # address where llama-server can be reached + proxy: "http://127.0.0.1:8999" + + # list of aliases this llama.cpp instance can also serve + aliases: + - "gpt-4o-mini" + - "gpt-3.5-turbo" + + "qwen": + cmd: "llama-server --port 8999 -m path/to/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf" + proxy: "http://127.0.0.1:8999" + aliases: +``` + +## Testing with CURL + +```bash +> curl http://localhost:8080/v1/chat/completions -N -d '{"messages":[{"role":"user","content":"write a 3 word story"}], "model":"llama"}'| jq -c '.choices[].message.content' + +# will reuse the llama-server instance +> curl http://localhost:8080/v1/chat/completions -N -d '{"messages":[{"role":"user","content":"write a 3 word story"}], "model":"gpt-4o-mini"}'| jq -c '.choices[].message.content' + +# swap to Qwen2.5-1.5B-Instruct-Q4_K_M.gguf +> curl 
http://localhost:8080/v1/chat/completions -N -d '{"messages":[{"role":"user","content":"write a 3 word story"}], "model":"qwen"}'| jq -c '.choices[].message.content' +``` \ No newline at end of file