LLaMAGate

A Go gateway that automatically manages llama-server so the model named in each HTTP request is the one being served. Serve all the models you have downloaded without manually swapping between them.
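
Under the hood the idea is simple: pull the model field out of the JSON request body, make sure the matching llama-server process is running (swapping out the previous one if needed), then reverse-proxy the request to it. The following is a minimal Go sketch of that flow, not the actual implementation; the real gateway also handles health checks, aliases, per-model environment variables, and streaming:

package main

import (
    "bytes"
    "encoding/json"
    "io"
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "os/exec"
    "strings"
    "sync"
)

// upstream mirrors a single "models" entry from the YAML config below.
type upstream struct {
    cmd   string // how to start llama-server for this model
    proxy string // where that llama-server instance listens
}

var models = map[string]upstream{
    "llama": {"llama-server --port 8999 -m Llama-3.2-1B-Instruct-Q4_K_M.gguf", "http://127.0.0.1:8999"},
}

var (
    mu      sync.Mutex
    running string    // model currently being served
    proc    *exec.Cmd // its llama-server process
)

// ensure swaps the llama-server process when a different model is requested.
// The real gateway also waits for the health check before proxying.
func ensure(name string) error {
    mu.Lock()
    defer mu.Unlock()
    if running == name {
        return nil
    }
    if proc != nil {
        proc.Process.Kill()
        proc.Wait()
    }
    args := strings.Fields(models[name].cmd)
    proc = exec.Command(args[0], args[1:]...)
    if err := proc.Start(); err != nil {
        return err
    }
    running = name
    return nil
}

func handler(w http.ResponseWriter, r *http.Request) {
    body, _ := io.ReadAll(r.Body)
    var req struct {
        Model string `json:"model"`
    }
    json.Unmarshal(body, &req)

    cfg, ok := models[req.Model]
    if !ok {
        http.Error(w, "unknown model", http.StatusNotFound)
        return
    }
    if err := ensure(req.Model); err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }

    target, _ := url.Parse(cfg.proxy)
    r.Body = io.NopCloser(bytes.NewReader(body)) // replay the body we already read
    httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func main() {
    http.HandleFunc("/v1/chat/completions", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}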

Created because I wanted:

  • easy to deploy: single binary with no dependencies
  • full control over llama-server's startup settings
  • ❤️ for Nvidia P40 users who rely on llama.cpp's row split mode for large models (see the example after the configuration below)

YAML Configuration

# Seconds to wait for llama.cpp to be available to serve requests
# Default (and minimum): 15 seconds
healthCheckTimeout: 60

# define models
models:
  "llama":
    env:
      - "CUDA_VISIBLE_DEVICES=0"

    cmd: "llama-server --port 8999 -m Llama-3.2-1B-Instruct-Q4_K_M.gguf"

    # address where this llama-server instance can be reached
    proxy: "http://127.0.0.1:8999"

    # list of aliases this llama.cpp instance can also serve
    aliases:
    - "gpt-4o-mini"
    - "gpt-3.5-turbo"

  "qwen":
    cmd: "llama-server --port 8999 -m path/to/Qwen2.5-1.5B-Instruct-Q4_K_M.gguf"
    proxy: "http://127.0.0.1:8999"

Testing with CURL

> curl http://localhost:8080/v1/chat/completions -N -d '{"messages":[{"role":"user","content":"write a 3 word story"}], "model":"llama"}' | jq -c '.choices[].message.content'

# "gpt-4o-mini" is an alias for "llama", so this reuses the running llama-server instance
> curl http://localhost:8080/v1/chat/completions -N -d '{"messages":[{"role":"user","content":"write a 3 word story"}], "model":"gpt-4o-mini"}' | jq -c '.choices[].message.content'

# swap to Qwen2.5-1.5B-Instruct-Q4_K_M.gguf
> curl http://localhost:8080/v1/chat/completions -N -d '{"messages":[{"role":"user","content":"write a 3 word story"}], "model":"qwen"}' | jq -c '.choices[].message.content'