Add custom check endpoint

Replace the previously hardcoded `/health` path with a configurable
endpoint used to check when the server is ready to serve traffic. With
this change, any server that provides an OpenAI-compatible inference
endpoint can be supported.
Author: Benson Wong
Date: 2024-10-11 21:59:21 -07:00
parent 5a57688aa8
commit 8eb5b7b6c4
5 changed files with 40 additions and 11 deletions

@@ -30,6 +30,13 @@ models:
- "gpt-4o-mini"
- "gpt-3.5-turbo"
# wait for this path to return an HTTP 200 before serving requests
# defaults to /health to match llama.cpp
#
# use "none" to skip endpoint checking. This may cause requests to fail
# until the server is ready
checkEndpoint: "/custom-endpoint"
"qwen":
# environment variables to pass to the command
env: