Because systemd passes the listening socket to the container, the container is only started on the first incoming connection and uses no resources until it is needed. Since no other network access is required for operation, we can configure the container with network=none, which minimizes the risk of the AI escaping.

Set up

Optional: to run this as a separate user, create the user and open a shell as that user:

sudo useradd -m llama
sudo machinectl shell llama@

Check out this repository, navigate to its root directory, and build the llama.cpp/llama-swap container with:

podman build -t localhost/lamaswap:latest -f Build.Containerfile

Place llama.socket in ~/.config/systemd/user and adjust ports and interfaces if needed. Place llama.container in ~/.config/containers/systemd and adjust the paths for models and config if desired. Both files are in docs/socket_activation, next to this readme.
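The two copy steps above could be scripted roughly as follows. This is only a sketch: the function name install_llama_units and its two arguments are hypothetical and not part of this repository, and it copies the files without adjusting ports, interfaces, or paths.

```shell
# Hypothetical helper, not part of this repository.
# $1: repository root (default: current directory)
# $2: home directory to install into (default: $HOME)
install_llama_units() {
    repo="${1:-.}"
    home="${2:-$HOME}"
    src="$repo/docs/socket_activation"

    # The socket unit and the container file go into two different directories.
    mkdir -p "$home/.config/systemd/user" "$home/.config/containers/systemd" || return 1
    cp "$src/llama.socket" "$home/.config/systemd/user/" || return 1
    cp "$src/llama.container" "$home/.config/containers/systemd/" || return 1
}
```

Run install_llama_units from the repository root, then edit the copied files as needed.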

Put the model files into the models directory (~/models). Create a llama-swap config.yaml (by default in ~) according to the docs.

Start the socket:

systemctl --user daemon-reload
systemctl --user enable --now llama.socket

To keep the service available when the user is not logged in, enable lingering:

sudo loginctl enable-linger <user>
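
To confirm that lingering took effect, the user's Linger property can be queried with loginctl. The helper name linger_enabled below is hypothetical; the sketch assumes a loginctl version that supports the --value option.

```shell
# Hypothetical helper: succeeds if lingering is enabled for the given user.
# If loginctl cannot find the user, its output is empty and the check fails.
linger_enabled() {
    [ "$(loginctl show-user "$1" --property=Linger --value)" = "yes" ]
}
```

For example, `linger_enabled llama && echo "lingering is on"`.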

Check that you can access the llama-swap control panel in a browser. For troubleshooting, check the logs, e.g. with journalctl -xe.
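
The same check can be done from the command line, roughly as sketched below. The helper names llama_url, check_llama, and llama_logs are hypothetical, and the host and port are whatever you configured in llama.socket. The unit name llama.service assumes the service generated from llama.container keeps its default name.

```shell
# Hypothetical helpers; pass the host and port configured in llama.socket.
llama_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

# Print the HTTP status code returned by the control panel. The first request
# may take a while if it has to start the container.
check_llama() {
    curl --silent --show-error --output /dev/null \
        --write-out '%{http_code}\n' "$(llama_url "$1" "$2")"
}

# Show the state of the socket and the recent logs of the service it activates.
llama_logs() {
    systemctl --user --no-pager status llama.socket
    journalctl --user --no-pager -n 50 -u llama.service
}
```

For example, with a socket listening on port 8080 of localhost (placeholder values), run `check_llama localhost 8080`, and `llama_logs` if the request fails.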