Local inference server with OpenAI-compatible GGUF endpoints
Small local inference server with OpenAI-compatible GGUF endpoints
shimmy
$ shimmy
$ shimmy --port 8080
$ shimmy --model /path/to/model.gguf