LLM inference in C/C++
The main command-line tools are `llama-cli`, `llama-server`, `llama-convert`, and `llama-quantize`:

```sh
$ llama-cli -m model.gguf -p 'Hello, how are you?'
$ llama-server -m model.gguf -ngl 33 --port 8000
$ llama-quantize model.gguf model.q4_k_m.gguf Q4_K_M
```
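Once `llama-server` is running, it exposes an OpenAI-compatible HTTP API. A minimal Python sketch of building a request against it, assuming the server was started with `--port 8000` as above (the endpoint path and payload shape follow the OpenAI chat-completions convention; the helper name is hypothetical):

```python
import json
import urllib.request

def build_chat_request(prompt, host="http://localhost:8000"):
    # llama-server serves an OpenAI-compatible chat-completions endpoint.
    # (Endpoint path per the OpenAI convention; host/port are assumptions.)
    url = f"{host}/v1/chat/completions"
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return url, payload

url, payload = build_chat_request("Hello, how are you?")
print(url)  # where the request would be sent

# To actually send it (requires a running llama-server):
# req = urllib.request.Request(
#     url,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The HTTP call itself is left commented out so the sketch runs without a live server; any OpenAI-style client library pointed at the same base URL works the same way.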