It tries to close the gap between pure inference engines (such as ExLlamaV2 and Llama.cpp) and the additional needs of agentic work (e.g., function calling and formatting constraints).
As there is currently no standard approach for these experimental features, the library implements them through some boilerplate code (you will find fixed prompts inside the code), which makes it an opinionated API engine.
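As a rough illustration of what "boilerplate code with fixed prompts" can mean for function calling, here is a minimal, self-contained sketch. It assumes a simple prompt-and-parse scheme: tool schemas are embedded in a fixed system prompt, and the model's reply is accepted as a tool call only if it is valid JSON that names a known tool. The template text and the names `TOOL_PROMPT`, `build_tool_prompt` and `parse_tool_call` are hypothetical and are not gallama's actual prompts or API.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical fixed prompt template (gallama's real prompts differ).
# Doubled braces are escapes for str.format and render as literal { }.
TOOL_PROMPT = (
    "You have access to the following tools:\n{tools}\n"
    "If a tool is needed, reply ONLY with JSON of the form "
    '{{"name": <tool name>, "arguments": {{...}}}}.'
)


def build_tool_prompt(tools: List[Dict[str, Any]]) -> str:
    """Embed the tool schemas into the fixed system prompt."""
    return TOOL_PROMPT.format(tools=json.dumps(tools, indent=2))


def parse_tool_call(
    output: str, tools: List[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the parsed tool call, or None if the output is not a valid call."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return None  # plain-text answer, not a tool call
    known = {tool["name"] for tool in tools}
    if not isinstance(call, dict) or call.get("name") not in known:
        return None  # unknown tool or wrong shape
    if not isinstance(call.get("arguments"), dict):
        return None  # arguments must be a JSON object
    return call
```

For example, with a single tool named `get_weather`, the output `'{"name": "get_weather", "arguments": {"city": "Paris"}}'` is accepted as a tool call, while a free-text reply returns `None`. A real engine would usually go further (for example by constraining generation so that the JSON is always well formed), and this sketch makes no claim about how gallama does so.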
This library is updated on a rolling basis, so please be patient with hiccups and bugs :) (feel free to open an issue if you come across any).
Head down to the installation guide at the bottom of this page, then check out Examples_Notebook.ipynb in the examples folder. A simple Python Streamlit frontend chat UI is also included in the examples folder (streamlit). Alternatively, check out GallamaUI.
For Pixtral, please install Exllama V2 v0.2.4 or later. For Exllama V2, please install the dev branch of Exllama V2, as the code has not yet been merged into v0.2.4.