Mistral.rs is a fast LLM inference platform that supports inference on a variety of devices, quantization, and easy application integration via an OpenAI-API-compatible HTTP server and Python bindings.

Enable features by passing --features ... to the build system. When using cargo run or maturin develop, pass the --features flag before the -- that separates build flags from runtime flags.
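For example (a sketch, not part of the original text: cuda is one backend feature that may be available depending on your platform, and --port stands in for whatever runtime flags you pass to the server):

    # build with a backend feature enabled
    cargo build --release --features cuda

    # with cargo run, build flags such as --features go before the --,
    # and runtime flags for the server go after it
    cargo run --release --features cuda -- --port 1234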

To install mistral.rs, first ensure that Rust is installed by following this link. Additionally, when using the server, provide your Hugging Face token in ~/.cache/huggingface/token to enable automatic download of gated models.
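If Rust is not yet installed, the standard rustup installer from https://rustup.rs is the usual route:

    # install Rust via rustup (official one-liner from rustup.rs)
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh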

Set the HF token correctly (skip this if it is already set, if your model is not gated, or if you plan to use the token_source parameters in Python or on the command line).
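A minimal sketch of writing the token file by hand, assuming the default Hugging Face cache location (YOUR_HF_TOKEN is a placeholder):

    # create the cache directory and write the access token
    mkdir -p ~/.cache/huggingface
    echo YOUR_HF_TOKEN > ~/.cache/huggingface/token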

The build process will output a binary, mistralrs-server, at ./target/release/mistralrs-server, which may be copied into the working directory with the following command:
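Something along these lines (assuming the default cargo target directory):

    # copy the built server binary into the current working directory
    cp ./target/release/mistralrs-server .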
