bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers, matrix multiplication (LLM.int8()), and quantization functions.
In some cases you may need to compile from source. If this happens, please consider submitting a bug report with the output of `python -m bitsandbytes`. What follows are short instructions that may work out of the box if nvcc is installed. If they do not work, see the steps further below.
The requirements are best fulfilled by installing PyTorch via Anaconda. You can install PyTorch by following the "Get Started" instructions on the official website.
For straight Int8 matrix multiplication with mixed-precision decomposition you can use `bnb.matmul(...)`. To enable mixed-precision decomposition, use the `threshold` parameter:
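To illustrate what the `threshold` parameter controls, here is a dependency-free Python sketch of the mixed-precision decomposition idea behind LLM.int8(). This is only a conceptual illustration, not bitsandbytes' actual CUDA implementation; the helper names `mixed_matmul` and `quantize_absmax` are hypothetical, and a single per-tensor scale is used for brevity where the real algorithm uses vector-wise scales.

```python
def matmul(A, B):
    # Plain float matrix multiply; A is m x k, B is k x n (lists of rows).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def quantize_absmax(M):
    # Symmetric absmax int8 quantization with one per-tensor scale
    # (a simplification: the real kernels use vector-wise scales).
    scale = (max(abs(x) for row in M for x in row) or 1.0) / 127.0
    return [[round(x / scale) for x in row] for row in M], scale

def mixed_matmul(A, B, threshold=6.0):
    # Mixed-precision decomposition (illustrative): columns of A (rows of B)
    # containing any entry with |x| > threshold take the full-precision path;
    # the rest are quantized to int8, multiplied, and dequantized.
    k = len(B)
    outlier = [t for t in range(k) if any(abs(row[t]) > threshold for row in A)]
    regular = [t for t in range(k) if t not in outlier]
    out = [[0.0] * len(B[0]) for _ in A]
    if regular:
        qA, sA = quantize_absmax([[row[t] for t in regular] for row in A])
        qB, sB = quantize_absmax([B[t] for t in regular])
        for i, row in enumerate(matmul(qA, qB)):
            for j, v in enumerate(row):
                out[i][j] += v * sA * sB  # dequantize the integer accumulator
    if outlier:
        fpA = [[row[t] for t in outlier] for row in A]
        fpB = [B[t] for t in outlier]
        for i, row in enumerate(matmul(fpA, fpB)):
            for j, v in enumerate(row):
                out[i][j] += v  # outlier contribution stays in full precision
    return out
```

With a threshold of 6.0, a column like `[100.0, -50.0]` is routed through the full-precision path, so the large outlier magnitudes do not blow up the int8 quantization error of the remaining columns.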
For instructions on how to use LLM.int8() inference layers in your own code, see the TL;DR above, or see this blog post for extended instructions.