jartine / gemma-2-27b-it-llamafile

2024-07-02 16:00:33
The model is packaged into executable weights, which we call llamafiles. This makes it easy to use the model on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, on both AMD64 and ARM64.

The llamafile software is open source and permissively licensed. However, the weights embedded inside the llamafiles are governed by Google's Gemma License and Gemma Prohibited Use Policy. This is not an open source license. It's about as restrictive as it gets. There are a great many things you're not allowed to do with Gemma. The terms of the license and its list of unacceptable uses can be changed by Google at any time. Therefore we wouldn't recommend using these llamafiles for anything other than evaluating the quality of Google's engineering.

This model has a maximum context window of 8192 tokens. By default, a context window of only 512 tokens is used. You may increase this to the maximum by passing the -c 0 flag.
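For example, running the llamafile with the full context window might look like the following. This is a sketch: the filename and quantization level are illustrative, and you would substitute whichever llamafile you actually downloaded from the repository.

```shell
# Make the downloaded llamafile executable (filename is illustrative;
# choose the quantization variant you downloaded).
chmod +x gemma-2-27b-it.Q6_K.llamafile

# -c 0 raises the context window from the 512-token default to the
# model's maximum (8192 tokens for this model).
./gemma-2-27b-it.Q6_K.llamafile -c 0 -p "Why is the sky blue?"
```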

On GPUs with sufficient VRAM, the -ngl 999 flag may be passed to use the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card driver needs to be installed. If the prebuilt DSOs fail to load, the CUDA or ROCm SDK may need to be installed, in which case llamafile builds a native module just for your system.
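A GPU invocation might look like the sketch below. Again, the filename is illustrative; -ngl sets the number of layers to offload, and 999 simply exceeds the model's layer count so that every layer that fits in VRAM is offloaded.

```shell
# Offload all model layers to the GPU (999 > the actual layer count,
# so llamafile offloads as many layers as the hardware allows) and
# use the full context window.
./gemma-2-27b-it.Q6_K.llamafile -ngl 999 -c 0 -p "Why is the sky blue?"
```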
