At INSAIT we are thrilled to launch BgGPT-7B-Instruct-v0.1, the first free and open Bulgarian Large Language Model in the BgGPT series (more models coming soon). BgGPT-7B-Instruct-v0.1 is now available for download on HuggingFace under the permissive and commercial-friendly Apache 2.0 license. The model, which builds on Mistral-7B, already outperforms similarly sized models such as LLaMA2-7B and Mistral-7B on all Bulgarian language tasks. On many of these tasks, it also outperforms much larger models such as Mixtral-8x7B-Instruct-v0.1 (about 6.5 times larger), which has been shown to have capabilities similar to those of GPT-3.5.
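For readers who want to try the model right away, here is a minimal sketch of loading it with the Hugging Face transformers library. The repository ID and generation settings below are assumptions for illustration; please check the model card on HuggingFace for the exact identifier and recommended usage.

```python
# Minimal sketch: load and query BgGPT-7B-Instruct-v0.1 with transformers.
# The repository ID below is an assumption; see the HuggingFace model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "INSAIT-Institute/BgGPT-7B-Instruct-v0.1"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision fits a 7B model on a single modern GPU
    device_map="auto",
)

# Mistral-style instruct models typically expose a chat template via the tokenizer.
messages = [{"role": "user", "content": "Кога е основана България?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```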
To systematically evaluate the Bulgarian-language performance of LLMs, including our model as well as any existing or future models, we translated a set of benchmarks into Bulgarian, including:
These benchmarks (except the last one, which already exists in Bulgarian) were built using both machine translation and our amazing team of translators. For evaluation, we forked EleutherAI's evaluation harness. All benchmark data is made publicly available in our HuggingFace repository to help others evaluate their own models.
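As a rough illustration of how such an evaluation can be driven programmatically, the sketch below uses the harness's Python API. It assumes a recent version of lm-evaluation-harness that exposes lm_eval.simple_evaluate; the Bulgarian task names and the model repository ID are hypothetical placeholders, not the actual identifiers used in our fork.

```python
# Sketch: run a model through EleutherAI's lm-evaluation-harness (or a fork of it).
# Task names and the model repository ID are hypothetical placeholders; the real
# ones are defined in the forked harness and the public HuggingFace repository.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=INSAIT-Institute/BgGPT-7B-Instruct-v0.1,dtype=bfloat16",  # assumed repo ID
    tasks=["hellaswag_bg", "arc_challenge_bg"],  # hypothetical Bulgarian task names
    num_fewshot=0,
    batch_size=8,
)

# Print the per-task metrics (accuracy, etc.) returned by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```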