Qwen1.5-110B: The First 100B+ Model of the Qwen1.5 Series

2024-04-26
Recently we have witnessed a burst of large-scale models with over 100 billion parameters in the open-source community. These models have demonstrated remarkable performance in both benchmark evaluations and the chatbot arena. Today, we release the first 100B+ model of the Qwen1.5 series, Qwen1.5-110B, which achieves performance comparable to Meta-Llama3-70B in the base model evaluation and outstanding performance in the chat evaluations, including MT-Bench and AlpacaEval 2.0.

Qwen1.5-110B is similar to the other Qwen1.5 models and is built with the same Transformer decoder architecture. It adopts grouped query attention (GQA), which makes it efficient in model serving. The model supports a context length of 32K tokens and remains multilingual, supporting a large number of languages including English, Chinese, French, Spanish, German, Russian, Korean, Japanese, Vietnamese, Arabic, etc.
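Like the other Qwen1.5 checkpoints, the model can be used through Hugging Face Transformers. The sketch below shows a minimal chat-inference flow; the model ID, dtype/device settings, and generation parameters are illustrative assumptions, so adjust them to your hardware and needs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub model ID for the chat variant; adjust dtype/device_map to your hardware.
model_id = "Qwen/Qwen1.5-110B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard the 110B weights across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Build the prompt using the chat template shipped with the tokenizer.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated continuation, dropping the prompt tokens.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```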

We conduct a series of evaluations of the base language model and compare it with Meta-Llama3-70B, the recent SOTA language model, as well as Mixtral-8x22B.
