
πŸ“₯ Model Download | ⚑ Quick Start | πŸ“œ License | πŸ“– Citation | πŸ“„ Paper Link | πŸ“„ Arxiv Paper Link | πŸ‘οΈ Demo

Introducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models.
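
For orientation, the snippet below sketches one way the three variants could be loaded through Hugging Face transformers. This is a minimal sketch, not the official workflow: the repository ids, the use of `trust_remote_code`, and the dtype choice are assumptions and should be checked against the ⚑ Quick Start linked above.

```python
# Minimal sketch, assuming the variants are published on the Hugging Face Hub
# under the repo ids below and load via transformers' AutoModelForCausalLM.
import torch
from transformers import AutoModelForCausalLM

# Assumed repo ids for the 1.0B / 2.8B / 4.5B activated-parameter variants.
VARIANTS = {
    "tiny":  "deepseek-ai/deepseek-vl2-tiny",
    "small": "deepseek-ai/deepseek-vl2-small",
    "base":  "deepseek-ai/deepseek-vl2",
}

# trust_remote_code is assumed to be needed, since the MoE vision-language
# architecture likely ships custom modeling code rather than stock classes.
model = AutoModelForCausalLM.from_pretrained(
    VARIANTS["tiny"],
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()
```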

Zhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu**, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan*** (* Equal Contribution, ** Project Lead, *** Corresponding author)
