LLaVA-Mini is a unified large multimodal model that can support the understanding of images, high-resolution images, and videos in an efficient manner


Submitted by
Style Pass
2025-01-13 03:00:07

LLaVA-Mini is a unified large multimodal model that supports the understanding of images, high-resolution images, and videos in an efficient manner. Guided by interpretability analyses within LMMs, LLaVA-Mini significantly improves efficiency while preserving vision capabilities. The model and a demo of LLaVA-Mini are available now!

LLaVA-Mini requires only one token to represent each image, which improves the efficiency of both image and video understanding.
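To illustrate what collapsing an image into a single token could look like, here is a minimal sketch using attention pooling: a query vector attends over all patch embeddings and produces one fused token. The function name, the learned query, and the pooling mechanism are assumptions for illustration; LLaVA-Mini's actual compression module may work differently.

```python
import numpy as np

def compress_to_one_token(patch_embeddings, query):
    """Collapse N patch embeddings of shape (N, d) into a single
    token of shape (d,) via attention pooling with a query vector.

    Hypothetical sketch -- not LLaVA-Mini's actual implementation.
    """
    d = patch_embeddings.shape[1]
    scores = patch_embeddings @ query / np.sqrt(d)   # (N,) attention logits
    weights = np.exp(scores - scores.max())          # stable softmax
    weights /= weights.sum()
    return weights @ patch_embeddings                # weighted sum -> (d,)

rng = np.random.default_rng(0)
patches = rng.normal(size=(576, 1024))  # e.g. 24x24 ViT patches, dim 1024
query = rng.normal(size=1024)           # hypothetical learned query vector
token = compress_to_one_token(patches, query)
print(token.shape)  # a single 1024-dim token replaces 576 patch tokens
```

Feeding one fused token instead of hundreds of patch tokens into the language model is what would drive the efficiency gains described above, since attention cost in the LLM scales with sequence length.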
