At the recently concluded Meta Connect developer conference, Llama 3.2 made a dazzling debut. This time it not only boasts multimodal capabilities; Meta has also partnered with companies such as Arm to launch “mobile” versions optimised specifically for Qualcomm and MediaTek hardware.
According to Meta's published benchmarks, Llama 3.2 11B and 90B outperform closed-source models of comparable size. On image-understanding tasks in particular, Llama 3.2 11B beats Claude 3 Haiku, and the 90B version holds its own against GPT-4o mini.
Currently, the two largest Llama 3.2 models, 11B and 90B, support image reasoning, including document-level chart and graph understanding, image captioning, and visual grounding tasks such as pinpointing objects in an image from a natural-language description.
For example, a user can ask, “Which month had the best sales last year?” and Llama 3.2 can answer by reasoning over the supplied chart.
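As a rough illustration of that workflow, here is a minimal sketch that sends a chart image plus a question to the 11B vision model through the Hugging Face transformers API (Mllama support landed in transformers 4.45). The file path sales_chart.png is a placeholder, and the gated meta-llama/Llama-3.2-11B-Vision-Instruct weights require approved access on Hugging Face.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Gated checkpoint: request access on Hugging Face before downloading.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 11B model within a single large GPU
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image: a sales chart saved locally.
image = Image.open("sales_chart.png")

# One user turn containing an image slot followed by the question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Which month had the best sales last year?"},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image, prompt, add_special_tokens=False, return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(output[0], skip_special_tokens=True))
```

Note that the chat template inserts the image token for you; the processor pairs the PIL image with that slot, so the question can reference the chart without any manual image preprocessing.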