Tsinghua KEG Lab and Zhipu AI jointly launched CogAgent, a large image understanding model

Bit News Tsinghua KEG Lab recently cooperated with Zhipu AI to jointly launch a new generation of image understanding large model CogAgent. Based on the previously launched CogVLM, the model uses visual modalities instead of text to provide a more comprehensive and direct perception of the GUI interface through a visual GUI agent for planning and decision-making. It is reported that CogAgent can accept 1120×1120 high-resolution image input, with visual question answering, visual positioning (Grounding), GUI Agent and other capabilities, in 9 classic image understanding lists (including VQAv2, STVQA, DocVQA, TextVQA, MM-VET, POPE, etc.) has achieved the first result in general ability.

VET1.13%
View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
0/400
No comments
Trade Crypto Anywhere Anytime
qrCode
Scan to download Gate app
Community
English
  • 简体中文
  • English
  • Tiếng Việt
  • 繁體中文
  • Español
  • Русский
  • Français (Afrique)
  • Português (Portugal)
  • Bahasa Indonesia
  • 日本語
  • بالعربية
  • Українська
  • Português (Brasil)