Tsinghua KEG Lab and Zhipu AI jointly launched CogAgent, a large image understanding model

2023-12-28 08:27:29

Bit News: Tsinghua KEG Lab recently partnered with Zhipu AI to launch CogAgent, a new-generation large model for image understanding. Built on the previously released CogVLM, the model acts as a visual GUI agent: it perceives the GUI through the visual modality rather than through text, which gives it a more comprehensive and direct view of the interface for planning and decision-making. CogAgent reportedly accepts high-resolution image input at 1120×1120 and supports visual question answering, visual grounding, and GUI agent tasks. It reportedly ranks first in general capability on 9 classic image understanding benchmarks, including VQAv2, STVQA, DocVQA, TextVQA, MM-VET, and POPE.


