Dive into Vision Language Models


Published: January 21, 2024

Since 2021, we’ve seen increased interest in models that combine the vision and language modalities (joint vision-language models, or VLMs). VLMs have shown particularly impressive capabilities on very challenging tasks such as image captioning, text-guided image generation and manipulation, and visual question answering.

In this blog post, I will briefly give a high-level description of everything you need to know about joint vision-language models.

Introduction

What does it mean to call a model a “Vision-Language” Model?


Tuong Tran Ngoc
