Why is multimodal AI important?

Many enterprise knowledge sources contain both documents and images, requiring AI systems to interpret multiple data types together.

Will we build a working multimodal prototype?

Yes. Participants design and implement a simple multimodal retrieval pipeline during the course.

Is prior cloud experience required?

Basic familiarity with cloud platforms and APIs is recommended but not mandatory.

Does this course have an exam?

No. Completion is based on participation in mentor-led sessions and practical scenario exercises.

Multimodal RAG (Text + Images) on Vertex AI

Price: 1799 GBP
Availability: In stock

Multimodal AI systems are increasingly used to analyse and retrieve knowledge from multiple data types including text, images, and structured content. Multimodal Retrieval-Augmented Generation (RAG) extends traditional RAG pipelines to support richer knowledge sources.

This programme teaches engineers how to design and implement multimodal RAG systems capable of retrieving both textual and visual information. Participants learn how to ingest multimodal datasets, generate embeddings for images and text, and perform retrieval operations that support multimodal reasoning.
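As a rough illustration of the retrieval step described above, the following minimal sketch ranks text and image items against a query in one shared embedding space using cosine similarity. Everything in it is an assumption made for illustration: the item names, the `retrieve` helper, and the three-dimensional vectors are invented, and no embedding model or Vertex AI service is called. In a real pipeline the vectors would come from a multimodal embedding model and would have far more dimensions.

```python
import math

# Toy knowledge base (hypothetical). The 3-d vectors are made-up stand-ins for
# the output of an embedding model that maps text and images into one space.
KNOWLEDGE_BASE = [
    {"id": "manual-p4", "modality": "text", "embedding": [0.9, 0.1, 0.0]},
    {"id": "wiring-diagram.png", "modality": "image", "embedding": [0.8, 0.2, 0.1]},
    {"id": "hr-policy", "modality": "text", "embedding": [0.0, 0.1, 0.9]},
]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, items, top_k=2):
    """Return the top_k items most similar to the query, regardless of modality."""
    ranked = sorted(
        items,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical query vector, e.g. the embedding of a question about wiring.
results = retrieve([1.0, 0.1, 0.0], KNOWLEDGE_BASE)
result_ids = [item["id"] for item in results]
print(result_ids)
```

Because both modalities share one vector space in this sketch, a single ranking can return a text passage and an image together, and both could then be passed to a generative model as context.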

Using modern AI development tools and Vertex AI infrastructure, learners build applications that combine document search with visual understanding to support advanced AI assistants and knowledge systems.

Hands-On Learning

Participants design a multimodal retrieval pipeline capable of processing both image and text knowledge sources.

Mentor-Led Sessions

Industry mentors demonstrate real-world multimodal AI architectures used in enterprise AI platforms and advanced generative AI systems.

Flexible Delivery Options

Ways to Learn

Choose the learning format that works best for you and your team

2 Days

Small cohorts

Exam & Certification Information

Everything you need to know about the certification exams

Awarding Organisation

Multimodal RAG retrieves and processes information from multiple data types such as text and images before generating responses.

Course Overview

Career-Ready Skills

Learning Outcomes

Prerequisites

Detailed Syllabus

Domain 1: Foundations of Multimodal AI

Module 1: Introduction to Multimodal AI Systems

Module 2: Multimodal Embeddings

Domain 2: Multimodal Retrieval Architecture

Domain 3: Building Multimodal RAG Applications

Skills You'll Gain

Career Progression

Ways to Learn

Live Online

Exam & Certification Information

Important Information

Certificate of Completion

Frequently Asked Questions
