---
url: "https://xcademia.com/courses/multimodal-rag-text-images-on-vertex-ai"
title: Multimodal RAG (Text + Images) on Vertex AI
description: "Learn to build multimodal RAG systems using text and image retrieval on Vertex AI in this mentor-led generative AI engineering course."
publishedAt: "2026-03-17T04:01:23.045397+00:00"
updatedAt: "2026-03-30T22:50:53.7265+00:00"
type: course
code: "AID-0023"
level: Practitioner
duration_days: "2"
track: "RAG & Vector Databases"
category: "AI, Data & Analytics"
credential_tier: tier1
price_gbp: "1799"
---

# Multimodal RAG (Text + Images) on Vertex AI

> Learn how to build multimodal RAG systems that retrieve information from both text and images and use it to generate answers. Explore multimodal retrieval pipelines and AI applications built on Vertex AI infrastructure.

## Overview

Multimodal AI systems are increasingly used to analyse and retrieve knowledge from multiple data types, including text, images, and structured content. Multimodal Retrieval-Augmented Generation (RAG) extends traditional text-only RAG pipelines so that the retrieval step can also return images, giving the generating model richer source material to ground its answers in.

This programme teaches engineers how to design and implement multimodal RAG systems capable of retrieving both textual and visual information. Participants learn how to ingest multimodal datasets, generate embeddings for images and text, and perform retrieval operations that support multimodal reasoning.
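
To make the ingestion step concrete, here is a minimal Python sketch, assuming source documents are Markdown files that embed images with `![alt](path)` syntax. It splits the prose into overlapping word chunks and turns each image into its own record, keeping the alt text and the words just before the image as context that could be embedded alongside it. The `Record`, `chunk_words` and `ingest_markdown` names and the chunk sizes are illustrative assumptions, not part of any Vertex AI API.

```python
import re
from dataclasses import dataclass

# Matches Markdown image syntax: ![alt text](path)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


@dataclass
class Record:
    """One retrievable unit: a text chunk or an image reference (hypothetical schema)."""
    modality: str  # "text" or "image"
    content: str   # chunk text, or image path/URI
    context: str   # text to embed alongside the item


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks: list[str] = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
        start += size - overlap
    return chunks


def ingest_markdown(doc: str, chunk_size: int = 50, overlap: int = 10,
                    context_words: int = 20) -> list[Record]:
    """Split a Markdown document into text-chunk records and image records."""
    # Text records: strip image markup so it does not leak into prose chunks.
    prose = IMAGE_PATTERN.sub(" ", doc)
    records = [Record("text", c, c) for c in chunk_words(prose, chunk_size, overlap)]

    # Image records: keep the alt text plus the words preceding the image as context.
    for match in IMAGE_PATTERN.finditer(doc):
        alt, path = match.group(1), match.group(2)
        preceding = IMAGE_PATTERN.sub(" ", doc[:match.start()]).split()[-context_words:]
        context = " ".join([alt, *preceding]).strip()
        records.append(Record("image", path, context))
    return records


doc = "Pump P-101 schematic.\n\n![Pump diagram](images/p101.png)\n\nReplace the seal every 6 months."
for r in ingest_markdown(doc, chunk_size=5, overlap=1):
    print(r.modality, "|", r.content)
# text | Pump P-101 schematic. Replace the
# text | the seal every 6 months.
# image | images/p101.png
```

Keeping images as separate records, rather than folding them into text chunks, lets each one receive its own image embedding at indexing time.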

Using modern AI development tools and Vertex AI infrastructure, learners build applications that combine document search with visual understanding to support advanced AI assistants and knowledge systems.

## Prerequisites

- Basic programming knowledge (Python recommended)
- Familiarity with AI or machine learning concepts
- Understanding of RAG pipelines helpful but not mandatory

## What you will learn

- Understand multimodal AI architecture concepts
- Build retrieval pipelines that support both text and image data
- Generate embeddings for multimodal datasets
- Implement multimodal vector search and retrieval systems
- Design multimodal RAG pipelines for enterprise AI applications
- Deploy reliable multimodal AI knowledge systems
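
Retrieval over a mixed text-and-image index can be sketched as cosine-similarity search, assuming every item has already been embedded into one shared vector space (for example by Vertex AI's multimodal embedding model, which places image and text embeddings in the same semantic space). The three-dimensional vectors below are hand-written toy values, and `Item` and `MultimodalIndex` are hypothetical names; a production system would replace the brute-force loop with a managed vector database such as Vertex AI Vector Search.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    item_id: str
    modality: str        # "text" or "image"
    vector: list[float]  # embedding in a shared text-image space (assumed precomputed)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MultimodalIndex:
    """Brute-force in-memory index over text and image embeddings (illustrative only)."""

    def __init__(self) -> None:
        self.items: list[Item] = []

    def add(self, item: Item) -> None:
        self.items.append(item)

    def search(self, query: list[float], k: int = 3,
               modality: Optional[str] = None) -> list[tuple[str, float]]:
        """Return the k most similar items, optionally restricted to one modality."""
        candidates = [it for it in self.items if modality is None or it.modality == modality]
        scored = [(it.item_id, cosine(query, it.vector)) for it in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = MultimodalIndex()
index.add(Item("manual-page-3", "text", [0.9, 0.1, 0.0]))
index.add(Item("pump-diagram.png", "image", [0.8, 0.2, 0.1]))
index.add(Item("invoice-2024", "text", [0.0, 0.1, 0.9]))

# Toy stand-in for the embedding of a text query such as "How is the pump assembled?"
query = [1.0, 0.0, 0.0]
print([item_id for item_id, _ in index.search(query, k=2)])
# ['manual-page-3', 'pump-diagram.png']
```

Because both modalities share one space, a text query can return an image directly; the optional `modality` filter restricts results to one type when an application needs it.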

## Skills you will gain

- Multimodal AI architecture
- Text and image embeddings
- Multimodal vector search
- Retrieval-augmented generation for multimodal systems
- AI knowledge retrieval pipelines
- Multimodal AI system deployment

## Career progression

- AI Engineer
- Machine Learning Engineer
- Generative AI Engineer
- Data Engineer
- AI Platform Engineer

## Curriculum

1. **Module 1**
2. **Module 2**
3. **Module 3**

## Exam & certification

You will receive an Xcademia certificate of completion based on participation and successful completion of labs and scenario simulations.

## Delivery options

- **Live Online** — Join live instructor-led sessions from anywhere. Interactive, engaging, and flexible.
- **Onsite Training** — We come to you. Training delivered at your workplace for teams of 6 or more.
- **Venue-Based** — Classroom training at a professional venue. Ideal for focused, immersive learning.
- **Blended** — Combine online and in-person learning for maximum flexibility and impact.

## Frequently asked questions

**What is multimodal RAG?**

Multimodal RAG retrieves and processes information from multiple data types such as text and images before generating responses.
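
The "retrieve, then generate" step can be sketched as prompt assembly: ranked text chunks and images are laid out as numbered sources, followed by the user's question, and the resulting parts are sent to a multimodal model. This is a minimal sketch under assumptions: the hit dictionaries, the `TextPart`/`ImagePart` types and the `gs://example-bucket` URIs are hypothetical, and the call to a generative model (for example a Gemini model on Vertex AI) is omitted because its exact form depends on the SDK version in use.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    uri: str  # image location, e.g. a Cloud Storage URI (hypothetical)


Part = Union[TextPart, ImagePart]


def build_prompt(question: str, hits: list[dict], max_images: int = 2) -> list[Part]:
    """Order retrieved text and images into numbered sources, then the question.

    Each hit is assumed to be a dict with "modality" ("text" or "image") and
    "content" (chunk text or image URI); hits are assumed to be ranked already.
    """
    parts: list[Part] = [TextPart("Answer using only the numbered sources below and cite them.")]
    images_used = 0
    for n, hit in enumerate(hits, start=1):
        if hit["modality"] == "text":
            parts.append(TextPart(f"[{n}] {hit['content']}"))
        elif hit["modality"] == "image" and images_used < max_images:
            parts.append(TextPart(f"[{n}] (image follows)"))
            parts.append(ImagePart(hit["content"]))
            images_used += 1
    parts.append(TextPart(f"Question: {question}"))
    return parts


hits = [
    {"modality": "text", "content": "Replace the seal every 6 months."},
    {"modality": "image", "content": "gs://example-bucket/p101.png"},
    {"modality": "image", "content": "gs://example-bucket/p102.png"},
]
for part in build_prompt("How often is the seal replaced?", hits, max_images=1):
    print(part)
```

Capping images with `max_images` is one simple way to keep the prompt within the model's input limits, since images can consume a substantial share of the input budget.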


**Why is multimodal AI important?**

Many enterprise knowledge sources, such as manuals, reports, and slide decks, mix text with diagrams, photos, and charts, so AI systems need to interpret both data types together.


**Do we build a working multimodal prototype?**

Yes. Participants design and implement a simple multimodal retrieval pipeline during the course.


**Is prior cloud experience required?**

Basic familiarity with cloud platforms and APIs is recommended but not mandatory.


**Is there an exam?**

No. Completion is based on participation in mentor-led sessions and practical scenario exercises.

## Course at a glance

| Field | Value |
| --- | --- |
| Code | AID-0023 |
| Duration | 2 days |
| Level | Practitioner |
| Track | RAG & Vector Databases |
| Category | AI, Data & Analytics |
| Credential tier | tier1 |
| Price (GBP) | £1799 |

---

## About this content

This Markdown course profile is the citation-grade twin of [Multimodal RAG (Text + Images) on Vertex AI](https://xcademia.com/courses/multimodal-rag-text-images-on-vertex-ai). It is published by **Xcademia** (UK Companies House 12322710) and is available for AI search engines and large language models to index, summarise, and cite.

When citing or quoting, please attribute *Xcademia* and link back to the source URL above.

- Source: https://xcademia.com/courses/multimodal-rag-text-images-on-vertex-ai
- Publisher: Xcademia — https://xcademia.com
- Catalogue index: https://xcademia.com/llms-full.txt