Hands-On
Retrieval-Augmented Generation

Jeroen Herczeg

This book has interactive examples that teach software developers how to implement Retrieval-Augmented Generation applications that are production-ready, effective, and safe.

What is Retrieval-Augmented Generation (RAG)?

A Retrieval-Augmented Generation (RAG) system combines information retrieval with Large Language Models (LLMs) to improve the quality and relevance of generated text. This allows LLMs to access up-to-date or private information and provide factual answers with verifiable sources.
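
As a rough illustration of the idea above, the following minimal Python sketch retrieves the most relevant passages for a question and assembles them into a prompt that asks for cited sources. Everything in it is an assumption made for illustration: the toy corpus, the word-overlap scoring, and the function names `retrieve` and `build_prompt` do not come from the book. A real system would typically use embeddings and a vector database for retrieval, then send the prompt to an LLM.

```python
import re

# Hypothetical toy corpus; a real system would load documents from a data store.
DOCUMENTS = {
    "doc-1": "RAG combines information retrieval with large language models.",
    "doc-2": "A knowledge cutoff limits what a language model knows about recent events.",
    "doc-3": "Vector search finds documents whose embeddings are close to the query.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Toy relevance score: number of distinct query tokens found in the document."""
    return len(set(tokenize(query)) & set(tokenize(document)))


def retrieve(query, documents, k=2):
    """Return up to k (id, text) pairs with a non-zero score, best first."""
    ranked = sorted(
        documents.items(),
        key=lambda item: score(query, item[1]),
        reverse=True,
    )
    return [(doc_id, text) for doc_id, text in ranked[:k] if score(query, text) > 0]


def build_prompt(query, passages):
    """Place the retrieved passages in the prompt so the answer can cite them."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer the question using only the sources below and cite their ids.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


query = "What is a knowledge cutoff?"
passages = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, passages)
print(prompt)  # This string is what would be sent to the language model.
```

The generation step is deliberately left out: the sketch stops at the prompt, since the choice of model and client library depends on your setup.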

This book is designed to teach you practical, hands-on methods for implementing Retrieval-Augmented Generation (RAG).

When you first learn about RAG, it might come across as a simple system meant to improve the accuracy of a Large Language Model. But once you start implementing it, you realize that it's quite complicated and requires a good grasp of retrieval and generation techniques.

In this book, we will start by examining how large language models work, as well as their limitations and challenges. Next, we'll take a close look at the RAG architecture and how it can improve the performance of a language model. Finally, we'll discuss the most frequent difficulties encountered when developing a RAG application.

Learn how to ingest PDFs and other documents that include multimedia content.
Discover the essential elements of RAG and how to choose the right components for your system.
Understand how to integrate external memory to enhance conversation continuity and context in RAG applications.
Master advanced optimization techniques, including re-ranking and hybrid search, to improve the efficiency and effectiveness of your RAG system.
Learn best practices for developing RAG systems that prioritize safety and security in production environments.
Gain insights into testing and evaluating the performance of your RAG system to ensure reliability and accuracy.
Discover strategies for forecasting and managing the operational costs associated with running a RAG system.

Throughout the book, you will learn through live interactive examples that will help solidify your understanding. By the end, you will have the confidence to deploy powerful RAG applications that solve real-world problems.


01 Table of contents

Get a look at all of the content covered in the book. Everything you need to know is inside.

“Hands-On Retrieval-Augmented Generation” comprises 240 tightly edited pages designed to teach you everything you need to know about Retrieval-Augmented Generation, with no unnecessary filler.

Retrieval-Augmented Generation

1 Introduction
What is Retrieval-Augmented Generation?
2 Understanding the Challenges of Large Language Models
Hallucinations and Knowledge Gaps
Knowledge Cutoff
Limited Contextual Understanding
Observability
4 Use Cases
Question-Answering System
Conversational Agent
Real-time Event Commentary
Content Generation
6 Building a Naive RAG System
Create an OpenAI Account
Create an Assistant
Adding the Data
9 Limitations of RAG
Potential Drawbacks

Large Language Models

Foundation (coming soon)
Model Architecture
Weights and Biases
Tokenization
Training
Inference
Settings
Context Window (coming soon)
Sliding Window
Attention Mechanism
Memory
Prompt Engineering (coming soon)
Anatomy of a Prompt
Zero-shot Prompting
Few-shot Prompting
Chain-of-Thought Prompting
Fine-Tuning (coming soon)
Transfer Learning
Domain-Specific Fine-Tuning

Data Ingestion

Data Sources (coming soon)
Document Formats
SERP and REST APIs
Web Scraping
Databases
PDFs, Images, and Multimedia
Data Preprocessing (coming soon)
Text Splitting
Converting Unstructured to Structured Data
Dealing with Noisy Data

Vector Search

Introduction (coming soon)
Keyword Search vs Semantic Search
Understanding Vectors (coming soon)
Definition of a Vector
Norms, Distances, and Similarities
Vector Operations
Creating Embeddings (coming soon)
What are Embeddings?
Types of Embeddings
Training Embeddings
Pre-trained Embeddings
Measuring Similarity (coming soon)
Cosine Similarity
Euclidean Distance
Dot Product
Approximate Nearest Neighbor (ANN) Search (coming soon)
Introduction to ANN Search
ANN Algorithms
Trade-offs in ANN Search
Vector Databases (coming soon)
Introduction to Vector Databases
Vector Search Engines
Vector Indexing and Querying

Building a RAG Pipeline

LlamaIndex (coming soon)
Introduction
Implementation
LangChain (coming soon)
Introduction
Implementation
Haystack (coming soon)
Introduction
Implementation

Advanced Techniques

Re-ranking (coming soon)
Introduction
Re-ranking Strategies
Hybrid Search (coming soon)
Introduction
Hybrid Search Strategies
Multi-Modal Search (coming soon)
Introduction
Multi-Modal Search Strategies
Advanced Embedding Techniques (coming soon)
Contextual Embeddings
Dynamic Embeddings

Deployment

Model Serving (coming soon)
Hosted Services
On-premises Deployment
Model Versioning
Continuous Integration and Deployment (coming soon)
CI/CD Pipelines
Evaluation
Monitoring
Scalability and Performance (coming soon)
Horizontal and Vertical Scaling
Performance Tuning

Production-Ready

Operational (coming soon)
Cost Analysis
Monitoring
Security (coming soon)
Data Privacy
Model Security
Compliance

02 Interactive examples

Accelerating your understanding.

Explore interactive examples deployed on Hugging Face Spaces to accelerate your understanding as you progress through the book.

Question-Answering System

Conversational Agent

03 Pre-order

Become an early reader.

Enter your email address and I’ll send you the first chapter from the book for free.

“We are currently living in a remarkable time. Artificial intelligence is advancing at an unprecedented rate. With access to the most advanced AI models, we are now able to develop software features that were previously difficult or even impossible to create. The future of AI is not just about algorithms and data. It's about the people who harness these models to solve real-world problems.”

— Jeroen Herczeg, Author

04 Author

Jeroen Herczeg

Hey there, I’m the author.

I have worked in software engineering for over two decades, specializing in building and maintaining efficient, reliable, and scalable systems. In 2015, I discovered my passion for artificial intelligence, and I have been learning about the field and how to apply it in practice ever since. I also speak at various meetups, because I have always been passionate about learning and sharing my knowledge with others.

