What is RAG?
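RAG (Retrieval-Augmented Generation) is a technique that lets an AI model answer questions using your own knowledge base. Before answering, the system retrieves the most relevant passages from your documents and passes them to the model together with the question. Think of it as giving the AI access to your own textbooks and teaching materials!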
Image source: Gradient Flow
How Does RAG Work?
Let's walk through how RAG works with a practical example: building a Chinese-language teaching assistant.
Step 1: Data Preparation (Left side of diagram)
A. Raw Data Sources
- Your Chinese teaching materials
- Lesson plans
- Exercise worksheets
- Teaching guides
- Past exam papers
B. Information Extraction
- These materials are converted into plain text
- Images are processed with OCR (Optical Character Recognition)
- PDFs and web pages are converted to text
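The right extraction tool depends on the file type. PDFs usually need a PDF library and images need an OCR engine, but for web pages Python's standard `html.parser` module is enough for a minimal sketch. The `TextExtractor` class, the `html_to_text` function, and the sample page are hypothetical names chosen for illustration; a real pipeline would also need to deal with navigation menus, tables, and character encodings.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a tag whose content we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>范仲淹</h1><p>北宋政治家、文學家。</p></body></html>"
)
print(html_to_text(page))
```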
C. Chunking
- Long texts are broken into smaller, manageable pieces called chunks
- For example, a lesson plan might be split into individual topics or activities
- Chunks are often around 500-1000 characters long; the exact size is a setting you can tune
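One simple chunking strategy, assuming paragraphs in your materials are separated by blank lines, is to pack whole paragraphs into chunks up to a character limit and hard-split any paragraph that is too long on its own. The function `chunk_text` and its `max_chars=800` default are illustrative choices, not a standard; many tools also let neighboring chunks overlap so that a sentence cut at a boundary still appears whole somewhere.

```python
def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Hard-split any single paragraph that is longer than the limit.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # the piece still fits in the current chunk
            else:
                chunks.append(current)  # close the full chunk and start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


lesson = "活動一：朗讀課文。\n\n活動二：小組討論。\n\n活動三：寫作練習。"
for chunk in chunk_text(lesson, max_chars=20):
    print(repr(chunk))
```

A tiny `max_chars` is used here only so the split is visible; each activity stays intact, and the first two activities share a chunk because together they fit within the limit.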
D. Embedding
- Each chunk is converted into an embedding: a list of numbers (a vector) produced by an embedding model
- Think of embeddings as the AI's numerical representation of what a piece of text means
- Texts with similar meanings have similar embeddings
- For example, "寫作" (writing) and "作文" (composition) would have similar embeddings
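Real embeddings come from a trained embedding model and are dense vectors, often with hundreds of dimensions. The sketch below substitutes a deliberately crude stand-in (counts of characters and character pairs) only to show how the similarity of two vectors is measured with cosine similarity. `embed` and `cosine` are hypothetical helpers; unlike a real model, this toy only notices shared characters, so it would miss synonyms written with entirely different characters.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of characters and character bigrams."""
    chars = [c for c in text if not c.isspace()]
    bigrams = ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]
    return Counter(chars + bigrams)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors: 1.0 = same direction, 0.0 = nothing shared."""
    dot = sum(count * b[feat] for feat, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(round(cosine(embed("寫作"), embed("作文")), 3))  # → 0.333
print(cosine(embed("寫作"), embed("數學")))  # → 0.0
```

寫作 and 作文 score 1/3 because they share the character 作, while 寫作 and 數學 share nothing and score 0.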
Step 2: Question Answering Process (Right side of diagram)
A. Query Processing
- When a student asks: "Who is Fan Zhongyan (范仲淹)?"
- The question is converted into an embedding
B. Retrieval
- The system finds chunks with similar embeddings
- It might retrieve your lesson plans about Fan Zhongyan
- It also finds related examples and exercises
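Retrieval can be sketched as: embed the question with the same function used for the chunks, score every chunk by cosine similarity, and keep the top `k`. This block defines its own toy character-overlap `embed` (a stand-in for a real embedding model) so it runs on its own; `retrieve`, `k`, and the sample chunks are assumptions for illustration. Production systems usually embed chunks once ahead of time and search a vector database or nearest-neighbor index instead of scanning every chunk.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: counts of characters and character bigrams."""
    chars = [c for c in text if not c.isspace()]
    bigrams = ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]
    return Counter(chars + bigrams)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[feat] for feat, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the question, best first, as (score, chunk) pairs."""
    q = embed(question)  # the question must be embedded the same way as the chunks
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "范仲淹是北宋的政治家和文學家。",
    "量詞練習：一本書、一枝筆。",
    "范仲淹寫了〈岳陽樓記〉。",
]
for score, chunk in retrieve("范仲淹是誰？", chunks, k=2):
    print(f"{score:.2f}  {chunk}")
```

The two Fan Zhongyan chunks rank first; the measure-word exercise shares no characters with the question and is left out.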
C. Relevant Data
- The most relevant chunks are collected
- For example:
  - How you teach Fan Zhongyan (范仲淹)
  - Common student mistakes
  - Practice exercises
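Collecting the most relevant chunks often runs into two practical limits: the same passage can be indexed from several files, and the LLM's prompt has a finite size. A minimal sketch, assuming chunks arrive as `(score, text)` pairs and using a hypothetical character budget `max_chars`:

```python
def collect_context(ranked: list[tuple[float, str]], max_chars: int = 2000) -> list[str]:
    """Pick chunks best-first, skipping exact duplicates and anything that would exceed max_chars in total."""
    selected: list[str] = []
    seen: set[str] = set()
    used = 0
    for _score, chunk in sorted(ranked, key=lambda pair: pair[0], reverse=True):
        if chunk in seen:
            continue  # the same passage may have been indexed from several files
        if used + len(chunk) > max_chars:
            continue  # too long for the remaining budget; a shorter chunk may still fit
        selected.append(chunk)
        seen.add(chunk)
        used += len(chunk)
    return selected


ranked = [
    (0.82, "Lesson: how I teach Fan Zhongyan"),
    (0.82, "Lesson: how I teach Fan Zhongyan"),
    (0.61, "Common student mistakes"),
    (0.40, "Practice exercises"),
]
print(collect_context(ranked, max_chars=60))
```

With a 60-character budget, the duplicate lesson chunk is dropped and the exercises chunk no longer fits; a larger budget keeps all three distinct chunks.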
D. LLM Processing
- The retrieved chunks are added to the prompt together with the student's question
- The LLM (Large Language Model) combines this retrieved information with its general knowledge to write a coherent, helpful response
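In a typical RAG setup, "combining" the retrieved information with the question means placing the chunks in the prompt text. The template below is a hypothetical example, not a required format, and `build_prompt` is an illustrative name; the returned string would be sent to whichever LLM you use, and that API call is not shown.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the student's question into a single prompt string."""
    # Number the chunks so the model (and you) can tell which material an answer used.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "You are a Chinese-language teaching assistant.\n"
        "Answer the student's question using the teaching materials below. "
        "If the materials do not contain the answer, say so.\n\n"
        f"Teaching materials:\n{context}\n\n"
        f"Student question: {question}\n"
        "Answer:"
    )


print(build_prompt(
    "Who is Fan Zhongyan (范仲淹)?",
    ["Lesson: how I teach Fan Zhongyan", "Common student mistakes"],
))
```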
E. Response
- Provides a detailed answer using your specific teaching materials
- Might suggest activities and examples you've used before
Why Use RAG?
A. Customized Knowledge
- The AI answers based on YOUR teaching materials
- Maintains consistency with your teaching style
- Uses examples familiar to your students
B. Up-to-date Information
- You can update the knowledge base anytime
- Add new teaching materials as needed
- Include current examples and exercises
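Updating the knowledge base means embedding and storing new chunks; the LLM itself is not retrained. A minimal in-memory sketch, using a hypothetical `KnowledgeBase` class and its own toy character-overlap `embed` (a stand-in for a real embedding model; a real system would also persist the vectors, for example in a vector database):

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: character and bigram counts.
    chars = [c for c in text if not c.isspace()]
    return Counter(chars + ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)])


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical in-memory store: each chunk is kept next to its embedding."""

    def __init__(self):
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Adding material only requires embedding the new chunk; nothing else changes.
        self.entries.append((chunk, embed(chunk)))

    def search(self, question: str, k: int = 1) -> list[str]:
        q = embed(question)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


kb = KnowledgeBase()
kb.add("量詞練習：一本書、一枝筆。")
print(kb.search("范仲淹是誰？"))  # only unrelated material so far
kb.add("范仲淹是北宋的政治家和文學家。")  # new lesson material, searchable immediately
print(kb.search("范仲淹是誰？"))
```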
C. Reliable Answers
- The AI is instructed to base its answers on the information you've provided
- This reduces, though it does not eliminate, the chance of incorrect or irrelevant responses
- Helps keep answers within your approved teaching content
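Retrieval on its own does not guarantee the model stays on topic. One common safeguard, assumed here rather than taken from this tutorial's setup, is to decline when no chunk is similar enough to the question instead of letting the LLM guess. The `MIN_SCORE = 0.2` threshold, the `FALLBACK` message, and `relevant_chunks` are hypothetical; the threshold would need tuning for a real embedding model, and the toy `embed` below is only a stand-in for one.

```python
import math
from collections import Counter

MIN_SCORE = 0.2  # hypothetical threshold; tune it for your embedding model
FALLBACK = "I couldn't find this in the teaching materials. Please ask your teacher."


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: character and bigram counts.
    chars = [c for c in text if not c.isspace()]
    return Counter(chars + ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)])


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevant_chunks(question: str, chunks: list[str], min_score: float = MIN_SCORE) -> list[str]:
    """Return chunks scoring at least min_score against the question, best first."""
    q = embed(question)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored if score >= min_score]


materials = ["量詞練習：一本書、一枝筆。"]
context = relevant_chunks("范仲淹是誰？", materials)
if not context:
    print(FALLBACK)  # decline rather than let the LLM answer without supporting material
```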
What's Next?
In the following tutorials, you'll learn how to:
- Prepare your teaching materials
- Create your own Chinese Teaching Assistant (Virtual Fan Zhongyan)
- Test and improve its responses
- Deploy it for your students
Let's start building your own RAG Chatbot that can help with your teaching!