AI Engineering
Retrieval Augmented Generation (RAG)
Give LLMs access to external knowledge at inference time instead of baking it in through fine-tuning. The idea is simple: before the model generates an answer, retrieve relevant information from your own data and inject it into the prompt. The model generates a grounded answer using that context — not just its training data.
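The retrieve-then-inject flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the `DOCS` knowledge base, the chunk ids, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are all hypothetical, and word-count vectors stand in for a real embedding model. The resulting prompt would then be sent to whichever LLM you use.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base: chunk id -> chunk text.
DOCS = {
    "refunds.md#1": "Refunds are issued within 14 days of purchase.",
    "shipping.md#1": "Standard shipping takes 3 to 5 business days.",
    "warranty.md#1": "Hardware carries a two year limited warranty.",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k (chunk_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        DOCS.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in retrieve(query, k))
    return (
        "Answer using only the context below and cite chunk ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How long does shipping take?"))
```

Because each chunk keeps its id in the prompt, the model can cite where an answer came from, and updating the knowledge base only requires editing `DOCS`, with no retraining.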
1. Why RAG Over Fine-Tuning
| Factor | RAG | Fine-Tuning |
|--------|-----|-------------|
| Cost | Low: just embedding + storage | High: GPU hours for training |
| Knowledge freshness | Update docs anytime, instant | Retrain on new data |
| Source attribution | Cite exact chunks retrieved | Model "just knows", no sources |
| Model flexibility | Works with any model | Locked to one fine-tuned model |
| Setup time | Hours | Days to weeks |
| Hallucination control | Grounded in retrieved context | Still hallucinates, just differently |

Fine-tuning changes the model's behavior (tone, format, reasoning style). RAG changes what it knows. They solve different problems — and you can combine them.