⚡ AI Engineering
Retrieval-Augmented Generation (RAG)
Give LLMs access to external knowledge at inference time instead of baking it in through fine-tuning. The idea is simple: before the model generates an answer, retrieve relevant information from your own data and inject it into the prompt. The model generates a grounded answer using that context — not just its training data.
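As a minimal sketch of the retrieve-then-inject step described above, the example below ranks a few documents against a question and builds a prompt from the best match. Everything in it is an illustrative assumption: the `DOCS` list, the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the prompt wording are hypothetical, and word-count similarity stands in for the embedding model and vector store a real system would likely use. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base (assumption for illustration).
DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Our support desk is open Monday to Friday, 9am to 5pm.",
    "Premium plans include priority support and extra storage.",
]


def tokenize(text: str) -> Counter:
    """Lowercased bag of words; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    chunks = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. Cite chunk numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_prompt("How long do refunds take?", DOCS, k=1)
print(prompt)  # this string would be sent to whichever LLM you use
```

Numbering the chunks in the context is one simple way to let the model cite which retrieved passage supports its answer.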
1. Why RAG Over Fine-Tuning
| Factor | RAG | Fine-Tuning |
|---|---|---|
| Cost | Low: just embedding + storage | High: GPU hours for training |
| Knowledge freshness | Update docs anytime, instant | Retrain on new data |
| Source attribution | Cite exact chunks retrieved | Model "just knows"; no sources |
| Model flexibility | Works with any model | Locked to one fine-tuned model |
| Setup time | Hours | Days to weeks |
| Hallucination control | Grounded in retrieved context | Still hallucinates, just differently |
Fine-tuning changes the model's behavior (tone, format, reasoning style). RAG changes what it knows. They solve different problems — and you can combine them.