What is RAG?

Retrieval-augmented generation. The trick that turns an AI that knows the internet into an AI that knows your company. Every "chat with your docs" tool you have ever touched runs on it.

Ask a raw model about your Q3 pricing and it makes something up. It has never seen your pricing. It was trained on the public internet a year ago and it has no idea who you are. RAG is the layer that fixes that, and it is the single most common way real companies bolt AI onto their own knowledge.

What it is

RAG stands for retrieval-augmented generation. Strip the jargon and it is two steps in a trench coat.

Retrieval: before the model answers, a search step runs over your stuff first. Your docs, your meeting notes, your help center, your markdown files. It pulls the handful of passages most relevant to the question.

Generation: those passages get pasted into the model's context alongside your question, and then the model answers. So it is not answering from memory. It is answering from the pages you just handed it, the way an analyst answers after reading the file you dropped on their desk.

That is the whole idea. Find the relevant pages, feed them to the model, let it write the answer. The "augmented" part is the retrieval step augmenting the model with knowledge it was never trained on: yours.
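To make the two steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names `retrieve` and `build_prompt`, the three sample documents, and their contents are all invented. The scoring is a deliberately crude count of shared words standing in for a real search step, and the final call to a model is left out because that depends on which vendor you use.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=2):
    """Retrieval: rank docs by how many words they share with the question.

    Word overlap is a crude stand-in for a real search step.
    """
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d["text"])), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Generation input: paste the retrieved passages next to the question."""
    context = "\n\n".join(f"[{p['source']}]\n{p['text']}" for p in passages)
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical company documents, made up for this example.
DOCS = [
    {"source": "pricing.md", "text": "Q3 pricing: the Pro plan is 49 dollars per seat per month."},
    {"source": "offsite.md", "text": "The offsite is planned for October in Denver."},
    {"source": "support.md", "text": "Support hours are 9 to 5 Eastern on weekdays."},
]

question = "What is our Q3 pricing for the Pro plan?"
passages = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, passages)
print(prompt)  # this text is what would be sent to the model
```

The model never sees the whole document set. It sees the one passage the retrieval step picked, plus the question, which is why the quality of retrieval tends to decide the quality of the answer.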

One detail worth knowing, because it is why RAG feels like magic the first time. In most setups the search step does not rely on matching keywords. It matches meaning. Ask "how do we handle a refund past 30 days" and it finds the policy doc that says "returns outside the standard window," even though not one word overlaps. The system turns your question and your documents into a kind of meaning-fingerprint (the technical name is an embedding: a list of numbers) and compares those. So you can ask in plain English and it finds the right page even when you and the doc used different words.
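Here is a rough sketch of that comparison in Python. The big assumption: the three-number vectors below are written by hand to stand in for what an embedding model would produce. Real embeddings have hundreds or thousands of numbers and come from a trained model, not from a person typing them. The third document and the names `EMBEDDINGS` and `cosine` are invented for the example. The comparison itself, cosine similarity, is a standard way to score how closely two vectors point in the same direction.

```python
import math

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

question = "how do we handle a refund past 30 days"

# Hand-made stand-ins for embeddings (an assumption, not real model output).
# Read the three numbers loosely as: money back, time limits, hiring.
EMBEDDINGS = {
    question: [0.9, 0.8, 0.0],
    "returns outside the standard window": [0.8, 0.9, 0.1],
    "interview loop for engineering hires": [0.0, 0.1, 0.9],
}

docs = [text for text in EMBEDDINGS if text != question]

# Rank the documents by how close their vectors are to the question's vector.
scores = {doc: cosine(EMBEDDINGS[question], EMBEDDINGS[doc]) for doc in docs}
best = max(scores, key=scores.get)

# Keyword check: the question and the winning doc share no words at all.
shared_words = set(question.split()) & set(best.split())

print(best)
print(shared_words)
```

The refund question lands on the returns policy because the two vectors point the same way, not because any word matched.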

Why CEOs care

Because RAG is the difference between an AI that is impressive at parties and an AI that is useful at your company.

A model on its own is frozen and generic. It does not know last week's board deck, your customer list, or the decision you made in February. RAG is how you give it that knowledge without the impossible task of retraining the model. You keep your documents where they live, point a retrieval step at them, and now the AI answers grounded in your reality.

Nearly every tool you have seen that promises "chat with your documents" or "grounded in your data" or "no hallucinations" is doing RAG under the hood. The customer-support bot that answers from your help center. The "ask your Notion" feature. The legal tool that answers from your contracts. Same pattern, different wrapper. Once you can see RAG, you can see the machinery behind half the AI products being sold to you, and you can ask the one question that matters: where does the retrieval pull from, and who can see it?

It is also the cheap path. Retraining a model on your data costs real money and goes stale the day you sign a new customer. RAG stays current: add a doc to the folder and, once the search index picks it up, the next question can find it. That is why it won. It is the pragmatic 80/20 of putting AI on top of your own knowledge.
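A small sketch of why staying current is cheap, in Python. The `NoteIndex` class, the file names, and the document text are all hypothetical, and the index is a toy word lookup, not what a production system would use. The point is only the shape of the update: adding a document is one call to `add`, and nothing about the model changes.

```python
import re

def words(text):
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class NoteIndex:
    """Toy in-memory index: maps each word to the documents that contain it."""

    def __init__(self):
        self.postings = {}  # word -> set of document names

    def add(self, name, text):
        """Index one document. This is the whole 'update' step."""
        for word in words(text):
            self.postings.setdefault(word, set()).add(name)

    def search(self, question):
        """Return document names, most shared words first."""
        hits = {}
        for word in words(question):
            for name in self.postings.get(word, ()):
                hits[name] = hits.get(name, 0) + 1
        return sorted(hits, key=lambda n: (-hits[n], n))

index = NoteIndex()
index.add("board-deck.md", "Board deck: revenue grew and churn fell last quarter.")

question = "Who is the new customer Acme?"
before = index.search(question)  # nothing indexed mentions this customer yet

# Sign a new customer, drop the contract in, index it. No retraining.
index.add("acme-contract.md", "Acme signed as a new customer on an annual contract.")
after = index.search(question)

print(before)
print(after)
```

In a real setup something still has to notice the new file and index it, whether that is a scheduled job or a file watcher. That step is seconds of work, not a training run.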

Where you'll see it

In gbrain, Garry Tan's memory layer for AI agents, which is RAG plus a few more layers stacked on top.
In the persistent-memory workflow on this site, which is RAG running entirely on your own laptop with a tool called QMD.
In any vendor pitch that says "trained on your data" (usually it is not trained, it is retrieved; ask which).
In your support team's AI assistant, your sales team's "ask the deal history" tool, and the next five internal AI projects someone pitches you.
Anywhere an agent needs to know something specific about your business before it acts.

What to do next

The fastest way to feel RAG is to run it on your own notes. Read the persistent memory workflow: you point a search tool at a folder of your files and your coding agent starts answering from your own past thinking instead of from the generic internet. Set it up this weekend and watch the first answer come back citing a memo you forgot you wrote. Tell me what it dug up.
