How do I build my own RAG?

Author: neil.watkins@leadingai.co.uk

Published: 05/06/2026

How do I build my own RAG AI?
We were talking to a client this week who basically said, “We want to build it ourselves. How do we do it?”
Now, normally we’d politely tell them to jog on, but this time I thought I’d share the ‘recipe’ with them so they could fully understand the implications of trying to build their own private and secure RAG.
And the real ‘cost’ isn’t in the AI. That’s the cheap part.

Hosting

When we talk about ‘private and secure’, we mean creating your own isolated environment on Azure or AWS. We’ve never tried it on Google Cloud, but I’m sure it can be done. From there, we call a hosted Large Language Model (LLM) through a no-training, data-residency-guaranteed endpoint (Azure OpenAI, AWS Bedrock, the Anthropic API on enterprise terms). Most of our clients use Azure OpenAI, but we’ve just become a Claude partner, so we’ll be testing the API as soon as it’s available through Azure UK South. One point to note: in our experience, doing it properly in AWS is currently 10x the price of doing it in Azure. And that’s after their engineers have ratified our solution and told us we’re doing it correctly, so factor that into your calculations if yours is an AWS shop.

Building

The good news is, the build is basically the same regardless of hosting choice:
  • Ingestion and parsing. Connect to the source systems (SharePoint, file shares, intranets, case management) and get clean text out of messy formats. Scanned PDFs need OCR, tables need structure-aware extraction, and a badly parsed document can mess up every answer that touches it. You need to budget real time here. It’s never the bit people expect, but it makes or breaks the quality of the outputs and therefore user confidence.
  • Chunking and embedding. Split documents into retrievable pieces with sensible boundaries and overlap, attach metadata (source, date, access level), then convert each chunk to a vector via an embedding model. Choose the embedding model carefully up front, because changing it later means re-embedding the entire index. For large indexes, that means significant time and effort.
  • Vector store. Somewhere to hold and search those vectors. Our tool of choice is Azure AI Search.
  • Retrieval and generation. Query comes in, gets embedded, you search (ideally hybrid, keyword plus semantic, with a reranking pass), then feed the top chunks plus the question to the model with a prompt that forces it to ground answers in the retrieved context and cite sources. Grounding and citation are super important for obvious reasons and guard against hallucinations and the associated nonsense.
  • The security layer. This is probably the most important section. It ensures people don’t see things they are not supposed to, and it guards against prompt injection and poisoned documents. Your security people will understand all of this, but the headlines are:
    • Authentication and Role-Based Access Control (RBAC) via Single Sign-On (SSO), ideally tied to existing identity.
    • Encryption in transit and at rest, network isolation (private endpoints, no public ingress), and proper management.
    • A Data Protection Impact Assessment (DPIA) and data-flow map done up front, not retrofitted.
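
To make the steps above a bit more concrete, here is a minimal sketch in Python of chunking with overlap, metadata, access-filtered retrieval and a grounded prompt. It is an illustration under stated assumptions, not our production build: the embed function is a toy bag-of-words stand-in for a real embedding model, the index is a plain in-memory list rather than Azure AI Search, there is no hybrid search or reranking pass, and the prompt is built but never sent to a model. Every name in it (Chunk, chunk_text, retrieve, build_prompt, the access levels and the file names) is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    access_level: str  # hypothetical label, e.g. "staff" or "hr"


def chunk_text(text, source, access_level, size=40, overlap=10):
    """Split text into windows of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = " ".join(words[start:start + size])
        chunks.append(Chunk(window, source, access_level))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, user_levels, top_k=3):
    """Drop chunks the user may not see, then rank the rest by similarity."""
    q = embed(query)
    allowed = [(c, v) for c, v in index if c.access_level in user_levels]
    ranked = sorted(allowed, key=lambda cv: cosine(q, cv[1]), reverse=True)
    return [c for c, _ in ranked[:top_k]]


def build_prompt(question, chunks):
    """Build a prompt that asks the model to stay within the context and cite it."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the numbered context below and cite sources as [n]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Made-up documents with made-up access levels, purely for illustration.
docs = [
    ("Annual leave is 25 days per year plus bank holidays.", "leave-policy.docx", "staff"),
    ("Director salary bands are reviewed every April.", "pay-review.docx", "hr"),
]
index = [
    (chunk, embed(chunk.text))
    for text, source, level in docs
    for chunk in chunk_text(text, source, level)
]

question = "How many days of annual leave do I get?"
hits = retrieve(question, index, user_levels={"staff"})
print(build_prompt(question, hits))
```

The one design choice worth noting is that the access filter runs before ranking, so a chunk the user is not entitled to see never reaches the prompt, however well it matches the question. In a real build you would expect that filter to be driven by the RBAC groups from your identity provider rather than a hard-coded set.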

Ongoing maintenance

This is the piece most people don’t think about during setup.

In summary

So, there you go. You now know more than 90% of people about how to build your own private and secure RAG AI solution.
In short, building it is a few weeks of engineering, but running it well is a big commitment. Or you could get the experts to do it for you. Just saying…