Best Boilerplates with RAG Built-In 2026
RAG Is the Architecture Behind Every Useful AI Product
Retrieval-Augmented Generation (RAG) is the technique that makes AI products actually useful: instead of relying on an LLM's training data, you retrieve relevant context from your own data sources and inject it into the prompt.
A document Q&A chatbot? RAG. A customer support bot grounded in your docs? RAG. A knowledge base with semantic search? RAG.
In 2026, RAG has moved from research technique to production standard. The boilerplates that include it out of the box — or make it easy to add — give you a meaningful head start.
TL;DR
Best boilerplates for RAG in 2026:
- Vercel AI SDK + pgvector — The most common stack. Supabase or Neon provides pgvector. Vercel AI SDK handles embedding and retrieval.
- OpenSaaS + RAG pattern — Add RAG to OpenSaaS's Wasp foundation. The most complete free base.
- Makerkit + AI Plugin — Enterprise-grade SaaS boilerplate with AI plugin including RAG patterns.
- LangChain.js starter templates — More complex orchestration for multi-step RAG pipelines.
- Custom: Next.js + Supabase + pgvector — Roll your own with well-documented patterns.
What RAG Requires
A production RAG system has four components:
| Component | Purpose | Common Tools |
|---|---|---|
| Embedding model | Convert text to vectors | OpenAI text-embedding-3, Cohere, Voyage AI |
| Vector store | Store and search vectors | pgvector, Pinecone, Weaviate, Qdrant |
| Retrieval | Find relevant chunks | Cosine similarity, hybrid search |
| Generation | LLM uses retrieved context | OpenAI, Anthropic, Gemini |
The simplest stack: OpenAI embeddings + pgvector (in Supabase/Neon) + Vercel AI SDK for generation.
Stack Options
pgvector (PostgreSQL)
The simplest approach: add the pgvector extension to your existing PostgreSQL database. Available in Supabase and Neon with zero additional infrastructure.
-- Enable pgvector in Supabase/Neon:
CREATE EXTENSION IF NOT EXISTS vector;
-- Store document chunks with embeddings:
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB,
embedding VECTOR(1536) -- OpenAI text-embedding-3-small dimension
);
-- Semantic search function:
CREATE OR REPLACE FUNCTION match_documents(
query_embedding VECTOR(1536),
match_count INT DEFAULT 5
)
RETURNS TABLE(id BIGINT, content TEXT, metadata JSONB, similarity FLOAT)
LANGUAGE SQL STABLE AS $$
  -- Qualify column names so they don't clash with the OUT parameters:
  SELECT documents.id, documents.content, documents.metadata,
         1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE 1 - (documents.embedding <=> query_embedding) > 0.5
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
$$;
Dedicated Vector Databases
For large-scale RAG with millions of vectors:
| Database | Free tier | Best for |
|---|---|---|
| Pinecone | Yes (Starter) | Simplest API, managed |
| Weaviate | Yes (self-hosted) | Hybrid search, multi-modal |
| Qdrant | Yes (cloud) | Performance, self-hosted |
| pgvector | Yes (via Supabase/Neon) | Simplest infra (same DB) |
The RAG Implementation Pattern
Step 1: Ingest Documents
// lib/ingest.ts
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';
import { supabase } from '@/lib/supabase';
// Split document into overlapping chunks:
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
export async function ingestDocument(text: string, metadata: object) {
  const chunks = chunkText(text);
  // Batch all chunks into a single embedding request:
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks,
  });
  const { error } = await supabase.from('documents').insert(
    chunks.map((content, i) => ({ content, metadata, embedding: embeddings[i] })),
  );
  if (error) throw error;
}
Step 2: Retrieve Relevant Chunks
// lib/retrieve.ts
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';
import { supabase } from '@/lib/supabase';
export async function retrieveContext(query: string, topK = 5) {
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: query,
});
  const { data: documents, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_count: topK,
  });
  if (error) throw error;
  return documents?.map((d: { content: string }) => d.content).join('\n\n') ?? '';
}
Step 3: Generate with Context
// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { retrieveContext } from '@/lib/retrieve';
export async function POST(req: Request) {
const { messages } = await req.json();
const userQuery = messages[messages.length - 1].content;
const context = await retrieveContext(userQuery);
const result = await streamText({
model: openai('gpt-4o'),
system: `You are a helpful assistant. Use the following context to answer the user's question:
${context}
If the context doesn't contain relevant information, say so.`,
messages,
});
return result.toDataStreamResponse();
}
Boilerplate Evaluations
Vercel AI SDK + pgvector (Recommended Free Stack)
The Vercel AI SDK's embed function handles embedding generation. Supabase provides pgvector. Together they form the simplest RAG stack for Next.js:
# Enable pgvector in Supabase:
# Dashboard → SQL Editor → Run: CREATE EXTENSION vector;
# Install deps:
npm install ai @ai-sdk/openai @supabase/supabase-js
No dedicated boilerplate exists for this — but the Supabase RAG quickstart and Vercel AI SDK docs together provide a complete guide.
OpenSaaS + RAG
OpenSaaS provides the SaaS foundation (auth, billing, admin). Add pgvector via Supabase (which OpenSaaS supports) for the RAG layer.
The combination gives you a complete AI SaaS with RAG capabilities without paying for a commercial boilerplate.
Makerkit AI Plugin
Makerkit's paid plugin marketplace includes an AI template with document Q&A patterns. If you are already using Makerkit ($299), the AI plugin extends it with:
- Document upload and processing
- Embedding generation
- Semantic search over uploaded documents
- Chat interface with document context
LangChain.js Starters
For complex RAG pipelines — multiple sources, re-ranking, query transformation — LangChain.js provides orchestration:
import { OpenAIEmbeddings } from '@langchain/openai';
import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase';
import { createClient } from '@supabase/supabase-js';

const supabaseClient = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);
const embeddings = new OpenAIEmbeddings({ model: 'text-embedding-3-small' });
const vectorStore = await SupabaseVectorStore.fromExistingIndex(embeddings, {
  client: supabaseClient,
  tableName: 'documents',
  queryName: 'match_documents',
});
const retriever = vectorStore.asRetriever({ k: 5 });
LangChain adds complexity but enables advanced RAG patterns like:
- Query transformation (HyDE, multi-query)
- Re-ranking (Cohere rerank)
- Multi-document summarization
- Hybrid search (dense + sparse)
Recommended Implementation by Use Case
| Use Case | Stack |
|---|---|
| Document Q&A | Next.js + Supabase pgvector + Vercel AI SDK |
| Knowledge base | Next.js + Supabase pgvector + Postgres full-text search (hybrid) |
| Multi-source RAG | LangChain.js + Pinecone |
| Product search | pgvector with hybrid (vector + full-text) |
| Customer support bot | OpenSaaS + pgvector |
Performance Considerations
- Chunk size matters. 500-1000 tokens per chunk is typical. Smaller chunks improve precision; larger chunks improve recall.
- Overlap prevents gaps. 50-100 token overlap between chunks ensures sentences at boundaries are captured.
- Hybrid search beats pure vector search. Combining pgvector similarity with PostgreSQL full-text search improves results significantly.
- Reranking improves quality. After retrieval, a reranker (e.g., Cohere Rerank or ColBERT) reorders results for better LLM context.
Methodology
Based on publicly available information from Vercel AI SDK documentation, Supabase RAG guides, LangChain.js documentation, and community resources as of March 2026.
Hybrid Search: Why Pure Vector Search Often Disappoints
The most common misconception when building RAG systems is that vector similarity search is always better than keyword search. In practice, hybrid search — combining vector search with traditional full-text search — consistently outperforms either approach alone.
Vector search excels at semantic relevance: it finds conceptually related content even when the exact words don't match. A query for "how do I cancel my account" will match a document about "account deletion and offboarding" even without lexical overlap. This is the core RAG value proposition. But vector search can fail on exact matches. If someone asks about a specific function name like createCheckoutSession, vector search may return documents about payment flows in general rather than the specific function — because the embedding model doesn't understand that the exact token matters more than the semantic concept.
Full-text search (PostgreSQL's tsvector and tsquery) excels at exact and near-exact matches, proper nouns, product names, error messages, and code identifiers. It's also faster and doesn't require an embedding lookup. Its weakness is semantic blindness — it can't handle synonyms, paraphrasing, or conceptual similarity.
PostgreSQL supports both in the same query. The reciprocal rank fusion (RRF) algorithm combines results from both approaches without requiring calibrated scores:
-- Hybrid search: combine vector and full-text results via RRF.
-- Assumes a search_vector TSVECTOR column on documents; query_embedding
-- and query_tsquery are parameters bound by the caller.
WITH semantic AS (
SELECT id, content,
ROW_NUMBER() OVER (ORDER BY embedding <=> query_embedding) AS rank
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 20
),
keyword AS (
SELECT id, content,
ROW_NUMBER() OVER (ORDER BY ts_rank(search_vector, query_tsquery) DESC) AS rank
FROM documents
WHERE search_vector @@ query_tsquery
LIMIT 20
)
SELECT COALESCE(s.id, k.id) AS id,
COALESCE(s.content, k.content) AS content,
(COALESCE(1.0 / (60 + s.rank), 0) + COALESCE(1.0 / (60 + k.rank), 0)) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 5;
Supabase supports this query natively. Neon supports it too — both run standard PostgreSQL. This single query pattern improves retrieval quality for most production RAG workloads without adding external infrastructure. Add it before reaching for a dedicated vector database.
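If your vector and keyword indexes live in different systems (say, Pinecone for vectors plus a separate full-text store), the same RRF fusion can run in application code instead of SQL. A minimal sketch of the fusion step, assuming each system returns a ranked list of document IDs:

```typescript
// Reciprocal rank fusion over two ranked ID lists. k = 60 is the
// conventional damping constant; ranks are 1-based.
function rrfFuse(semanticIds: string[], keywordIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  const addList = (ids: string[]) =>
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  addList(semanticIds);
  addList(keywordIds);
  // Highest fused score first:
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

A document near the top of both lists outranks one that tops only a single list, which is exactly the behavior the SQL version produces.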
Document Processing and Chunking Strategy
The quality of your RAG system depends less on the LLM and more on how you process documents before indexing them. Most developers underinvest in document processing and overinvest in LLM selection.
Chunk size directly controls the precision-recall tradeoff. Small chunks (200–300 tokens) give precise retrieval — when a chunk matches, it's highly relevant. But small chunks lose context: a sentence that references "the previous section's configuration" has no meaning without the surrounding content. Large chunks (800–1000 tokens) retain context but reduce retrieval precision — the entire chunk gets included even when only one sentence is relevant.
The practical solution is a hierarchical chunking approach: embed small chunks for retrieval but return the parent chunk for context. When you find a matching 200-token chunk, return the 600-token parent window surrounding it. This gives retrieval precision with context completeness. Implement it by storing both child and parent chunk IDs in your documents table and returning the parent content in your match_documents function.
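The parent/child split can be sketched as a pure function (sizes here are in characters for simplicity; production code would count tokens):

```typescript
interface ChildChunk { content: string; parentIndex: number }

// Split text into large parent windows, then subdivide each parent into
// small child chunks. Embed the children; store both; return the parent's
// content to the LLM when one of its children matches.
function hierarchicalChunks(text: string, parentSize = 600, childSize = 200) {
  const parents: string[] = [];
  const children: ChildChunk[] = [];
  for (let p = 0; p < text.length; p += parentSize) {
    const parent = text.slice(p, p + parentSize);
    const parentIndex = parents.length;
    parents.push(parent);
    for (let c = 0; c < parent.length; c += childSize) {
      children.push({ content: parent.slice(c, c + childSize), parentIndex });
    }
  }
  return { parents, children };
}
```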
Document type matters for chunking strategy. PDF documents from financial reports or legal contracts chunk well by page or section (headers and footers are noise, strip them). Markdown documentation chunks well by heading hierarchy (split on H2, then subdivide long H2 sections). Code documentation works best chunked by function or class (semantic boundaries rather than fixed token counts). Conversation logs (customer support, Slack exports) chunk by message, not by token window.
The metadata you store with each chunk is as important as the chunk text. At minimum, store the source document ID, the chunk's position within the document (for ordering in context), and a timestamp. Better: store the document title, section heading, and document type. This metadata enables filtered retrieval ("search only in docs updated after 2025-01-01") and citation generation ("this answer is based on Section 3 of the API Guide"). Supabase's JSONB column type handles arbitrary metadata without schema changes.
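As a sketch, a chunk row carrying this minimum metadata might be assembled like so (the field names are illustrative, not a fixed schema):

```typescript
interface SourceDoc { id: string; title: string; type: string; updatedAt: string }

// Attach retrieval-useful metadata to each chunk before inserting it.
// `position` preserves document order for reassembling context later.
function buildChunkRow(content: string, doc: SourceDoc, position: number) {
  return {
    content,
    metadata: {
      doc_id: doc.id,
      title: doc.title,
      doc_type: doc.type,
      position,
      updated_at: doc.updatedAt,
    },
  };
}
```

With supabase-js, rows shaped like this can then be filtered at query time with a JSONB containment filter such as `.contains('metadata', { doc_type: 'guide' })`.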
Evaluating RAG Quality in Production
Building a RAG system is not complete at launch — it degrades as your document corpus changes and as users ask questions the retrieval system handles poorly. Monitoring RAG quality requires specific techniques.
The simplest quality signal is thumbs up/down feedback on AI responses. Store every response with a reference to the retrieved chunks and the user's satisfaction signal. A response marked as unhelpful that retrieved chunks with high similarity scores indicates a retrieval problem (the chunks are similar but not relevant). A response marked as unhelpful with low similarity scores indicates a coverage gap (the answer isn't in your corpus). These two failure modes require different interventions.
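The two failure modes can be separated mechanically. A sketch, assuming a similarity threshold of 0.7 (an assumption to tune for your embedding model):

```typescript
type Verdict = 'ok' | 'retrieval-problem' | 'coverage-gap';

// High similarity + unhelpful -> chunks were similar but not relevant.
// Low similarity + unhelpful -> the answer likely isn't in the corpus.
function triageFeedback(helpful: boolean, maxSimilarity: number, threshold = 0.7): Verdict {
  if (helpful) return 'ok';
  return maxSimilarity >= threshold ? 'retrieval-problem' : 'coverage-gap';
}
```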
Automated evaluation uses an LLM to grade its own responses against the retrieved context. The RAGAS framework provides metrics: context precision (are the retrieved chunks relevant?), context recall (did we retrieve all relevant information?), and answer faithfulness (does the answer reflect the retrieved context?). Running RAGAS evaluations on a test set of 50–100 representative queries before and after system changes gives objective quality measurements.
The coverage gap problem — users asking questions the corpus doesn't answer — is best addressed by logging low-confidence responses and routing them to a human reviewer who can add the missing content. Build a simple admin endpoint that shows the last 24 hours of queries where the LLM indicated it couldn't find relevant information. These become your content roadmap for expanding the corpus.
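The filtering behind such an endpoint is simple; a sketch, assuming each logged query records whether retrieval found relevant context:

```typescript
interface LoggedQuery { query: string; askedAt: Date; foundContext: boolean }

// Return queries from the last `windowHours` where retrieval came up empty.
// These are the candidates for new corpus content.
function recentCoverageGaps(
  log: LoggedQuery[],
  now: Date,
  windowHours = 24,
): LoggedQuery[] {
  const cutoff = now.getTime() - windowHours * 3_600_000;
  return log.filter((q) => !q.foundContext && q.askedAt.getTime() >= cutoff);
}
```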
Building a RAG application? StarterPick helps you find the right SaaS foundation to build on top of.
See the best boilerplates for AI products for starters that integrate AI infrastructure cleanly.
Read the Supabase vs Neon vs PlanetScale guide for database choice context — pgvector support is a key differentiator.
Find the right SaaS foundation for your RAG product in the best SaaS boilerplates 2026 guide.