Best Boilerplates for AI Chatbot Products 2026
The AI chatbot SaaS pattern has standardized. In 2026, the technical stack is well-defined: Vercel AI SDK for streaming and multi-model support, useChat hook for the streaming UI, PostgreSQL for conversation history, and a credit or subscription billing model. The differentiation between products is the domain — a customer support bot for Shopify stores, a research assistant for law firms, a writing coach for content marketers.
No single boilerplate dominates the AI chatbot category. Most teams combine a general SaaS starter with AI SDK patterns. Here's how to do that correctly, plus the purpose-built options that exist.
TL;DR
Fastest path: Shipfast ($199) or T3 Stack (free) for SaaS infrastructure, plus Vercel AI SDK for chat. One day to add streaming, conversation history, and credit billing. Two days to production-ready chatbot SaaS.
Best Base Starters
Vercel AI Chatbot — Best Free Reference
Price: Free (MIT) | Creator: Vercel | GitHub Stars: 10K+
Vercel's official AI chatbot template is the most widely deployed reference implementation for chatbot products. It ships with streaming chat via Vercel AI SDK, conversation history in PostgreSQL via Drizzle, multi-model support (GPT-4o, Claude 3.5, Gemini), and a polished chat UI. Auth via Auth.js.
The code quality is production-grade. The streaming implementation is correct, conversation threading handles concurrent messages properly, and the Drizzle schema scales.
Get it: npx create-next-app --example https://github.com/vercel/ai-chatbot
Best for: Teams who want a free, correct chatbot foundation and will add SaaS billing themselves.
Shipfast + Vercel AI SDK — Best Paid Foundation
Price: $199 | Stack: Next.js, Supabase/MongoDB, Stripe
For AI chatbot SaaS that needs subscription billing, a landing page, and user management from day one, Shipfast provides the infrastructure. Add Vercel AI SDK in a few hours:
npm install ai @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google
Shipfast's Stripe billing handles subscriptions or one-time purchases. Its auth covers social logins and magic links. Add the streaming chat endpoint and useChat UI on top of the existing user/billing foundation.
Best for: Founders building AI chatbot SaaS who want billing and auth pre-configured.
T3 Stack + AI SDK — Best Type-Safe Foundation
Price: Free | Stack: Next.js, tRPC, Prisma, NextAuth
T3's tRPC procedures work well for AI chatbot operations: conversation creation, message history loading, credit management — all typed end-to-end. Add Vercel AI SDK for the streaming endpoint and useChat hook.
Best for: Developers who want maximum type safety and control over the data architecture.
Core Implementation
Multi-Model AI Setup
// lib/ai.ts — configure multiple model providers:
import { createOpenAI } from '@ai-sdk/openai';
import { createAnthropic } from '@ai-sdk/anthropic';
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
export function getModel(modelId: string) {
switch (modelId) {
case 'gpt-4o': return openai('gpt-4o');
case 'gpt-4o-mini': return openai('gpt-4o-mini');
case 'claude-3-5-sonnet': return anthropic('claude-3-5-sonnet-20241022');
case 'claude-3-5-haiku': return anthropic('claude-3-5-haiku-20241022');
default: return openai('gpt-4o-mini');
}
}
Streaming API Route
// app/api/chat/route.ts:
import { streamText, convertToCoreMessages } from 'ai';
import { auth } from '@/lib/auth';
import { getModel } from '@/lib/ai';
import { loadHistory, saveMessage } from '@/lib/conversations';
import { checkAndDeductCredits } from '@/lib/credits';
export async function POST(req: Request) {
const session = await auth();
if (!session?.user?.id) return new Response('Unauthorized', { status: 401 });
const { messages, conversationId, modelId = 'gpt-4o-mini', systemPrompt } = await req.json();
  // Credit check — match exact ids: 'gpt-4o-mini' contains the substring
  // 'gpt-4o', so a naive includes() check would overcharge the cheap model:
  const expensiveModels = ['gpt-4o', 'claude-3-5-sonnet'];
  const creditCost = expensiveModels.includes(modelId) ? 5 : 1;
const ok = await checkAndDeductCredits(session.user.id, creditCost);
if (!ok) return new Response('Insufficient credits', { status: 402 });
// Load conversation history from database:
const history = conversationId ? await loadHistory(conversationId, 20) : [];
const allMessages = [...history, ...convertToCoreMessages(messages)];
const result = streamText({
model: getModel(modelId),
system: systemPrompt ?? 'You are a helpful assistant.',
messages: allMessages,
maxTokens: 2048,
onFinish: async ({ text }) => {
if (conversationId) {
await saveMessage({ conversationId, role: 'user', content: messages.at(-1)?.content });
await saveMessage({ conversationId, role: 'assistant', content: text });
}
},
});
return result.toDataStreamResponse();
}
Streaming Chat UI Component
// components/ChatInterface.tsx:
'use client';
import { useChat } from 'ai/react';
import { useState, useRef, useEffect } from 'react';
import { Send, Bot, User } from 'lucide-react';
export function ChatInterface({
conversationId,
systemPrompt,
}: {
conversationId?: string;
systemPrompt?: string;
}) {
const [model, setModel] = useState('gpt-4o-mini');
const bottomRef = useRef<HTMLDivElement>(null);
const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
api: '/api/chat',
body: { conversationId, systemPrompt, modelId: model },
onError: (err) => {
if (err.message.includes('402')) {
// Trigger upgrade prompt
}
},
});
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
return (
<div className="flex flex-col h-full">
<div className="p-3 border-b">
<select value={model} onChange={e => setModel(e.target.value)}
className="text-sm border rounded px-2 py-1">
<option value="gpt-4o-mini">GPT-4o Mini (1 cr)</option>
<option value="gpt-4o">GPT-4o (5 cr)</option>
<option value="claude-3-5-haiku">Claude Haiku (1 cr)</option>
<option value="claude-3-5-sonnet">Claude Sonnet (5 cr)</option>
</select>
</div>
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.map(msg => (
<div key={msg.id} className={`flex gap-3 ${msg.role === 'user' ? 'justify-end' : ''}`}>
{msg.role === 'assistant' && <Bot className="h-5 w-5 mt-1 shrink-0 text-blue-500" />}
<div className={`rounded-2xl px-4 py-2 max-w-[80%] text-sm whitespace-pre-wrap ${
msg.role === 'user' ? 'bg-blue-600 text-white' : 'bg-gray-100 dark:bg-gray-800'
}`}>
{msg.content}
</div>
{msg.role === 'user' && <User className="h-5 w-5 mt-1 shrink-0" />}
</div>
))}
{isLoading && (
<div className="flex gap-3">
<Bot className="h-5 w-5 mt-1 text-blue-500" />
<div className="bg-gray-100 rounded-2xl px-4 py-2 text-sm text-gray-400 animate-pulse">Thinking…</div>
</div>
)}
<div ref={bottomRef} />
</div>
<form onSubmit={handleSubmit} className="p-4 border-t flex gap-2">
<input value={input} onChange={handleInputChange}
placeholder="Message…"
className="flex-1 border rounded-lg px-4 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
disabled={isLoading} />
<button type="submit" disabled={isLoading || !input.trim()}
className="bg-blue-600 text-white rounded-lg px-4 py-2 hover:bg-blue-700 disabled:opacity-50">
<Send className="h-4 w-4" />
</button>
</form>
</div>
);
}
Conversation History Schema
// Drizzle schema for multi-conversation chatbot:
export const conversations = pgTable('conversations', {
id: text('id').primaryKey().$defaultFn(() => crypto.randomUUID()),
userId: text('user_id').notNull(),
title: text('title'), // Auto-generated from first message
model: text('model').notNull().default('gpt-4o-mini'),
createdAt: timestamp('created_at').defaultNow().notNull(),
updatedAt: timestamp('updated_at').defaultNow().notNull(),
});
export const chatMessages = pgTable('chat_messages', {
id: text('id').primaryKey().$defaultFn(() => crypto.randomUUID()),
conversationId: text('conversation_id').notNull(),
role: text('role', { enum: ['user', 'assistant', 'system'] }).notNull(),
content: text('content').notNull(),
tokens: integer('tokens'),
createdAt: timestamp('created_at').defaultNow().notNull(),
});
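The title column above is auto-generated from the first message. A minimal sketch of that step — the helper name and the 40-character cap are illustrative choices, not part of any starter:

```typescript
// Derive a conversation title from the first user message (hypothetical
// helper — the truncation length is an arbitrary choice):
export function titleFromFirstMessage(content: string, maxLen = 40): string {
  const firstLine = content.trim().split('\n')[0];
  if (firstLine.length <= maxLen) return firstLine;
  return firstLine.slice(0, maxLen - 1) + '…';
}
```

Call it once when inserting the conversation row, then let users rename it.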
Credit Billing System
Credit-based billing works better than pure subscriptions for AI chatbots because model costs vary significantly:
// Credit system implementation:
// GPT-4o-mini: ~$0.0003/1K tokens → charge 1 credit
// GPT-4o: ~$0.0025/1K tokens → charge 5 credits
// Credit packs: 100 cr ($5), 500 cr ($20), 2000 cr ($60)
export async function checkAndDeductCredits(
  userId: string,
  amount: number
): Promise<boolean> {
  // A single conditional UPDATE is atomic — a separate read-then-write
  // (even inside a default-isolation transaction) lets two concurrent
  // requests both pass the balance check and overdraw the account:
  const result = await db.user.updateMany({
    where: { id: userId, credits: { gte: amount } },
    data: { credits: { decrement: amount } },
  });
  return result.count === 1;
}
// Stripe checkout for credit packs:
const session = await stripe.checkout.sessions.create({
payment_method_types: ['card'],
line_items: [{ price: CREDIT_PACK_PRICE_ID, quantity: 1 }],
mode: 'payment',
metadata: { userId, credits: '500' },
success_url: `${BASE_URL}/credits/success?session_id={CHECKOUT_SESSION_ID}`,
cancel_url: `${BASE_URL}/credits`,
});
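The checkout session above only collects payment — the credits themselves must be granted when Stripe delivers the checkout.session.completed webhook. A hedged sketch, with the metadata parsing split into a pure helper; addCredits and the route wiring are placeholders for your own data layer:

```typescript
type CheckoutMetadata = { userId?: string; credits?: string } | null;

// Validate the metadata we attached at checkout time before trusting it.
export function parseCreditPurchase(metadata: CheckoutMetadata) {
  const userId = metadata?.userId;
  const credits = Number(metadata?.credits);
  if (!userId || !Number.isInteger(credits) || credits <= 0) return null;
  return { userId, credits };
}

// In app/api/webhooks/stripe/route.ts (sketch — always verify the
// signature with stripe.webhooks.constructEvent before reading the event):
//
// if (event.type === 'checkout.session.completed') {
//   const purchase = parseCreditPurchase(event.data.object.metadata);
//   if (purchase) await addCredits(purchase.userId, purchase.credits);
// }
```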
RAG: Document Grounding
For AI chatbots that answer questions about specific documents or knowledge bases:
// pgvector RAG pattern (requires PostgreSQL with pgvector extension):
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';
// Index document chunk with embedding:
async function indexDocumentChunk(content: string, documentId: string) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: content,
  });
  // Prisma maps pgvector columns as Unsupported("vector"), which create()
  // can't write — insert via raw SQL instead:
  await db.$executeRaw`
    INSERT INTO document_chunks (content, document_id, embedding)
    VALUES (${content}, ${documentId}, ${JSON.stringify(embedding)}::vector)
  `;
}
// Semantic search for relevant context:
async function getRelevantContext(query: string, limit = 5): Promise<string> {
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: query,
});
  // Serialize the query embedding so Postgres can cast it to vector:
  const chunks: { content: string }[] = await db.$queryRaw`
    SELECT content FROM document_chunks
    ORDER BY embedding <=> ${JSON.stringify(embedding)}::vector
    LIMIT ${limit}
  `;
return chunks.map(c => c.content).join('\n\n');
}
// Inject context into system prompt:
const context = await getRelevantContext(lastUserMessage);
const systemPromptWithContext = `
You are a helpful assistant. Answer questions based on the following context:
${context}
If the answer isn't in the context, say you don't know.
`;
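The indexing helper above assumes documents arrive pre-chunked. A minimal fixed-size chunker with character overlap — the sizes are illustrative and should be tuned to your embedding model and retrieval granularity:

```typescript
// Split a document into overlapping fixed-size chunks. Overlap reduces the
// chance that a relevant passage is cut in half at a chunk boundary.
export function chunkDocument(
  text: string,
  chunkSize = 800,
  overlap = 100
): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Production pipelines often split on sentence or heading boundaries instead; fixed-size chunking is the simplest correct starting point.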
Billing Models Compared
| Model | Setup | Best For |
|---|---|---|
| Credit packs | Stripe one-time + credits table | Variable usage, fair to users |
| Monthly subscription | Stripe Subscriptions + monthly reset | Predictable revenue |
| Freemium | Free tier (50 cr/day) + paid | User acquisition |
| Per-seat B2B | Subscription × team members | Enterprise buyers |
Domain Differentiation: Where AI Chatbots Win
The generic AI chat assistant market is commoditized. ChatGPT, Claude, and Gemini are free or close to free for most users, so another general-purpose chatbot assistant is difficult to monetize in 2026.
The AI chatbots that successfully charge money are domain-specific:
- Legal research assistants — trained on case law, statutes, and regulations. Lawyers pay $200+/month because the alternative is paralegal time at $80+/hour.
- Customer support bots for specific platforms — a chatbot trained on Shopify documentation and your client's store data saves 3-5 support hours per week per client.
- Code review assistants — specialized for specific languages or frameworks (React, Terraform, Kubernetes) with knowledge of best practices and your organization's standards.
- Medical information assistants — symptom checkers and drug interaction lookups with proper medical disclaimers, sold to healthcare providers or patients.
- Financial analysis chatbots — earnings analysis, SEC filing interpretation, portfolio Q&A for investors.
The common thread: the chatbot knows something that GPT-4o doesn't know by default, either through fine-tuning or RAG (retrieval-augmented generation). The premium you can charge is proportional to the domain knowledge and the cost of the alternative.
Rate Limiting and Abuse Prevention
AI APIs are expensive. Without rate limiting, a determined user (or a simple script) can exhaust your API budget in minutes. Three layers of protection:
Per-user message limits — hard cap on messages per day or per hour:
// Redis rate limiter — 50 messages per user per day:
async function checkRateLimit(userId: string): Promise<boolean> {
const key = `rl:chat:${userId}:${new Date().toDateString()}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 86400); // First hit sets a 24h TTL; the date in the key handles the daily reset
return count <= 50;
}
Token limits per message — prevent prompt injection attacks that try to extract expensive responses by setting maxTokens in your streaming endpoint.
Input validation — reject suspiciously long system prompts or user messages that could be prompt injection attempts trying to override your chatbot's behavior.
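A minimal validation sketch for the checks described above — the limits are illustrative defaults to tune for your product, and the function name is hypothetical:

```typescript
const MAX_MESSAGE_CHARS = 8_000;
const MAX_SYSTEM_PROMPT_CHARS = 4_000;
const MAX_MESSAGES_PER_REQUEST = 50;

type IncomingMessage = { role: string; content: string };

// Returns an error string for invalid input, or null when the request is OK.
export function validateChatInput(
  messages: IncomingMessage[],
  systemPrompt?: string
): string | null {
  if (!Array.isArray(messages) || messages.length === 0) return 'No messages';
  if (messages.length > MAX_MESSAGES_PER_REQUEST) return 'Too many messages';
  for (const m of messages) {
    if (typeof m.content !== 'string' || m.content.length > MAX_MESSAGE_CHARS)
      return 'Message too long';
    // Clients must never send system messages — that's the classic way a
    // prompt injection overrides your chatbot's behavior.
    if (m.role !== 'user' && m.role !== 'assistant') return 'Invalid role';
  }
  if (systemPrompt && systemPrompt.length > MAX_SYSTEM_PROMPT_CHARS)
    return 'System prompt too long';
  return null;
}
```

Run this in the API route before the credit check, so invalid requests cost nothing.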
Testing AI Chatbot Integrations
Testing streaming AI responses requires some care — you can't mock the OpenAI API like a standard REST endpoint:
// Use ai/test for unit tests with mock responses:
import { generateText } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';
const mockModel = new MockLanguageModelV1({
doGenerate: async () => ({
rawCall: { rawPrompt: null, rawSettings: {} },
finishReason: 'stop',
usage: { promptTokens: 10, completionTokens: 20 },
text: 'Hello! How can I help you today?',
}),
});
// Test your chat logic without hitting OpenAI:
const result = await generateText({
model: mockModel,
messages: [{ role: 'user', content: 'Hello' }],
});
expect(result.text).toBe('Hello! How can I help you today?');
The Vercel AI SDK ships ai/test specifically for this purpose. Use it to test your credit deduction logic, conversation history loading, and rate limiting without making real API calls.
Model Selection Strategy
Not all queries need GPT-4o. A smart routing strategy can cut API costs by 60-80%:
- Simple Q&A, FAQs, short responses → GPT-4o-mini or Claude Haiku (1 credit, fast)
- Complex analysis, code generation, long form → GPT-4o or Claude Sonnet (5 credits, better)
- Image analysis → GPT-4o Vision (5+ credits)
- Structured data extraction → GPT-4o with JSON mode (fast, reliable)
Let users choose from a model selector but default to the cheaper model. Power users who need better quality will upgrade. This keeps your API costs low for the majority of queries while offering quality tiers that match your credit pricing.
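A server-side sketch of this routing heuristic — the length threshold and keyword list are illustrative, not tuned values:

```typescript
// Route to an expensive model only when the request looks complex;
// otherwise default to the cheap tier.
export function routeModel(userMessage: string, requested?: string): string {
  // An explicit choice from the model selector always wins.
  if (requested) return requested;
  const complexSignals = ['analyze', 'refactor', 'write an essay', 'step by step'];
  const looksComplex =
    userMessage.length > 500 ||
    complexSignals.some(s => userMessage.toLowerCase().includes(s));
  return looksComplex ? 'gpt-4o' : 'gpt-4o-mini';
}
```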
For the SaaS billing infrastructure to support these models, see the best boilerplates with Stripe integration. For the broader AI SaaS boilerplate landscape, the best AI SaaS boilerplates guide covers starters specifically designed for LLM-powered products.
How to Evaluate AI Chatbot Starters
When comparing specific boilerplates for an AI chatbot product, evaluate these dimensions in order:
Streaming implementation quality. Clone the repo and run it. Does the streaming work? Does it handle errors gracefully (model timeout, rate limit, network interruption)? Does the isLoading state clear correctly when an error occurs mid-stream? Many starters have streaming that looks correct in the happy path but has subtle bugs under failure conditions that only appear in production.
Conversation threading. Does the boilerplate support multiple independent conversations per user, or just a single ongoing chat? For most chatbot products, users need conversation history they can name, return to, and organize. A starter with only a single chat history requires significant schema and UI work to support threading.
Token accounting accuracy. The credit deduction model only works correctly if token counts are accurate. Vercel AI SDK's onFinish callback provides usage.promptTokens and usage.completionTokens after stream completion — this is the reliable place to deduct credits. Starters that deduct before the stream completes (based on input tokens only) systematically undercharge and lose money at scale.
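A sketch of converting onFinish usage into a credit charge — the per-model rates are illustrative and should mirror your pricing table, and creditsForUsage/deductCredits are hypothetical names:

```typescript
// Illustrative per-1K-token credit rates, keyed by model id.
const CREDITS_PER_1K_TOKENS: Record<string, number> = {
  'gpt-4o-mini': 1,
  'gpt-4o': 5,
};

export function creditsForUsage(
  modelId: string,
  usage: { promptTokens: number; completionTokens: number }
): number {
  const rate = CREDITS_PER_1K_TOKENS[modelId] ?? 1;
  const totalTokens = usage.promptTokens + usage.completionTokens;
  // Round up so even tiny requests cost at least one credit.
  return Math.max(1, Math.ceil((totalTokens / 1000) * rate));
}

// In the streaming route:
// streamText({ ..., onFinish: async ({ usage }) => {
//   await deductCredits(userId, creditsForUsage(modelId, usage));
// }});
```

In practice you still reserve a minimum credit before the call (so exhausted users can't start streams) and settle the difference in onFinish.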
Rate limiting placement. Rate limiting should happen before the LLM API call, not after. A starter that checks credits after generating a response has already incurred API cost when the check fails. Check where the rate limit and credit check middleware runs in the request lifecycle.
Model abstraction. Can you add a new model by changing one configuration entry, or does adding a model require changes across multiple files? The Vercel AI SDK makes this easy — one import change plus one entry in the model map. Starters that hard-code OpenAI throughout the codebase create expensive migration work when you want to offer alternative models.
What These Options Have in Common
Whether you start with Vercel AI Chatbot, Shipfast, or T3 Stack, the architectural pattern for AI chatbot SaaS has converged in 2026:
Vercel AI SDK as the LLM abstraction layer. PostgreSQL (or equivalent) for conversation and message persistence. Credit or subscription billing via Stripe. Server-side credit checks before API calls. Token deduction via onFinish callbacks after completion. Redis for rate limiting and session state. The streaming UI via the useChat hook.
The differentiation isn't the infrastructure — it's the domain expertise you add on top. A legal research chatbot knows case law and citation formats. A customer support bot knows your product documentation. A code review assistant knows your team's coding standards. The value proposition is always the domain knowledge embedded in the system prompt and the RAG pipeline, not the streaming implementation.
For the broader AI product infrastructure picture, see best AI SaaS boilerplates for shipping fast and top AI SaaS boilerplates with built-in AI. For open-source options that include AI features without a license fee, the free open-source SaaS boilerplates guide covers Open SaaS and other MIT-licensed starters with AI integration. For the LLM-focused boilerplate comparison including RAG and structured output patterns, see the best AI/LLM boilerplates guide.
Deployment Considerations for AI Chatbot Products
AI chatbot products have deployment constraints that standard SaaS applications don't face. Streaming responses require long-lived HTTP connections that some deployment platforms handle poorly.
Vercel function timeout. Vercel's serverless functions default to a 10-second timeout on the Hobby plan (60 seconds on Pro). A long AI response can exceed that limit and terminate mid-stream. Mitigations: raise the route's limit with export const maxDuration = 60 in the route file, upgrade to Pro for higher caps, or move the AI endpoint to a platform without a wall-clock limit on streaming responses, such as Cloudflare Workers (which limits CPU time rather than stream duration).
Edge runtime compatibility. If you deploy AI endpoints to the Edge Runtime (for reduced latency), confirm that your chosen AI SDK version supports the Edge Runtime. The Vercel AI SDK fully supports edge deployment. However, database clients like Prisma use Node.js APIs that aren't available at the edge — use Drizzle with a PostgreSQL HTTP client (Neon's serverless driver or PlanetScale's HTTP driver) for edge-compatible database queries.
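A configuration sketch of that edge-compatible setup — this assumes Neon's serverless driver (@neondatabase/serverless) and Drizzle's neon-http adapter; adjust to your provider:

```typescript
// lib/db-edge.ts — HTTP-based Postgres client; no TCP sockets, so it
// runs in the Edge Runtime:
import { neon } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-http';

const sql = neon(process.env.DATABASE_URL!);
export const db = drizzle(sql);

// app/api/chat/route.ts — opt the route into the Edge Runtime:
// export const runtime = 'edge';
```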
Cost monitoring. AI API costs can grow unexpectedly when users discover and heavily use your product. Before launch, implement: a daily spend alert via the OpenAI/Anthropic dashboard, a hard limit per user per day (beyond the credit system's soft limit), and a Slack or email notification when daily spend exceeds a threshold. The combination of per-user rate limiting and API spend monitoring catches both abusive usage and legitimate viral growth before the invoice becomes a problem.
Prompt caching. Anthropic and OpenAI both support prompt caching for long, repeated system prompts. Anthropic prices cached prompt reads at roughly 10% of the base input rate (about a 90% saving on those tokens), and OpenAI's automatic caching discounts cached input tokens by about 50%. If your chatbot has a static system prompt longer than 1,024 tokens, caching meaningfully reduces the prompt-token cost of repeated conversations. This is particularly valuable for document-grounded chatbots where the context document is injected into the system prompt for every message.
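Back-of-envelope arithmetic behind that savings claim — this sketch assumes a roughly 10% cached-read rate (Anthropic's published discount) and an illustrative input price, and it ignores the one-time cache-write surcharge:

```typescript
// Cost of re-sending a static system prompt across many conversations,
// with and without prompt caching.
export function promptCostUSD(
  systemTokens: number,
  conversations: number,
  inputRatePer1K: number,
  cached: boolean
): number {
  // Cached prompt reads are billed at ~10% of the base input rate.
  const rate = cached ? inputRatePer1K * 0.1 : inputRatePer1K;
  return (systemTokens / 1000) * rate * conversations;
}

// Example: a 2,000-token system prompt over 100 conversations at $0.003/1K
// input drops from $0.60 uncached to $0.06 cached — a 90% saving.
```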
Monitoring Production AI Chatbots
Production chatbot products need observability beyond standard API latency metrics:
Response quality varies by model and prompt version. Track user-level signals: conversation length (longer conversations indicate engagement), thumbs-up/thumbs-down feedback if you implement it, and conversation abandonment rate (user sends one message and doesn't respond to the AI's reply). These are leading indicators of response quality degradation before it shows up in revenue metrics.
Token usage per conversation affects unit economics. Track average tokens per conversation by model and by user segment. If a user segment consistently uses 3x the average tokens per conversation, your credit pricing for that segment may be underwater. Token analytics give you the data to adjust pricing or add per-conversation caps before the model shifts from profitable to loss-making.
Error rates by model are a signal of prompt compatibility. Different models respond differently to the same system prompt — a system prompt tuned for GPT-4o may produce poor results from Claude Haiku. Track error and refusal rates separately per model. When you add a new model, run evaluation prompts against it before exposing it to users.
Browse AI chatbot starters at StarterPick — filter by AI SDK support and conversation history features.
The boilerplate and tool choices covered here represent the most actively maintained options in their category as of 2026. Evaluate each against your specific requirements: team expertise, deployment infrastructure, budget, and the features your product requires on day one versus those you can add incrementally. The best starting point is the one that lets your team ship the first version of your product fastest, with the least architectural debt.