Adding AI Features to Your SaaS Boilerplate 2026
TL;DR
No major SaaS boilerplate ships production-ready AI features in 2026. ShipFast includes a basic chat demo; most others include nothing. You'll add AI yourself — but the good news is it's only ~200 lines of code for a solid foundation: a streaming chat API route, a useChat hook on the frontend, per-user token tracking, and rate limiting. The hard parts aren't the AI call itself — they're billing (who pays for tokens?), abuse prevention (protecting expensive endpoints), and UX (streaming responses, error states, interruption handling).
Key Takeaways
- Vercel AI SDK is the standard for Next.js AI integration — streaming, tool use, provider switching in one package
- Token tracking belongs in the onFinish callback, writing to your DB after every completion
- Rate limiting: AI endpoints need tighter limits than regular API routes — use Upstash per-user sliding windows
- Credit system is simpler than Stripe Meters for most SaaS — deduct credits on completion, check before request
- ShipFast includes a chat UI demo; T3/Supastarter/Makerkit require you to build from scratch
- Streaming interruption (user presses stop) needs explicit AbortController handling
What Boilerplates Currently Include
| Boilerplate | AI Chat | Token Tracking | Rate Limiting | Credit System |
|---|---|---|---|---|
| ShipFast | ✅ Demo | ❌ | ❌ | ❌ |
| T3 Stack | ❌ | ❌ | ❌ | ❌ |
| Supastarter | ❌ | ❌ | ❌ | ❌ |
| Makerkit | ❌ | ❌ | ❌ | ❌ |
| Open SaaS (Wasp) | ✅ Demo | ❌ | ❌ | ❌ |
You will build this yourself. Here's the complete stack.
Choosing Your LLM Provider
The Vercel AI SDK abstracts provider differences behind a common interface, making it practical to start with one provider and add others as fallbacks or for specific use cases. The practical choice in 2026:
GPT-4o-mini is the default workhorse for most AI SaaS features: $0.15/M input tokens, fast, reliable, excellent instruction following. Use it for features where cost matters and quality is "good enough" (form summarization, FAQ answering, tag generation, basic Q&A). Step up to GPT-4o at $2.50/M input tokens for tasks where quality noticeably matters to users (creative writing, complex reasoning, code generation in a premium plan).
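To make the per-million prices concrete, here is a small sketch converting a request's token counts into dollars. The $0.60/M output price and the example token counts are illustrative assumptions, not figures from a provider's docs:

```typescript
// Convert per-million-token prices into a per-request cost estimate.
// Defaults use the GPT-4o-mini input price quoted above ($0.15/M) and an
// assumed $0.60/M output price.
function estimateCostUsd(
  promptTokens: number,
  completionTokens: number,
  inputPerMillion = 0.15,
  outputPerMillion = 0.6,
): number {
  return (
    (promptTokens / 1_000_000) * inputPerMillion +
    (completionTokens / 1_000_000) * outputPerMillion
  );
}

// A typical chat turn: ~500 prompt tokens in, ~300 completion tokens out.
console.log(estimateCostUsd(500, 300).toFixed(6)); // about $0.000255 per message
```

At that rate, 10,000 messages a month costs a few dollars, which is why a cheap model plus a credit system is usually enough cost control for an early-stage SaaS.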
Claude 3.5 Sonnet from Anthropic is competitive with GPT-4o on most benchmarks at similar pricing ($3/M input, $15/M output). Claude models are better at long-context tasks (200K token context window), code generation, and following complex instructions precisely. The behavioral difference that matters for consumer products: Claude is better calibrated on refusals — less likely to refuse legitimate requests while still refusing genuinely harmful ones. This reduces false-positive content filtering that frustrates users.
Gemini Flash from Google is the cheapest option for high-volume, low-latency tasks. Gemini 2.0 Flash at $0.10/M input tokens is a third cheaper than GPT-4o-mini, which adds up for applications that need to minimize cost per request. Gemini's multimodal capabilities (image, audio, video) are best-in-class if your product involves non-text input.
The Vercel AI SDK makes switching providers a two-line change. Start with OpenAI for ecosystem familiarity and documentation, add Anthropic for use cases that benefit from its strengths, and evaluate Gemini for high-volume cost optimization once you have volume to optimize.
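Provider abstraction also buys you fallbacks. The sketch below is illustrative rather than an AI SDK API: the two ModelCall stand-ins represent calls like generateText with different model instances.

```typescript
// Sketch: call a primary provider, fall back to a secondary on failure.
// ModelCall is a stand-in for an AI SDK call bound to a specific model;
// swapping providers is just swapping which function you pass in.
type ModelCall = (prompt: string) => Promise<string>;

async function withFallback(
  primary: ModelCall,
  fallback: ModelCall,
  prompt: string,
): Promise<string> {
  try {
    return await primary(prompt);
  } catch {
    // Primary provider errored (rate limit, outage): try the secondary.
    return await fallback(prompt);
  }
}
```

In a route handler, primary might wrap openai('gpt-4o-mini') and fallback a Claude or Gemini model; the rest of your code never changes.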
Step 1: Streaming Chat API Route
// app/api/ai/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { auth } from '@/auth';
import { checkRateLimit } from '@/lib/rate-limit';
import { checkCredits, deductCredits } from '@/lib/credits';
import { db } from '@/lib/db';
export const maxDuration = 60; // Vercel function timeout
export async function POST(req: Request) {
const session = await auth();
if (!session?.user) {
return new Response('Unauthorized', { status: 401 });
}
// Rate limit: 20 AI requests per minute per user (window configured in lib/rate-limit)
const { success: withinLimit } = await checkRateLimit(session.user.id, 'ai-chat');
if (!withinLimit) {
return new Response('Rate limit exceeded. Please wait before sending another message.', {
status: 429,
});
}
// Credit check before calling the LLM
const hasCredits = await checkCredits(session.user.id, 1);
if (!hasCredits) {
return new Response(
JSON.stringify({ error: 'insufficient_credits', message: 'You\'ve used all your AI credits. Upgrade to continue.' }),
{ status: 402, headers: { 'Content-Type': 'application/json' } }
);
}
const { messages } = await req.json();
const result = streamText({
model: openai('gpt-4o-mini'), // Use cheaper model by default
messages,
system: `You are a helpful assistant for ${process.env.NEXT_PUBLIC_APP_NAME}.
Be concise and accurate. If you don't know something, say so.`,
maxTokens: 1024,
onFinish: async ({ usage, text }) => {
// Track token usage per user
await Promise.all([
db.aiUsage.create({
data: {
userId: session.user.id,
model: 'gpt-4o-mini',
promptTokens: usage.promptTokens,
completionTokens: usage.completionTokens,
totalTokens: usage.totalTokens,
estimatedCostUsd: (usage.promptTokens * 0.00000015) + (usage.completionTokens * 0.0000006),
},
}),
deductCredits(session.user.id, 1),
]);
},
});
return result.toDataStreamResponse();
}
Step 2: Frontend Chat UI
// components/ai/chat.tsx
'use client';
import { useChat } from 'ai/react';
import { useState, useRef, useEffect } from 'react';
import { toast } from 'sonner';
interface ChatProps {
placeholder?: string;
creditsRemaining: number;
}
export function AiChat({ placeholder = 'Ask anything...', creditsRemaining }: ChatProps) {
const [credits, setCredits] = useState(creditsRemaining);
const {
messages,
input,
handleInputChange,
handleSubmit,
isLoading,
stop,
error,
setMessages,
} = useChat({
api: '/api/ai/chat',
onFinish: () => {
setCredits((c) => Math.max(0, c - 1));
},
onError: (error) => {
// error.message may not be JSON (network failures, plain-text 429 bodies)
let body: { error?: string } = {};
try { body = JSON.parse(error.message || '{}'); } catch {}
if (body.error === 'insufficient_credits') {
toast.error('Out of AI credits', {
description: 'Upgrade your plan to continue.',
action: { label: 'Upgrade', onClick: () => window.location.href = '/pricing' },
});
} else if (error.message.includes('429')) {
toast.error('Slow down — you\'re sending messages too fast.');
} else {
toast.error('Something went wrong. Please try again.');
}
},
});
const bottomRef = useRef<HTMLDivElement>(null);
useEffect(() => {
bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
return (
<div className="flex flex-col h-full">
{/* Credits indicator */}
<div className="flex items-center justify-between px-4 py-2 border-b text-sm text-gray-500">
<span>AI Assistant</span>
<span>{credits} credit{credits !== 1 ? 's' : ''} remaining</span>
</div>
{/* Messages */}
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.length === 0 && (
<div className="text-center text-gray-400 mt-20">
<p className="text-lg font-medium">How can I help?</p>
</div>
)}
{messages.map((m) => (
<div key={m.id} className={`flex ${m.role === 'user' ? 'justify-end' : 'justify-start'}`}>
<div
className={`max-w-[80%] rounded-2xl px-4 py-2 ${
m.role === 'user'
? 'bg-blue-600 text-white'
: 'bg-gray-100 text-gray-900'
}`}
>
<p className="whitespace-pre-wrap text-sm">{m.content}</p>
</div>
</div>
))}
{isLoading && (
<div className="flex justify-start">
<div className="bg-gray-100 rounded-2xl px-4 py-2">
<span className="flex gap-1">
<span className="w-2 h-2 bg-gray-400 rounded-full animate-bounce [animation-delay:0ms]" />
<span className="w-2 h-2 bg-gray-400 rounded-full animate-bounce [animation-delay:150ms]" />
<span className="w-2 h-2 bg-gray-400 rounded-full animate-bounce [animation-delay:300ms]" />
</span>
</div>
</div>
)}
<div ref={bottomRef} />
</div>
{/* Input */}
<form onSubmit={handleSubmit} className="p-4 border-t flex gap-2">
<input
value={input}
onChange={handleInputChange}
placeholder={credits === 0 ? 'Upgrade to continue...' : placeholder}
disabled={credits === 0 || isLoading}
className="flex-1 rounded-lg border px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:opacity-50"
/>
{isLoading ? (
<button
type="button"
onClick={stop}
className="px-4 py-2 bg-red-500 text-white rounded-lg text-sm hover:bg-red-600"
>
Stop
</button>
) : (
<button
type="submit"
disabled={!input.trim() || credits === 0}
className="px-4 py-2 bg-blue-600 text-white rounded-lg text-sm hover:bg-blue-700 disabled:opacity-50"
>
Send
</button>
)}
</form>
</div>
);
}
Step 3: Credits System
// lib/credits.ts
import { db } from '@/lib/db';
const PLAN_CREDITS: Record<string, number> = {
free: 10, // 10 AI messages per month
pro: 500, // 500 AI messages per month
team: 2000,
};
export async function checkCredits(userId: string, required = 1): Promise<boolean> {
const user = await db.user.findUnique({
where: { id: userId },
select: { plan: true, aiCreditsUsed: true, aiCreditsResetAt: true },
});
if (!user) return false;
// Reset monthly credits
const now = new Date();
if (!user.aiCreditsResetAt || user.aiCreditsResetAt < now) {
await db.user.update({
where: { id: userId },
data: {
aiCreditsUsed: 0,
aiCreditsResetAt: new Date(now.getFullYear(), now.getMonth() + 1, 1),
},
});
return true; // Fresh credits
}
const limit = PLAN_CREDITS[user.plan ?? 'free'] ?? 10;
return (user.aiCreditsUsed ?? 0) + required <= limit;
}
export async function deductCredits(userId: string, amount = 1): Promise<void> {
await db.user.update({
where: { id: userId },
data: { aiCreditsUsed: { increment: amount } },
});
}
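One caveat on the check-then-deduct flow above: two concurrent requests can both pass checkCredits before either deduction lands. A minimal sketch of the fix, using an in-memory store as a stand-in (with Prisma you would express the same guard as a conditional updateMany; with Redis, a Lua script):

```typescript
// Sketch: combine the check and the deduction into one step so concurrent
// requests can't both slip under the limit. In-memory stand-in store;
// single-threaded JS makes this atomic per process.
const creditsUsed = new Map<string, number>();

function tryConsumeCredits(userId: string, limit: number, amount = 1): boolean {
  const current = creditsUsed.get(userId) ?? 0;
  if (current + amount > limit) return false; // over budget: reject, consume nothing
  creditsUsed.set(userId, current + amount);
  return true;
}
```

For most early-stage products the race window is small enough that the simpler two-step version is fine; tighten it once AI spend becomes material.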
Step 4: Rate Limiting AI Endpoints
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const redis = Redis.fromEnv();
const limiters = {
'ai-chat': new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(20, '1m'), // 20 messages per minute
prefix: 'rl:ai-chat',
}),
'ai-chat-daily': new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(200, '24h'), // 200 per day as abuse ceiling
prefix: 'rl:ai-daily',
}),
};
export async function checkRateLimit(
userId: string,
limiterName: keyof typeof limiters,
_limit?: number,
_window?: string,
) {
const limiter = limiters[limiterName];
return limiter.limit(userId);
}
Step 5: Token Usage Dashboard
// app/dashboard/ai-usage/page.tsx
import { auth } from '@/auth';
import { db } from '@/lib/db';
import { redirect } from 'next/navigation';
export default async function AiUsagePage() {
const session = await auth();
if (!session?.user) redirect('/login');
const startOfMonth = new Date(new Date().getFullYear(), new Date().getMonth(), 1);
const usage = await db.aiUsage.aggregate({
where: { userId: session!.user.id, createdAt: { gte: startOfMonth } },
_sum: { totalTokens: true, estimatedCostUsd: true },
_count: { id: true },
});
const recentHistory = await db.aiUsage.findMany({
where: { userId: session!.user.id },
orderBy: { createdAt: 'desc' },
take: 10,
select: { model: true, totalTokens: true, estimatedCostUsd: true, createdAt: true },
});
return (
<div className="space-y-6">
<h1 className="text-2xl font-bold">AI Usage</h1>
<div className="grid grid-cols-3 gap-4">
<div className="rounded-lg border p-4">
<p className="text-sm text-gray-500">Messages This Month</p>
<p className="text-3xl font-bold">{usage._count.id}</p>
</div>
<div className="rounded-lg border p-4">
<p className="text-sm text-gray-500">Tokens Used</p>
<p className="text-3xl font-bold">{(usage._sum.totalTokens ?? 0).toLocaleString()}</p>
</div>
<div className="rounded-lg border p-4">
<p className="text-sm text-gray-500">Est. Cost</p>
<p className="text-3xl font-bold">${(usage._sum.estimatedCostUsd ?? 0).toFixed(4)}</p>
</div>
</div>
{/* Recent history table */}
<div className="rounded-lg border">
<table className="w-full text-sm">
<thead>
<tr className="border-b bg-gray-50">
<th className="p-3 text-left">Model</th>
<th className="p-3 text-left">Tokens</th>
<th className="p-3 text-left">Cost</th>
<th className="p-3 text-left">Time</th>
</tr>
</thead>
<tbody>
{recentHistory.map((row, i) => (
<tr key={i} className="border-b last:border-0">
<td className="p-3 font-mono">{row.model}</td>
<td className="p-3">{row.totalTokens.toLocaleString()}</td>
<td className="p-3">${row.estimatedCostUsd.toFixed(5)}</td>
<td className="p-3 text-gray-500">{row.createdAt.toLocaleString()}</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
);
}
Prisma Schema Additions
// Add to your existing schema:
model AiUsage {
id String @id @default(cuid())
userId String
model String
promptTokens Int
completionTokens Int
totalTokens Int
estimatedCostUsd Decimal @db.Decimal(10, 8)
createdAt DateTime @default(now())
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
@@index([userId, createdAt])
}
// Add to User model:
// aiCreditsUsed Int @default(0)
// aiCreditsResetAt DateTime?
Streaming UX: What Makes or Breaks AI Features
Streaming text output — token by token as the model generates it — is one of the most impactful UX decisions for AI products. Users who see a blank screen for 3–5 seconds while a response generates have a dramatically worse experience than users who see text appearing immediately. The Vercel AI SDK's streamText with toDataStreamResponse() handles the streaming infrastructure; the UX patterns that matter on top of it:
Show a loading state immediately: Before the first token arrives, show "thinking" or a typing indicator. The delay between submitting a message and receiving the first token is typically 0.5–2 seconds for fast models, longer for slow models or complex prompts. Users who don't see any feedback in that window often think their request failed.
Render markdown incrementally: AI responses often include markdown (headers, bullets, code blocks). Re-rendering the raw string with a markdown renderer on every token can flicker, because incomplete markdown syntax parses differently mid-stream. Your options: re-render with a library like react-markdown and memoize completed blocks so only the tail re-parses, or defer markdown rendering until the stream completes and show plain text while streaming.
Interruption handling: The "Stop" button (shown in the chat component above) requires the Vercel AI SDK's stop() function from useChat. This sends an abort signal to the server, which should cancel the in-progress LLM API call. Cancellation is not guaranteed (LLM providers handle mid-stream cancellation differently), but stopping the client from processing further tokens is immediate.
Error recovery: LLM APIs fail. The model returns a 429 (rate limit), 503 (service unavailable), or a connection timeout. Design your error states to be actionable: "The AI is temporarily busy — please try again in a moment" with a retry button is better than "Something went wrong." Track error frequency in your analytics — persistent errors signal provider issues or rate limit configuration problems.
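Error recovery can be mechanical as well as cosmetic. A hedged sketch of retry with exponential backoff around a flaky call; the attempt count and delays are illustrative, not recommendations:

```typescript
// Sketch: retry a failing async call with exponential backoff.
// Suitable for transient 429/503 failures; don't retry auth or validation errors.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Keep retries server-side and bounded; an unbounded client retry loop against a rate-limited AI endpoint makes a 429 storm worse.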
Prompt Management
System prompts are configuration that belongs in your codebase, not hardcoded strings scattered through API routes. As your AI features grow, centralize prompt management:
// lib/prompts.ts
export const PROMPTS = {
chat: (appName: string) => `You are a helpful assistant for ${appName}.
Be concise and accurate. Respond in plain text unless the user explicitly requests formatted output.
If you don't know something, say so rather than guessing.`,
summarize: (content: string) => `Summarize the following content in 2-3 bullet points:
${content}
Output as a JSON array of strings.`,
extract: (schema: string) => `Extract structured data from the user's input following this schema:
${schema}
Respond with valid JSON only, no explanation.`,
} as const;
Centralized prompts make it easier to: A/B test different phrasings, version-control prompt changes (the prompt is in your git history), apply consistent tone across features, and audit what instructions you're giving the model.
AI Feature Rollout Strategy
Don't ship AI features to all users simultaneously. Use feature flags to control rollout, collect data on AI usage and quality before scaling, and manage LLM cost exposure during early access.
The rollout sequence that works for most AI SaaS products:
- Internal alpha: The team uses the feature daily for 2 weeks. Find the bugs, calibrate the prompts, establish cost baselines.
- Closed beta with 5–10%: Recruit engaged users. Collect qualitative feedback on accuracy and usefulness. Monitor credit consumption and error rates.
- Gradual rollout (10% → 50% → 100%): Gate each expansion on key metrics: error rate < 2%, cost per user within budget, no increase in churn in the cohort using AI features.
This sequence prevents the failure mode where an AI feature looks great in testing but has a subtle quality issue that only appears at scale (a prompt that works for simple inputs but breaks on complex ones, for example).
For feature flags implementation, see how to add feature flags to any SaaS boilerplate.
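The percentage gates in the rollout sequence can be implemented without a flag service: hash the user id into a stable bucket and compare against the rollout percentage. A sketch (FNV-1a is used here as one reasonable choice of stable hash, not the only one):

```typescript
// Sketch: deterministic percentage rollout. The same user always lands in
// the same bucket, so raising the percentage only ever adds users.
function bucketOf(userId: string): number {
  // FNV-1a 32-bit hash: stable across processes, fast, fine for bucketing.
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100; // bucket in [0, 100)
}

function inRollout(userId: string, percent: number): boolean {
  return bucketOf(userId) < percent;
}
```

Because buckets are stable, moving from 10% to 50% keeps every user who already had the feature, which avoids the churn-inducing experience of a feature appearing and disappearing.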
Context Windows and Conversation History
Multi-turn conversations accumulate context. After 20 messages, you're passing the full conversation history to every API call — this increases latency and cost. Manage context actively:
Token budget: Keep conversation history under 70% of the model's context window. For GPT-4o-mini (128K tokens), trim conversations that exceed ~90K tokens.
Summarization: When conversations get long, summarize the oldest messages and replace them with a brief system note: "Earlier in this conversation, the user was working on [topic] and asked about [x]." This preserves semantic continuity without token cost.
Per-session vs persistent context: Decide whether conversation history persists across sessions. A writing assistant probably wants persistent history (the user comes back to the same document). A general-purpose chatbot might not. Store conversation history in your AiUsage or a separate AiMessage table keyed by (userId, conversationId).
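The token-budget trim described above can be sketched with the rough chars/4 heuristic. A real implementation would use the model's tokenizer; the heuristic and the newest-first walk are the assumptions here:

```typescript
// Sketch: trim conversation history to a token budget, dropping the oldest
// messages first so the most recent context survives the cut.
// chars/4 is a crude English-text heuristic, not a real tokenizer.
interface Message { role: 'user' | 'assistant' | 'system'; content: string }

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function trimToBudget(messages: Message[], budgetTokens: number): Message[] {
  const kept: Message[] = [];
  let total = 0;
  // Walk newest-to-oldest, stop once the budget is exhausted.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (total + cost > budgetTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```

Run this on the server before calling streamText, and swap the dropped prefix for a one-line summary note if you want semantic continuity rather than a hard cut.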
Testing AI Features
AI features require different testing approaches than standard CRUD endpoints. The output is non-deterministic — the same input can produce different outputs on repeated calls — which makes traditional assertion-based testing difficult.
Smoke tests for the infrastructure: Test that your API route returns a 200 with a streaming response, that credits are deducted after completion, and that rate limits fire correctly after N requests. These infrastructure behaviors are deterministic and fully testable.
Evaluation sets for quality: Create a set of representative input/expected output pairs and manually review responses periodically rather than asserting exact matches. Tools like Braintrust and LangSmith track response quality over time and flag regressions when a prompt change degrades output quality across your evaluation set.
Mocking in unit tests: For components that consume AI responses, mock the useChat hook return value in unit tests rather than making real API calls. Test that the UI handles loading, error, and rate-limit states correctly — these are deterministic and important to get right.
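Extracting UI decisions into pure functions makes the deterministic part trivially testable without mocking the network at all. A sketch modeled on the chat component above (the function and type names are illustrative, not from the AI SDK):

```typescript
// Sketch: pure derivations of UI state from chat state. These mirror the
// placeholder/disabled logic in the chat component and are fully deterministic.
interface ChatUiState { isLoading: boolean; credits: number }

function placeholderFor(state: ChatUiState, fallback = 'Ask anything...'): string {
  return state.credits === 0 ? 'Upgrade to continue...' : fallback;
}

function canSubmit(state: ChatUiState, input: string): boolean {
  // Block submission while streaming, when out of credits, or on empty input.
  return !state.isLoading && state.credits > 0 && input.trim().length > 0;
}
```

The component then becomes a thin wrapper over these functions plus the mocked useChat hook, and the interesting logic is covered by plain unit tests.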
Related Resources
For rate limiting specifically applied to AI endpoints — per-minute limits, daily limits, and monthly cost caps per user plan — rate limiting and abuse prevention for SaaS covers the full Upstash implementation. For a broader view of which boilerplates include AI infrastructure and how to choose one as a foundation for an AI SaaS product, best boilerplates for AI SaaS products covers the provider landscape and billing patterns. For the React Server Components streaming patterns that interact with streamText responses, React Server Components in boilerplates covers the Suspense integration.
Methodology
Implementation patterns based on Vercel AI SDK documentation (v4.x), OpenAI and Anthropic API documentation, and community patterns from the Next.js and AI builders Discord communities as of Q1 2026. LLM pricing from official provider pricing pages. Credit system patterns derived from ShipFast's implementation and community discussions on IndieHackers.
Find boilerplates with AI features pre-built at StarterPick.