Best Boilerplates for AI SaaS Products in 2026

TL;DR

AI SaaS products need infrastructure that standard boilerplates don't include. ShipFast, T3, and Supastarter handle auth, billing, and dashboard — but not LLM streaming, token metering, vector databases, or credit systems. In 2026, the standard pattern is: choose a standard SaaS boilerplate, add the Vercel AI SDK layer, and build the AI-specific infrastructure yourself. This guide covers the patterns.

Key Takeaways

No boilerplate ships production-ready AI features — ShipFast has a demo, others have nothing
Vercel AI SDK is the standard layer for streaming, multi-provider switching, and tool use
Token metering belongs in the onFinish callback after every completion
Credit system is simpler than Stripe Meters for most AI SaaS — check before, deduct after
RAG with pgvector eliminates the need for a separate vector database for most apps
Rate limiting AI endpoints is critical — they're 10-50x more expensive than standard API calls

AI SaaS Has Different Infrastructure Requirements

Standard SaaS boilerplates (ShipFast, Supastarter, T3) work for AI SaaS — but they're missing the AI-specific infrastructure that takes weeks to build:

LLM API integration with streaming and error handling
Token metering and per-user usage limits
Vector database for RAG and semantic search
Prompt management and versioning
AI credit system tied to Stripe billing

In 2026, developers are building this infrastructure layer on top of standard SaaS boilerplates. This article shows the patterns.

AI SaaS Architecture

User input
  ↓
Rate limit check + credit deduction (Redis)
  ↓
Context retrieval (pgvector RAG) [optional]
  ↓
LLM API call (OpenAI / Anthropic / Google)
  ↓
Token usage metering → update user credits
  ↓
Stream response to client (Vercel AI SDK)
  ↓
Store conversation history (PostgreSQL)

Choosing Your LLM Provider

The provider choice affects cost, capability, and latency. The four practical options in 2026:

OpenAI: Largest model family (GPT-4o, GPT-4o-mini, o1, o3). Best function calling and structured output reliability. Most developer documentation and community examples. GPT-4o-mini at $0.15/M input tokens is the budget-conscious default for most AI SaaS features. GPT-4o at $2.50/M input tokens for use cases where quality matters.

Anthropic Claude: Strong at long-context tasks (up to 200K tokens), code generation, and instruction following. Claude 3.5 Sonnet is competitive with GPT-4o on most benchmarks at similar pricing. Claude models are notably better at refusing to do harmful things correctly — they're less likely to refuse legitimate requests while still refusing genuinely harmful ones. This matters for consumer-facing AI products.

Google Gemini: Best multimodal capabilities (image, audio, video input) and longest context window (1M tokens for Gemini 1.5 Pro). Cheapest pricing at scale. Gemini Flash for high-volume, low-latency use cases.

Local/Self-hosted (Ollama): Zero API cost, but requires GPU infrastructure. Practical for enterprise deployments where data sovereignty requirements prohibit sending data to cloud APIs. Llama 3.3 70B and Mistral models provide good quality for most general-purpose tasks.

The Vercel AI SDK's provider-agnostic interface makes switching providers a two-line change. Start with OpenAI for ecosystem familiarity, and add Anthropic as a fallback or for specific use cases.

Best Starting Combinations

ShipFast + Vercel AI SDK

The most common 2026 stack for AI SaaS:

ShipFast handles: auth, Stripe credits/subscriptions, email, landing page
Vercel AI SDK handles: multi-provider LLM calls, streaming, structured output

// app/api/chat/route.ts
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { checkAndDeductCredits } from '@/libs/credits';
import { getServerSession } from 'next-auth';

export async function POST(req: Request) {
  const session = await getServerSession();
  if (!session) return new Response('Unauthorized', { status: 401 });

  // Check credits before LLM call
  const hasCredits = await checkAndDeductCredits(session.user.id, 1);
  if (!hasCredits) return new Response('No credits', { status: 402 });

  const { messages } = await req.json();

  const result = streamText({
    model: anthropic('claude-3-5-sonnet-20241022'),
    messages,
    onFinish: async ({ usage }) => {
      // Update actual token usage after completion
      await recordTokenUsage(session.user.id, usage);
    },
  });

  return result.toDataStreamResponse();
}

T3 Stack + AI SDK + pgvector

For AI SaaS with RAG (search over user documents):

// packages/api/src/router/rag.ts
import { openai } from '@ai-sdk/openai';
import { embed, embedMany } from 'ai';
import { db } from '@acme/db';
import { documents, embeddings } from '@acme/db/schema';
import { cosineDistance, gt, desc } from 'drizzle-orm';
import { sql } from 'drizzle-orm';

// Store document embedding
export async function indexDocument(userId: string, content: string) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: content,
  });

  await db.insert(embeddings).values({
    userId,
    content,
    embedding,  // pgvector stores float[]
  });
}

// Retrieve relevant context
export async function findRelevantContent(userId: string, query: string) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  });

  const similarity = sql<number>`1 - (${cosineDistance(embeddings.embedding, embedding)})`;

  return db.select({ content: embeddings.content, similarity })
    .from(embeddings)
    .where(gt(similarity, 0.5))
    .orderBy(desc(similarity))
    .limit(5);
}

Token Billing Patterns

Credit System (Most Common)

// $0.01 = 1 credit. Bundle credits into Stripe products.
const PRICING = {
  starter: { credits: 1000, price: 10 },   // $10 = 1000 credits
  pro: { credits: 5000, price: 40 },        // $40 = 5000 credits
  scale: { credits: 25000, price: 150 },    // $150 = 25000 credits
};

// LLM costs in credits per 1K tokens
const LLM_CREDIT_COST = {
  'gpt-4o': { input: 0.5, output: 1.5 },         // per 1K tokens
  'claude-3-5-sonnet': { input: 0.3, output: 1.5 },
};

Subscription with Soft Limits

// Plans include monthly token budget, warn at 80%, block at 100%
const PLANS = {
  starter: { tokensPerMonth: 500_000, price: 29 },
  pro: { tokensPerMonth: 5_000_000, price: 99 },
};

Vector Databases for RAG

For AI SaaS products that need to search over user documents (chat-with-your-data, semantic search, knowledge bases), you need a vector store. The options in 2026:

pgvector (PostgreSQL extension): The default choice for 90% of AI SaaS apps. Adds vector similarity search directly to your existing PostgreSQL database. Zero additional infrastructure, zero additional cost. Supports cosine similarity, L2 distance, and inner product. Performance is excellent for databases under 1M vectors with proper indexing (USING ivfflat). Neon, Supabase, and Railway all support pgvector.

Pinecone: Purpose-built vector database. Better performance at very high vector counts (10M+), but adds infrastructure cost ($70/month minimum for production) and another service to manage. Only justified when you've outgrown pgvector — which most AI SaaS products never do.

Qdrant (self-hosted): Open source, high performance, deployable on Railway or Fly.io. Good choice if you want vector-native performance without Pinecone's pricing.

Start with pgvector. Migrate to Pinecone or Qdrant only when query performance becomes a bottleneck.

AI SaaS Pricing Models

The three pricing models that work for AI SaaS:

Credit-based: Users purchase credits (e.g., 1000 credits for $10), each AI operation costs a set number of credits. Simple to implement, easy for users to understand, no surprise bills. Works best when operations have predictable cost (each chat message = 1 credit). ShipFast's built-in credit system uses this approach.

Subscription with included usage: Monthly subscription ($29/month) includes a token budget (500K tokens). Over budget: pay-per-use or soft block. Works well for products where usage is relatively predictable — note-taking AI, writing assistants, customer support bots.

Pure consumption billing: Pay exactly for what you use, passed through with a margin. Most complex to implement (requires Stripe Meters or similar usage billing). Most appropriate for developer-facing AI APIs where usage varies enormously between customers. For consumer SaaS, the complexity rarely justifies the revenue optimization.

The recommendation for most AI SaaS: start with subscription + included credits. It's the most predictable for both you and your customers, and it converts better than pure credit purchases.

AI SaaS Launch Checklist

Before launching an AI product:

Rate limiting — Prevent users from consuming all credits in one burst
Error handling — LLM APIs are flaky; implement retry with exponential backoff
Streaming — Users expect character-by-character output, not wait-then-dump
Cost controls — Set monthly spend limits on your LLM provider account
Content moderation — Screen inputs for ToS violations (OpenAI Moderation API)
Fallback models — If primary model fails, fallback to alternative
Usage dashboard — Show users their credit balance and usage history

AI SaaS Boilerplate Comparison

Boilerplate	AI Chat	Token Tracking	Credits	RAG Support	Best For
ShipFast	✅ Demo	❌	❌	❌	Quick AI feature addition
T3 Stack	❌	❌	❌	✅ pgvector	Developer-facing AI
Supastarter	❌	❌	❌	❌	Full-featured SaaS base
Open SaaS	✅ Demo	❌	❌	❌	Cost-conscious start

Every boilerplate requires you to build the AI infrastructure layer. ShipFast's chat demo saves 2-3 hours; the rest saves nothing.

LLM API Reliability and Fallback Patterns

LLM APIs are not as reliable as standard web APIs. OpenAI, Anthropic, and Google all experience intermittent latency spikes, rate limiting at the provider level, and occasional outages. Production AI SaaS needs explicit handling for each.

Exponential backoff on 429 and 503: Provider rate limits (429) and service unavailability (503) are transient — retry after a delay. The Vercel AI SDK doesn't retry automatically; implement retry logic in your route handler:

import { streamText } from 'ai';

async function callWithRetry(params: Parameters<typeof streamText>[0], maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return streamText(params);
    } catch (err: unknown) {
      const error = err as { status?: number };
      if (error.status === 429 || error.status === 503) {
        if (attempt < maxRetries - 1) {
          await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
          continue;
        }
      }
      throw err;
    }
  }
}

Provider fallback: If your primary provider is down, switch to a backup:

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

const PRIMARY = openai('gpt-4o-mini');
const FALLBACK = anthropic('claude-haiku-4-5-20251001');

// Try primary, fall back on error
const model = await tryProvider(PRIMARY) ?? FALLBACK;

Provider fallback is especially valuable for streaming endpoints where a 30-second timeout is a visible UX failure. A user who sees "something went wrong" after 30 seconds of waiting is more likely to churn than one who gets a response from the fallback model in 3 seconds.

Context Management for Multi-Turn Conversations

AI chat products need to maintain conversation history so each response has context from prior turns. The naive approach — pass all messages to every API call — accumulates context until you hit the model's context window limit.

Token-aware truncation: Before each API call, measure the total token count of your messages array and truncate from the oldest end if it exceeds your target (typically 70–80% of the model's context window to leave room for the response):

import { countTokens } from '@anthropic-ai/tokenizer'; // or tiktoken for OpenAI

const MAX_CONTEXT_TOKENS = 8000; // leave 4K for response

function trimMessages(messages: Message[], maxTokens: number): Message[] {
  // Always keep system message + last user message
  const systemMsg = messages[0];
  const recentMsgs = messages.slice(1);

  while (countTokens(recentMsgs) > maxTokens && recentMsgs.length > 1) {
    recentMsgs.splice(0, 1); // Remove oldest non-system message
  }

  return [systemMsg, ...recentMsgs];
}

Conversation summarization: When conversations get long, replace the oldest N messages with a summary of what was discussed. Ask the LLM to summarize the dropped messages in 2-3 sentences and prepend it as a system message. This preserves semantic context (what the user was working on) while dropping the literal exchange.

Persistent conversation storage: Store conversation history in your database keyed by (userId, conversationId). Load the last N messages (or N tokens) when a user returns to a conversation. This requires AiMessage records in your schema alongside the AiUsage records.

AI Product Failure Modes to Plan For

The failure modes that hurt AI SaaS products most are not technical failures — they're product failures that technical decisions either prevent or enable.

Hallucination in high-stakes contexts: If your AI product is used for medical, legal, financial, or safety decisions, hallucinations are a serious liability. Add explicit disclaimers in the UI, tune your system prompt to acknowledge uncertainty, and consider RAG over authoritative sources rather than relying on the model's parametric knowledge.

Jailbreaking and ToS violations: Users will attempt to use your AI product to generate content that violates your terms of service or your provider's usage policies. OpenAI's Moderation API (free) classifies inputs for hate speech, violence, sexual content, and self-harm before they reach the model. Run every user input through moderation and decline to process flagged inputs.

Prompt injection: If your product includes user-supplied content in prompts (documents, web pages, database records), malicious content can include instructions that override your system prompt. Sanitize user-supplied content before including it in prompts, and structure prompts to clearly delineate user content from system instructions.

Cost runaway: A single user running an automated script against your AI endpoint can exhaust your LLM provider budget in minutes if you have no per-user cost controls. The cost cap pattern from Pattern 5 (monthly cost caps per plan) combined with per-minute rate limiting prevents this.

For the complete implementation of streaming chat, credits, rate limiting, and usage dashboard in a Next.js boilerplate, how to add AI features to any SaaS boilerplate covers each component step by step. For rate limiting AI endpoints specifically — protecting expensive LLM calls from abuse and controlling costs per user — rate limiting and abuse prevention for SaaS covers the Upstash patterns. For the Vercel AI SDK's multi-provider setup and streaming patterns in a React Server Component architecture, React Server Components in boilerplates covers the streaming Suspense integration.

Methodology

Boilerplate AI feature comparisons based on direct review of ShipFast, T3, Supastarter, Makerkit, and Open SaaS repositories as of Q1 2026. LLM provider pricing from official pricing pages. pgvector performance benchmarks from the pgvector GitHub repository and community benchmarks.

Building Responsibly

AI products have unique product responsibilities that standard SaaS products don't: the potential for generating harmful content, the risk of users relying on hallucinated information for real decisions, and the asymmetry between how the model appears to behave in testing versus how it behaves at scale with a diverse user base. Before launch, define your product's content policy explicitly — what topics the AI should and shouldn't engage with — and implement enforcement via system prompt constraints and input moderation. Review your moderation configuration quarterly as use patterns evolve.

AI SaaS in 2026

The infrastructure gap for AI SaaS boilerplates is narrowing. In 2024, no boilerplate included any AI infrastructure. In 2026, ShipFast includes a demo, and third-party starter templates with full credit systems are available. By 2027, expect most major boilerplates to ship with streaming chat, credit systems, and rate limiting as standard features. In the meantime, the patterns in this article are the industry standard implementation, and building them takes 2–3 days of focused work on top of any SaaS boilerplate.

The choice of LLM provider matters less than the choice of boilerplate architecture. A well-structured streaming chat route with proper credit tracking and rate limiting works with any provider — switching from GPT-4o to Claude 3.5 Sonnet is a two-line change in the Vercel AI SDK.

The boilerplate and tool choices covered here represent the most actively maintained options in their category as of 2026. Evaluate each against your specific requirements: team expertise, deployment infrastructure, budget, and the features your product requires on day one versus those you can add incrementally. The best starting point is the one that lets your team ship the first version of your product fastest, with the least architectural debt.

Compare AI SaaS boilerplates and standard starters on StarterPick.

The SaaS Boilerplate Matrix (Free PDF)