
Best Boilerplates for Building an AI Wrapper SaaS 2026

StarterPick Team

TL;DR

An AI wrapper SaaS has different requirements than a standard SaaS — you need streaming responses, token usage tracking, per-user rate limiting, and cost management on top of your own billing. Most general SaaS boilerplates don't ship with these. In 2026, the best options are: Shipixen (AI-native, ships with Vercel AI SDK pre-configured), T3 Stack + AI SDK (most flexible, build to your exact needs), and adapted versions of ShipFast/Makerkit (add AI layer yourself). Here's how to evaluate and set up each.

Key Takeaways

  • AI wrapper requirements: streaming, token metering, rate limiting, model switching, cost passthrough
  • Vercel AI SDK: the de facto standard for streaming in Next.js AI apps — use it in any boilerplate
  • T3 Stack + AI SDK: most flexible foundation — build exactly what you need
  • Shipixen: AI-native starter with streaming pre-built, good for rapid prototyping
  • ShipFast + AI layer: large community, add ai package on top — fastest path if you already own ShipFast
  • Token metering: use Stripe Meters (usage-based billing) — no boilerplate ships this pre-built

What Makes AI Wrapper SaaS Different

A standard SaaS boilerplate gives you auth + billing + dashboard. An AI wrapper needs:

Standard SaaS:
  Auth → Dashboard → Features → Billing (flat subscription)

AI Wrapper SaaS:
  Auth → Dashboard → AI Features (streaming) → Usage Tracking
    → Rate Limiting (per-user)
    → Token Metering → Cost Management
    → Billing (usage-based OR credit-based)
    → Prompt Management / Versioning

The unique technical requirements:

| Requirement | Why It Matters | Typical Solution |
| --- | --- | --- |
| Streaming responses | LLM responses are slow — stream for UX | Vercel AI SDK useChat / useCompletion |
| Token tracking | API costs scale with usage | Count tokens per request, store in DB |
| Rate limiting | Prevent abuse / cost overruns | Redis + sliding window (Upstash) |
| Model switching | GPT-4 vs Claude vs Gemini | Abstraction layer via AI SDK |
| Prompt management | Version prompts, A/B test | DB-stored prompts or separate config |
| Cost passthrough | Charge users for AI usage | Stripe Meters or credit system |
| Abort/cancel | Users stop mid-generation | AbortController in streaming handler |

The Core: Vercel AI SDK

Regardless of which boilerplate you pick, the Vercel AI SDK (ai package) is the foundation for all AI interactions:

npm install ai @ai-sdk/openai @ai-sdk/anthropic

// app/api/chat/route.ts — streaming chat endpoint (works in any boilerplate):
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
import { auth } from '@/lib/auth';
import { checkRateLimit } from '@/lib/rate-limit';
import { trackTokenUsage } from '@/lib/usage';

export async function POST(req: Request) {
  const session = await auth();
  if (!session) return new Response('Unauthorized', { status: 401 });

  // Rate limit: 20 requests/hour for free, 200 for pro:
  const { success, remaining } = await checkRateLimit(session.user.id, session.user.plan);
  if (!success) return new Response('Rate limit exceeded', { status: 429 });

  const { messages, model = 'gpt-4o-mini' } = await req.json();

  // Model routing:
  const modelProvider = model.startsWith('claude')
    ? anthropic(model)
    : openai(model);

  const result = await streamText({
    model: modelProvider,
    messages,
    system: 'You are a helpful assistant.',
    onFinish: async ({ usage }) => {
      // Track token usage after generation completes:
      await trackTokenUsage({
        userId: session.user.id,
        model,
        inputTokens: usage.promptTokens,
        outputTokens: usage.completionTokens,
        totalTokens: usage.totalTokens,
      });
    },
  });

  return result.toDataStreamResponse();
}

// app/chat/page.tsx — streaming UI (any boilerplate):
'use client';
import { useChat } from 'ai/react';

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, stop } = useChat({
    api: '/api/chat',
    body: { model: 'gpt-4o-mini' },
    onError: (err) => console.error('Chat error:', err),
  });

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4">
        {messages.map((m) => (
          <div key={m.id} className={m.role === 'user' ? 'text-right' : 'text-left'}>
            <span className="inline-block p-3 rounded-lg bg-muted max-w-[80%]">
              {m.content}
            </span>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2 mt-4">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Type a message..."
          className="flex-1 rounded border p-2"
        />
        {isLoading
          ? <button type="button" onClick={stop}>Stop</button>
          : <button type="submit">Send</button>
        }
      </form>
    </div>
  );
}

This pattern works with any boilerplate — add it to ShipFast, Makerkit, T3 Stack, etc.


Option 1: T3 Stack — Most Flexible

Best for: developers who want full control and are comfortable assembling their own AI layer.

npm create t3-app@latest my-ai-saas
# Select: Next.js, TypeScript, Prisma, tRPC, Tailwind
npm install ai @ai-sdk/openai @ai-sdk/anthropic @upstash/ratelimit @upstash/redis

Database schema for AI usage tracking:

// prisma/schema.prisma additions:
model Conversation {
  id        String    @id @default(cuid())
  userId    String
  user      User      @relation(fields: [userId], references: [id])
  title     String?
  messages  Message[]
  createdAt DateTime  @default(now())
  updatedAt DateTime  @updatedAt
}

model Message {
  id             String       @id @default(cuid())
  conversationId String
  conversation   Conversation @relation(fields: [conversationId], references: [id])
  role           String       // 'user' | 'assistant' | 'system'
  content        String       @db.Text
  model          String?      // 'gpt-4o-mini', 'claude-3-haiku', etc.
  inputTokens    Int          @default(0)
  outputTokens   Int          @default(0)
  createdAt      DateTime     @default(now())
}

model UsageSummary {
  id           String   @id @default(cuid())
  userId       String
  user         User     @relation(fields: [userId], references: [id])
  month        String   // '2026-03'
  totalTokens  Int      @default(0)
  totalCost    Float    @default(0) // in USD
  updatedAt    DateTime @updatedAt

  @@unique([userId, month])
}

Rate limiting with Upstash Redis:

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

const limits = {
  free: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(20, '1 h'), // 20 requests/hour
  }),
  pro: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(200, '1 h'), // 200/hour
  }),
  enterprise: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(2000, '1 h'),
  }),
};

export async function checkRateLimit(userId: string, plan: string) {
  const limiter = limits[plan as keyof typeof limits] ?? limits.free;
  return limiter.limit(userId);
}

Token cost tracking:

// lib/usage.ts
import { db } from '@/server/db'; // T3's Prisma client export
const TOKEN_COSTS = {
  'gpt-4o': { input: 0.000005, output: 0.000015 },       // per token
  'gpt-4o-mini': { input: 0.00000015, output: 0.0000006 },
  'claude-3-5-sonnet': { input: 0.000003, output: 0.000015 },
  'claude-3-haiku': { input: 0.00000025, output: 0.00000125 },
  'gemini-1.5-pro': { input: 0.00000125, output: 0.000005 },
} as const;

export async function trackTokenUsage({
  userId, model, inputTokens, outputTokens, totalTokens,
}: {
  userId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}) {
  const costs = TOKEN_COSTS[model as keyof typeof TOKEN_COSTS];
  const cost = costs
    ? inputTokens * costs.input + outputTokens * costs.output
    : 0;

  const month = new Date().toISOString().slice(0, 7); // '2026-03'

  await db.usageSummary.upsert({
    where: { userId_month: { userId, month } },
    update: {
      totalTokens: { increment: totalTokens },
      totalCost: { increment: cost },
    },
    create: { userId, month, totalTokens, totalCost: cost },
  });
}

T3 Stack is right for AI SaaS if you need custom billing logic or multi-model support, or if you're building something that doesn't fit a template.


Option 2: ShipFast + AI Layer

Best for: those who already own ShipFast and want to add AI features fast.

ShipFast doesn't ship with AI pre-built, but adding the Vercel AI SDK on top takes ~2 hours:

# In your ShipFast project:
npm install ai @ai-sdk/openai

// Add to ShipFast's existing API structure:
// app/api/ai/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { getServerSession } from 'next-auth'; // or Supabase auth
import { authOptions } from '@/libs/next-auth';

export async function POST(req: Request) {
  const session = await getServerSession(authOptions);

  // Reuse ShipFast's existing auth check:
  if (!session?.user) {
    return new Response('Unauthorized', { status: 401 });
  }

  // Use ShipFast's plan detection:
  const isPro = session.user.priceId === process.env.STRIPE_PRO_PRICE_ID;
  if (!isPro) {
    return new Response('Upgrade to Pro for AI features', { status: 403 });
  }

  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-4o-mini'),
    messages,
  });

  return result.toDataStreamResponse();
}

The ShipFast + AI path makes sense if:

  • You already own ShipFast (no additional boilerplate cost)
  • Your AI features are gated behind a paid plan (ShipFast's plan check is simple)
  • You don't need per-token billing (flat-rate subscription is fine)

Option 3: Makerkit — Plugin-Based AI Integration

Makerkit's plugin system is well-suited for AI features:

// Makerkit plugin pattern for AI:
// packages/plugins/ai-assistant/src/api/chat.ts
import { cookies } from 'next/headers';
import { createRouteHandlerClient } from '@supabase/auth-helpers-nextjs';
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function chatHandler(req: Request) {
  const supabase = createRouteHandlerClient({ cookies });
  const { data: { user } } = await supabase.auth.getUser();

  if (!user) return new Response('Unauthorized', { status: 401 });

  // Makerkit's org context:
  const { organizationId, messages } = await req.json();

  // Check org's AI quota:
  const quota = await getOrganizationAIQuota(organizationId);
  if (quota.used >= quota.limit) {
    return new Response('AI quota exceeded for this organization', { status: 429 });
  }

  const result = await streamText({
    model: openai('gpt-4o-mini'),
    messages,
    onFinish: async ({ usage }) => {
      await incrementOrganizationAIUsage(organizationId, usage.totalTokens);
    },
  });

  return result.toDataStreamResponse();
}

Makerkit shines for AI SaaS if:

  • You're building B2B — organizations share an AI quota
  • You want the AI feature as an add-on to a full SaaS (not AI as the core product)
  • You'll use the team management and billing plugins alongside it
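The handler above leans on `getOrganizationAIQuota` and `incrementOrganizationAIUsage`, which Makerkit does not ship. A minimal sketch of what those helpers could look like, using an in-memory Map so the logic is visible — in production they would read and write a Supabase table, and the default limit below is an invented placeholder:

```typescript
// Hypothetical helpers backing the quota checks in the chat handler. An
// in-memory Map stands in for a database table here, for illustration only.
type AIQuota = { used: number; limit: number };

const quotaStore = new Map<string, AIQuota>();

export async function getOrganizationAIQuota(orgId: string): Promise<AIQuota> {
  // Assumed default quota for organizations without a stored row.
  return quotaStore.get(orgId) ?? { used: 0, limit: 100_000 };
}

export async function incrementOrganizationAIUsage(
  orgId: string,
  tokens: number,
): Promise<void> {
  const quota = await getOrganizationAIQuota(orgId);
  quotaStore.set(orgId, { ...quota, used: quota.used + tokens });
}
```

In a real deployment, prefer a single SQL `update ... set used = used + $tokens` so concurrent requests cannot clobber each other's counts.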

Credit-Based vs Usage-Based Billing

Two billing models work for AI wrappers:

// Model 1: Credit-based (buy credits, spend on AI usage)
// Simpler UX — users buy packs, credits deduct per request

// Credit purchase (assumes an initialized `stripe` client):
const session = await stripe.checkout.sessions.create({
  mode: 'payment',       // One-time payment, not subscription
  line_items: [{ price: 'price_1000_credits', quantity: 1 }],
  // ...
});

// Credit deduction:
await db.user.update({
  where: { id: userId },
  data: { credits: { decrement: tokensUsed } },
});

// Guard before each AI call:
const user = await db.user.findUnique({ where: { id: userId } });
if (!user || user.credits <= 0) throw new Error('Insufficient credits');

// Model 2: Stripe Meters (usage-based subscription)
// More complex setup, but users pay for exactly what they use

// Record usage event:
await stripe.billing.meterEvents.create({
  event_name: 'ai_tokens',
  payload: {
    stripe_customer_id: customerId,
    value: String(tokensUsed),
  },
});

// Create subscription with usage price:
await stripe.subscriptions.create({
  customer: customerId,
  items: [{ price: 'price_per_1k_tokens' }],
});
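One caveat on the credit-based snippet: checking the balance and then decrementing are two separate steps, so two concurrent requests can both pass the check. The fix is to make the check and the deduction one conditional operation. A pure-TypeScript sketch of that shape (in Prisma this maps to `updateMany({ where: { id: userId, credits: { gte: cost } }, data: { credits: { decrement: cost } } })` followed by checking the returned `count`):

```typescript
type Account = { id: string; credits: number };

// Check and deduct in one synchronous step: no await can interleave between
// the balance check and the mutation, so a stale check cannot slip through.
export function tryDeductCredits(account: Account, cost: number): boolean {
  if (account.credits < cost) return false;
  account.credits -= cost;
  return true;
}
```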

Which billing model to use:

| | Credit-Based | Usage-Based (Stripe Meters) |
| --- | --- | --- |
| User experience | Predictable (buy credits, see balance) | Pay for what you use |
| Revenue predictability | Higher (bulk purchase) | Lower (variable) |
| Setup complexity | Lower | Higher |
| Best for | Consumer AI tools, indie hackers | B2B with high usage variance |

Production Checklist for AI SaaS

Before launching an AI wrapper:

[ ] Streaming — responses stream, not wait for full generation
[ ] Error handling — API timeouts, rate limit errors, model failures
[ ] Token limits — enforce per-request max tokens (prevent cost bombs)
[ ] Rate limiting — per-user hourly/daily limits
[ ] Cost monitoring — alert when daily spend exceeds threshold
[ ] Prompt injection prevention — sanitize user input
[ ] PII handling — don't log PII in prompt logs
[ ] Fallback model — if GPT-4o fails, fall back to GPT-4o-mini
[ ] Abort/cancel — users can stop a generation
[ ] Content moderation — if user-facing, run through moderation API

// Content moderation: note this uses the official `openai` SDK client,
// not the @ai-sdk/openai provider used elsewhere in this article.
const moderation = await openai.moderations.create({
  input: userMessage,
});
if (moderation.results[0].flagged) {
  return new Response('Message flagged by content policy', { status: 400 });
}
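The "token limits" checklist item can be enforced with a small clamp before each call: cap the caller's requested output tokens at a per-plan ceiling and pass the result as `maxTokens` to `streamText`. The ceilings below are illustrative numbers, not values from any boilerplate:

```typescript
// Per-plan output-token ceilings (illustrative values).
const MAX_OUTPUT_TOKENS: Record<string, number> = {
  free: 1_000,
  pro: 4_000,
  enterprise: 16_000,
};

// Clamp a user-supplied maxTokens to the plan ceiling; unknown plans fall
// back to the free tier, and a missing value defaults to the ceiling itself.
export function clampMaxTokens(requested: number | undefined, plan: string): number {
  const ceiling = MAX_OUTPUT_TOKENS[plan] ?? MAX_OUTPUT_TOKENS.free;
  if (!requested || requested <= 0) return ceiling;
  return Math.min(requested, ceiling);
}
```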

Prompt Management and Version Control

Production AI SaaS products quickly discover that managing prompts is a first-class engineering problem, not an afterthought. Prompts change frequently — you discover that a slight wording change improves output quality, or a new model version requires different prompt structure. Without version control for prompts, you have no way to roll back to the last known-good prompt when a change causes a regression.

The simplest approach is storing prompts in your database with version tracking. A prompt_versions table with id, name, content, model, version_number, created_by, and is_active columns gives you the audit trail and rollback capability. The active prompt is fetched at runtime rather than hardcoded in your source file, enabling prompt updates without code deployments.
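The versioned-prompt idea can be sketched in a few functions. This in-memory version mirrors the columns described above (`name`, `content`, `version_number`, `is_active`); a real implementation would back it with the `prompt_versions` table:

```typescript
type PromptVersion = {
  id: string;
  name: string;
  content: string;
  versionNumber: number;
  isActive: boolean;
};

// Stand-in for the prompt_versions table; for illustration only.
const promptVersions: PromptVersion[] = [];

// Publishing deactivates the previous version but keeps it for rollback.
export function publishPrompt(name: string, content: string): PromptVersion {
  const existing = promptVersions.filter((p) => p.name === name);
  for (const p of existing) p.isActive = false;
  const version: PromptVersion = {
    id: `${name}-v${existing.length + 1}`,
    name,
    content,
    versionNumber: existing.length + 1,
    isActive: true,
  };
  promptVersions.push(version);
  return version;
}

// Fetched at runtime instead of hardcoding the prompt in source.
export function getActivePrompt(name: string): PromptVersion | undefined {
  return promptVersions.find((p) => p.name === name && p.isActive);
}

// Roll back to a known-good version after a regression.
export function rollbackPrompt(name: string, toVersion: number): PromptVersion | undefined {
  const target = promptVersions.find(
    (p) => p.name === name && p.versionNumber === toVersion,
  );
  if (!target) return undefined;
  for (const p of promptVersions) if (p.name === name) p.isActive = false;
  target.isActive = true;
  return target;
}
```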

More sophisticated prompt management includes A/B testing: send 10% of requests to the new prompt version while 90% use the current version, then measure output quality metrics to decide whether to roll out the change. This requires instrumenting your AI calls with which prompt version was used, capturing user feedback (thumbs up/down), and tracking downstream metrics (task completion, user retention after using the AI feature).
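The 90/10 split described above needs to be sticky per user, otherwise the same person sees different prompt behavior across requests. A deterministic hash of the user id into a 0-99 bucket is the usual trick (the hash function here is an arbitrary illustrative choice):

```typescript
// Map a user id to a stable bucket in [0, 100) and compare it against the
// experiment's traffic percentage. Same user, same variant, every request.
export function promptBucket(
  userId: string,
  experimentPercent = 10,
): 'experiment' | 'control' {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash % 100 < experimentPercent ? 'experiment' : 'control';
}
```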

External prompt management tools like LangSmith, Promptfoo, and Braintrust provide more structured workflows for teams that treat prompt engineering as a formal discipline. These tools handle version control, testing, evaluation, and A/B testing in a purpose-built UI. For early-stage indie products, database-stored prompts with manual versioning is sufficient. For teams with dedicated AI engineers and frequent prompt iteration, external tooling pays off.

Selecting the Right AI Model for Each Task

Model selection significantly impacts both quality and cost for AI wrapper SaaS. The approach of using the same model for every task — typically "GPT-4o for everything" — is common in early builds and expensive at scale.

The 2026 model landscape offers clear specialization: GPT-4o and Claude 3.5 Sonnet are the quality leaders for complex reasoning and nuanced text generation, but cost 20-50x more per token than smaller models. GPT-4o-mini and Claude Haiku handle simpler tasks — classification, summarization, structured extraction — at a fraction of the cost with comparable quality on those specific tasks.

The pattern that works: route tasks to models by complexity. Classification (is this customer sentiment positive, negative, or neutral?) goes to a small model. Complex analysis (summarize this legal document and identify the five key risk factors) goes to a large model. The routing logic is simple — a function that maps task type to model selection. The cost savings are significant: a SaaS spending $500/month on GPT-4o for all tasks might spend $80/month after routing simple tasks to GPT-4o-mini.
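The routing function really can be that simple. A sketch, where the task names and model assignments are illustrative choices rather than a standard taxonomy:

```typescript
type Task = 'classification' | 'summarization' | 'extraction' | 'analysis' | 'generation';

// Route simple tasks to the cheap model and complex ones to the quality
// leader; the switch is exhaustive over the Task union.
export function modelForTask(task: Task): string {
  switch (task) {
    case 'classification':
    case 'summarization':
    case 'extraction':
      return 'gpt-4o-mini';
    case 'analysis':
    case 'generation':
      return 'gpt-4o';
  }
}
```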

Implementing fallback chains is also important for production reliability. If your primary model (GPT-4o) returns a rate limit error or is temporarily unavailable, your application should automatically retry with a fallback model (Claude 3.5 Sonnet or even GPT-4o-mini for non-quality-critical paths). The Vercel AI SDK's provider abstraction makes this pattern straightforward — the model parameter becomes dynamic rather than hardcoded.
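A fallback chain is a loop over an ordered model list: try each, move on when one throws. The sketch below takes the actual model call as a parameter so it composes with `streamText` or anything else; error classification (retry only on rate limits and outages, not on bad requests) is left out for brevity:

```typescript
// Try models in order; return the first success, rethrow the last failure.
export async function withModelFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // e.g. rate limit or availability error; try the next model
    }
  }
  throw lastError;
}
```

Usage looks like `withModelFallback(['gpt-4o', 'claude-3-5-sonnet', 'gpt-4o-mini'], (m) => callModel(m))`, where `callModel` is your wrapper around the AI SDK with a dynamic model parameter.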

Evaluating AI Boilerplates Before Buying

Not all boilerplates marketed as "AI-ready" deliver equivalent value. The term is applied loosely to everything from boilerplates that include the Vercel AI SDK with a chat endpoint to those with full token metering, streaming, rate limiting, and multi-model support built in.

The signals of a genuinely AI-ready boilerplate: streaming responses implemented correctly (not polling for completion, which kills user experience on slow models), rate limiting per user rather than globally, token usage tracking with database storage, and at least one example AI feature that demonstrates the full stack. A boilerplate that lists "OpenAI integration" as a feature but implements it as a simple await openai.chat.completions.create() without streaming, rate limiting, or token tracking gives you a starting point but not a foundation.

Check the streaming implementation specifically. Correct streaming uses the Vercel AI SDK's streamText and returns a DataStreamResponse — the client receives chunks as the model generates them. Incorrect "streaming" fetches the full completion and returns it as a single response after waiting for the full output. The difference is visible in the user experience: real streaming shows text appearing word by word; fake streaming shows nothing for 5 seconds then all the text at once.

For boilerplates with token metering, check whether the metering is connected to billing. Token metering stored in a database is useful for monitoring but doesn't generate revenue. Token metering connected to Stripe Meters (usage-based billing) or a credit system (deducting from a purchased credit balance) is the actual revenue mechanism. Most boilerplates implement the former; few implement the latter.

The boilerplate and tool choices covered here represent the most actively maintained options in their category as of 2026. Evaluate each against your specific requirements: team expertise, deployment infrastructure, budget, and the features your product requires on day one versus those you can add incrementally. The best starting point is the one that lets your team ship the first version of your product fastest, with the least architectural debt.


Find AI-ready SaaS boilerplates at StarterPick.

Compare background job tools for AI workflows: Inngest vs BullMQ vs Trigger.dev for boilerplates 2026.

Find boilerplates with the best architecture for extending with AI features: Best SaaS boilerplates 2026.

See how to evaluate the base boilerplate before adding AI layers: Red flags in SaaS boilerplates 2026.
