Guide
Integrate Claude into a Web App
By Spencer Hill, Founder · Last updated: May 24, 2026
What you'll learn
This guide walks through every layer of a production-grade Claude integration in a TypeScript web app: secure key handling, prompt structure, streaming Claude's Server-Sent Events through a server route to the browser, tool use loops, error recovery, and cost observability. By the end you'll have a working pattern you can drop into a Next.js, Remix, or Hono app without having to reverse-engineer the SDK.
Prerequisites
- Node.js 20+ and a TypeScript-enabled web framework (Next.js App Router assumed)
- An Anthropic API key with billing enabled
- Familiarity with async iterators and Server-Sent Events
- An observability sink (Datadog, Axiom, OpenTelemetry collector, or equivalent)
Steps
- 1
Provision API access and store the key securely
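Create an Anthropic Console account, generate an API key, and store it as a server-side environment variable (for example ANTHROPIC_API_KEY). Never expose the key in client bundles: all Claude calls must originate from a server route or edge function. In Next.js, that also means never giving the variable a NEXT_PUBLIC_ prefix, because those values are inlined into the browser bundle.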
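A fail-fast key loader catches the two most common leaks at startup rather than in production traffic. A minimal sketch, assuming a hypothetical getAnthropicKey helper in a server-only module; the NEXT_PUBLIC_ check is a heuristic, and the server-only npm package recommended in the Next.js docs gives a stronger, build-time guarantee than the runtime window check here:

```typescript
// Hypothetical server-only helper (e.g. lib/env.ts): resolve the key or fail fast.
export function getAnthropicKey(env: Record<string, string | undefined>): string {
  // Next.js inlines every NEXT_PUBLIC_* variable into client bundles,
  // so an Anthropic key stored under that prefix is already public.
  const leaked = Object.keys(env).find(
    (name) => name.startsWith("NEXT_PUBLIC_") && name.includes("ANTHROPIC"),
  );
  if (leaked) {
    throw new Error(`${leaked} is exposed to the browser; use a server-only variable`);
  }
  // Crude runtime check that we are not executing in a browser.
  if (typeof (globalThis as { window?: unknown }).window !== "undefined") {
    throw new Error("getAnthropicKey() must only run on the server");
  }
  const key = env.ANTHROPIC_API_KEY;
  if (!key) throw new Error("ANTHROPIC_API_KEY is not set");
  return key;
}
```

Call it once where the shared client is created, for example new Anthropic({ apiKey: getAnthropicKey(process.env) }), so a misconfigured deploy fails on boot.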
- 2
Install the SDK and create a typed client
Install @anthropic-ai/sdk and instantiate it once per process. Keep the client in a shared module so connection pooling and instrumentation are consistent across routes.
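```typescript
// lib/anthropic.ts
import Anthropic from "@anthropic-ai/sdk";

// One shared client per process. Prompt caching is generally available, so the
// old "anthropic-beta: prompt-caching-2024-07-31" header is no longer needed;
// caching is requested per call with cache_control blocks (see step 7).
export const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY, // also the SDK's default source
});
```
- 3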
Design the prompt with system, user, and assistant roles
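Use the top-level system parameter for stable instructions and personality; unlike some chat APIs, the Messages API has no "system" role inside messages. Pass conversation history as an ordered messages array of user and assistant turns. Keep system prompts short and concrete: every system-prompt token is processed on every request, which adds latency and cost unless that prefix is cached (see step 7), and loosely related instructions dilute the ones that matter.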
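Apps migrating from chat APIs that use a "system" message role often store history that way. A minimal sketch of lifting those turns into the top-level system parameter, assuming a hypothetical StoredTurn shape and plain-string content (the real API also accepts arrays of content blocks):

```typescript
// Hypothetical shape of chat history stored by your app (OpenAI-style roles).
type StoredTurn = { role: "system" | "user" | "assistant"; content: string };

// Minimal local mirror of the request fields used here (the SDK exports richer types).
type AnthropicTurn = { role: "user" | "assistant"; content: string };
type ChatRequest = { system?: string; messages: AnthropicTurn[] };

// System text moves to the top-level parameter; user/assistant order is preserved.
export function toAnthropicRequest(history: StoredTurn[]): ChatRequest {
  const systemParts: string[] = [];
  const messages: AnthropicTurn[] = [];
  for (const turn of history) {
    if (turn.role === "system") {
      systemParts.push(turn.content);
    } else {
      messages.push({ role: turn.role, content: turn.content });
    }
  }
  return {
    // Omit system entirely when there is none, rather than sending an empty string.
    ...(systemParts.length > 0 ? { system: systemParts.join("\n\n") } : {}),
    messages,
  };
}
```

The result can be spread into the messages.create or messages.stream parameters alongside model and max_tokens.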
- 4
Stream responses to the browser
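Either pass stream: true to messages.create or use the SDK's messages.stream() helper, then pipe the text back to the client through a server route. Streaming dramatically improves perceived latency, and Claude's content_block_delta events map cleanly onto incremental UI rendering. The route in this step uses messages.stream() and forwards only text deltas as plain UTF-8 text rather than re-emitting Anthropic's SSE events; switch to text/event-stream framing if your client needs event types.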
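EventSource only issues GET requests, so a POST chat route is easier to consume with fetch and a stream reader. A minimal browser-side sketch, assuming the route returns plain UTF-8 text chunks as the server example in this step does; readTextStream is a hypothetical helper name:

```typescript
// Read a streamed text response chunk by chunk and hand each decoded piece to the UI.
export async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (chunk: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true buffers multi-byte UTF-8 characters that straddle chunk boundaries.
      const text = decoder.decode(value, { stream: true });
      if (text) {
        full += text;
        onText(text);
      }
    }
    const tail = decoder.decode(); // flush any bytes still buffered
    if (tail) {
      full += tail;
      onText(tail);
    }
    return full;
  } finally {
    reader.releaseLock();
  }
}
```

In a client component this might be used as: const res = await fetch("/api/chat", { method: "POST", body: JSON.stringify({ messages }) }); then, after checking res.ok and res.body, await readTextStream(res.body, (t) => setReply((prev) => prev + t)), where setReply is a hypothetical state setter.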
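```typescript
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { anthropic } from "@/lib/anthropic";

export async function POST(req: Request) {
  // History comes from the client; validate it before trusting it.
  const { messages } = (await req.json()) as { messages: Anthropic.MessageParam[] };

  // messages.stream() returns a MessageStream that is async-iterable over raw events.
  const stream = anthropic.messages.stream({
    model: "claude-opus-4-7",
    max_tokens: 1024,
    system: "You are a concise assistant.",
    messages,
  });

  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const event of stream) {
          // Forward only text deltas; other event types are skipped.
          if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
            controller.enqueue(encoder.encode(event.delta.text));
          }
        }
        controller.close();
      } catch (err) {
        // Surface upstream failures as a broken stream instead of a silent truncation.
        controller.error(err);
      }
    },
    cancel() {
      // The client disconnected: stop generating (and paying for) tokens.
      stream.abort();
    },
  });

  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```
- 5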
Add tool use for actions the model needs to take
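Define each tool with a name, a description, and a JSON Schema input_schema. When a response stops with stop_reason "tool_use", execute every tool_use block server-side (a single response can contain several), append the assistant turn followed by a user turn carrying one tool_result per tool_use_id, and loop until the model produces a final text response or a step limit is reached.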
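The loop in this step calls a runTool dispatcher that the guide leaves to you. One hypothetical shape is a name-keyed Map of handlers, each validating its input before doing anything; ToolInputError, parseOrderStatusInput, the orderId format, and the stubbed lookup are illustrative, not part of the SDK. A thrown error can be reported back to the model as a tool_result with is_error: true so it can correct its call:

```typescript
// Hypothetical dispatcher behind runTool(); names and the stubbed lookup are illustrative.
type ToolHandler = (input: unknown) => Promise<unknown>;

// Thrown for unknown tool names or invalid inputs.
class ToolInputError extends Error {}

// Hand-rolled guard for get_order_status; Zod or a JSON Schema validator could replace it.
function parseOrderStatusInput(input: unknown): { orderId: string } {
  if (typeof input !== "object" || input === null) {
    throw new ToolInputError("input must be an object");
  }
  const orderId = (input as Record<string, unknown>).orderId;
  // Assumed ID format for this example only.
  if (typeof orderId !== "string" || !/^[A-Za-z0-9-]{1,64}$/.test(orderId)) {
    throw new ToolInputError("orderId must be 1-64 letters, digits, or dashes");
  }
  return { orderId };
}

// A Map (not a plain object) so names like "toString" can't reach prototype methods.
const handlers = new Map<string, ToolHandler>([
  ["get_order_status", async (input) => {
    const { orderId } = parseOrderStatusInput(input);
    // Stub: replace with a call to your real order service.
    return { orderId, status: "unknown" };
  }],
]);

export async function runTool(name: string, input: unknown): Promise<unknown> {
  const handler = handlers.get(name);
  if (!handler) throw new ToolInputError(`unknown tool: ${name}`);
  return handler(input);
}
```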
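```typescript
import Anthropic from "@anthropic-ai/sdk";
import { anthropic } from "@/lib/anthropic";

// Hypothetical dispatcher you implement: runs one tool and returns a JSON-serializable result.
declare function runTool(name: string, input: unknown): Promise<unknown>;

const tools: Anthropic.Tool[] = [{
  name: "get_order_status",
  description: "Look up the current status of a customer order.",
  input_schema: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
}];

const MAX_STEPS = 8; // arbitrary bound so a model that keeps calling tools can't spin forever

async function runAgent(messages: Anthropic.MessageParam[]): Promise<Anthropic.Message> {
  for (let step = 0; step < MAX_STEPS; step++) {
    const res = await anthropic.messages.create({
      model: "claude-opus-4-7",
      max_tokens: 1024,
      tools,
      messages,
    });
    if (res.stop_reason !== "tool_use") return res;

    // One response may contain several tool_use blocks; each needs a matching tool_result.
    const toolUses = res.content.filter(
      (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
    );
    const results = await Promise.all(
      toolUses.map(async (toolUse): Promise<Anthropic.ToolResultBlockParam> => {
        try {
          const result = await runTool(toolUse.name, toolUse.input);
          return { type: "tool_result", tool_use_id: toolUse.id, content: JSON.stringify(result) };
        } catch (err) {
          // Report the failure to the model instead of crashing the loop.
          return { type: "tool_result", tool_use_id: toolUse.id, content: String(err), is_error: true };
        }
      }),
    );

    messages.push({ role: "assistant", content: res.content });
    messages.push({ role: "user", content: results });
  }
  throw new Error(`runAgent: no final answer after ${MAX_STEPS} steps`);
}
```
- 6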
Handle errors, rate limits, and partial failures
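The SDK already retries connection errors, 408, 409, 429, and 5xx responses (twice by default, with exponential backoff); tune that with the maxRetries client option rather than wrapping every call in your own retry loop. Surface 400 validation errors, such as a conversation that no longer fits the context window, as clear user-facing messages, and persist partial assistant turns so a network drop mid-stream does not corrupt history.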
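For partial turns, one approach is to accumulate deltas as they stream and, if the stream throws, persist what arrived with an explicit incomplete flag. A minimal sketch, assuming the SDK stream has already been mapped to an async iterable of text strings (for example by keeping only text_delta events); AssistantTurn and collectTurn are hypothetical names:

```typescript
// Hypothetical record your app persists for each assistant turn.
type AssistantTurn = { role: "assistant"; text: string; complete: boolean; error?: string };

// Drain an async iterable of text deltas, keeping whatever arrived before a failure.
export async function collectTurn(deltas: AsyncIterable<string>): Promise<AssistantTurn> {
  let text = "";
  try {
    for await (const delta of deltas) text += delta;
    return { role: "assistant", text, complete: true };
  } catch (err) {
    // Save the partial reply flagged as incomplete instead of dropping it, so a
    // retry or "continue" action knows exactly what the user was shown.
    return {
      role: "assistant",
      text,
      complete: false,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Whether incomplete turns are shown, hidden, or excluded from the history sent on the next request is a product decision; the flag just makes that decision explicit instead of leaving a silently truncated reply in storage.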
- 7
Instrument cost and latency from day one
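Log input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens, model, and request duration to your observability stack. Alert on per-user token spend and tail latency. Enable prompt caching for any repeated system prompt or long context by placing cache_control: { type: "ephemeral" } on the last block of the static prefix.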
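A per-request log record can carry both the raw counts and an estimated cost. A minimal sketch, assuming you read usage from the response (res.usage, or (await stream.finalMessage()).usage when streaming) and supply per-million-token rates yourself from Anthropic's current pricing; UsageLog, Rates, and toUsageLog are hypothetical names. Because input_tokens counts only uncached input, cache reads and writes are priced as separate buckets:

```typescript
// Subset of the Messages API usage object; the cache fields can be null or absent.
type Usage = {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
};

// USD per million tokens; fill in from Anthropic's current pricing for your model.
type Rates = { input: number; output: number; cacheWrite: number; cacheRead: number };

export type UsageLog = {
  model: string;
  userId: string;
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  cacheWriteTokens: number;
  cacheReadTokens: number;
  costUsd: number;
};

export function toUsageLog(
  model: string,
  userId: string,
  durationMs: number,
  usage: Usage,
  rates: Rates,
): UsageLog {
  const cacheWriteTokens = usage.cache_creation_input_tokens ?? 0;
  const cacheReadTokens = usage.cache_read_input_tokens ?? 0;
  // input_tokens excludes cache reads and writes, so each bucket is priced separately.
  const costUsd =
    (usage.input_tokens * rates.input +
      cacheWriteTokens * rates.cacheWrite +
      cacheReadTokens * rates.cacheRead +
      usage.output_tokens * rates.output) /
    1_000_000;
  return {
    model,
    userId,
    durationMs,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    cacheWriteTokens,
    cacheReadTokens,
    costUsd,
  };
}
```

Time the call with performance.now() or Date.now() around the request, emit the record through whatever logger feeds your observability sink, and build the per-user spend and tail-latency alerts on costUsd and durationMs.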
Common pitfalls
- Shipping the API key to the client. Anthropic does not support a public/browser key — any client-side call must proxy through your server.
- Ignoring the tool_use loop. A single call is rarely enough. Build a bounded while-loop with a max-step guard so you never spin forever on a model that keeps calling tools.
- Forgetting prompt caching. Long system prompts sent again on every request are a major source of avoidable spend. Mark the static prefix as cacheable: cached input is billed at a fraction of the normal input rate, and Anthropic cites cost reductions of up to 90% for long prompts.
- Trusting model output blindly in tools. Validate tool inputs with Zod or the schema layer of your choice before touching production systems.
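- An oversized max_tokens on user-facing turns. The Messages API requires max_tokens on every request, so the real mistake is setting it to the model's maximum. Size the ceiling to the reply you actually want; a runaway 4,000-token reply on every chat message compounds quickly.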
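The caching and max_tokens pitfalls can both be handled in one request builder. A minimal sketch, assuming a hypothetical CHAT_MAX_TOKENS limit of 800 and plain-string message content; the returned object follows the shape messages.create accepts, with system passed as an array of text blocks so the static prefix can carry cache_control:

```typescript
// Minimal local mirror of the request fields used here (the SDK exports fuller types).
type SystemBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type Turn = { role: "user" | "assistant"; content: string };
type ChatParams = { model: string; max_tokens: number; system: SystemBlock[]; messages: Turn[] };

// Hypothetical product ceiling for user-facing chat replies.
const CHAT_MAX_TOKENS = 800;

export function buildChatRequest(
  model: string,
  staticSystemPrompt: string,
  history: Turn[],
  requestedMaxTokens: number = CHAT_MAX_TOKENS,
): ChatParams {
  return {
    model,
    // Clamp to [1, CHAT_MAX_TOKENS] so no caller can lift the ceiling.
    max_tokens: Math.max(1, Math.min(Math.floor(requestedMaxTokens), CHAT_MAX_TOKENS)),
    // cache_control marks the prompt prefix up to and including this block as cacheable.
    system: [{ type: "text", text: staticSystemPrompt, cache_control: { type: "ephemeral" } }],
    messages: history,
  };
}
```

Prompts shorter than the model's minimum cacheable length are simply processed without caching, so leaving the marker on a short prompt is harmless.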
Next steps
If you want a partner to take this from prototype to production, we can help end-to-end:
- AI & LLM integration services — production Claude, OpenAI, and agentic systems.
- Web development — Next.js, Remix, and Cloudflare-native apps.
- Get in touch to scope an integration.