Guide

Integrate Claude into a Web App

By Spencer Hill, Founder · Last updated: May 24, 2026

What you'll learn

This guide walks through every layer of a production-grade Claude integration in a TypeScript web app: secure key handling, prompt structure, streaming responses to the browser over Server-Sent Events, tool use loops, error recovery, and cost observability. By the end you'll have a working pattern you can drop into a Next.js, Remix, or Hono app without having to reverse-engineer the SDK.

Prerequisites

  • Node.js 20+ and a TypeScript-enabled web framework (Next.js App Router assumed)
  • An Anthropic API key with billing enabled
  • Familiarity with async iterators and Server-Sent Events
  • An observability sink (Datadog, Axiom, OpenTelemetry collector, or equivalent)

Steps

  1. Provision API access and store the key securely

    Create an Anthropic console account, generate an API key, and store it as a server-side environment variable. Never expose the key in client bundles — all Claude calls must originate from a server route or edge function.
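
    As a minimal sketch of the fail-fast check this step implies, the helper below reads the key from an env-like object and refuses names that Next.js would inline into the client bundle. The helper name requireApiKey and the Env type are illustrative assumptions, not part of the SDK.

```typescript
// Hypothetical helper (not part of @anthropic-ai/sdk): validates that the
// API key is present and is not stored under a browser-exposed name.
export type Env = Record<string, string | undefined>;

export function requireApiKey(env: Env, name = "ANTHROPIC_API_KEY"): string {
  // Next.js inlines NEXT_PUBLIC_* variables into client bundles, so a
  // secret must never live under a name with that prefix.
  if (name.startsWith("NEXT_PUBLIC_")) {
    throw new Error(`${name} would be exposed to the browser; use a server-only name`);
  }
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

    On the server this would typically be called once at startup as requireApiKey(process.env), so a missing key fails the deploy instead of the first user request.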

  2. Install the SDK and create a typed client

    Install @anthropic-ai/sdk and instantiate it once per process. Keep the client in a shared module so connection pooling and instrumentation are consistent across routes.

    // lib/anthropic.ts
    import Anthropic from "@anthropic-ai/sdk";

    // One shared client per process so configuration stays consistent across routes.
    export const anthropic = new Anthropic({
      // Server-side only: never reference this variable from client code.
      apiKey: process.env.ANTHROPIC_API_KEY!,
    });
  3. Design the prompt with system, user, and assistant roles

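    Use the system parameter for stable instructions and persona, and pass conversation history as an ordered messages array of user and assistant turns. Keep the system prompt concrete and no longer than it needs to be: every token in it is processed on each request, which adds latency and cost unless the prefix is cached.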
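
    One way to assemble such a request is sketched below. The names buildRequest, ChatMessage, and maxHistory are assumptions made for illustration, and the sketch assumes the conversation should open on a user turn and stay within a bounded window of recent messages.

```typescript
type Role = "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  system: string;
  messages: ChatMessage[];
}

// Hypothetical helper: combines a stable system prompt, a bounded window of
// prior turns, and the new user input into one request payload.
export function buildRequest(
  system: string,
  history: ChatMessage[],
  userInput: string,
  maxHistory = 20,
): ChatRequest {
  // Keep only the most recent turns so request size stays bounded.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  // Drop leading assistant turns so the window opens on a user turn.
  while (recent.length > 0 && recent[0]?.role !== "user") {
    recent.shift();
  }
  return {
    system,
    messages: [...recent, { role: "user", content: userInput }],
  };
}
```

    The returned object can be spread into a messages.create() or messages.stream() call alongside model and max_tokens.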

  4. Stream responses to the browser

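    Use the SDK's messages.stream() helper (or pass stream: true to messages.create()) and forward the output to the client through a server route. Streaming dramatically improves perceived latency, and Claude's content_block_delta events map cleanly onto incremental UI rendering. The route below forwards only the text deltas as a plain-text chunked response; to send real Server-Sent Events, frame each chunk as a data: line and set Content-Type to text/event-stream.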

    // app/api/chat/route.ts
    import { anthropic } from "@/lib/anthropic";

    export async function POST(req: Request) {
      const { messages } = await req.json();

      // messages.stream() returns an async-iterable stream of API events.
      const stream = anthropic.messages.stream({
        model: "claude-opus-4-7",
        max_tokens: 1024,
        system: "You are a concise assistant.",
        messages,
      });

      const encoder = new TextEncoder();
      const body = new ReadableStream<Uint8Array>({
        async start(controller) {
          try {
            for await (const event of stream) {
              // Forward only incremental text; other event types are skipped.
              if (event.type === "content_block_delta" &&
                  event.delta.type === "text_delta") {
                controller.enqueue(encoder.encode(event.delta.text));
              }
            }
            controller.close();
          } catch (err) {
            // Surface upstream failures to the reader instead of hanging.
            controller.error(err);
          }
        },
        cancel() {
          // The browser disconnected: stop the upstream request.
          stream.abort();
        },
      });

      return new Response(body, {
        headers: { "Content-Type": "text/plain; charset=utf-8" },
      });
    }
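
    On the browser side, the plain-text response from the route above can be read incrementally. The sketch below is one possible reader; the function name readTextStream and the onChunk callback are assumptions for illustration.

```typescript
// Hypothetical client-side helper: reads a byte stream chunk by chunk,
// decodes it as UTF-8, and reports each piece of text as it arrives.
export async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true buffers an incomplete multi-byte character until the
      // rest of its bytes arrive in a later chunk.
      const text = decoder.decode(value, { stream: true });
      if (text) {
        full += text;
        onChunk(text);
      }
    }
    // Flush anything still buffered in the decoder.
    const tail = decoder.decode();
    if (tail) {
      full += tail;
      onChunk(tail);
    }
  } finally {
    reader.releaseLock();
  }
  return full;
}
```

    A typical call would pass the body of a fetch("/api/chat", { method: "POST", ... }) response and append each chunk to the message being rendered.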
  5. Add tool use for actions the model needs to take

    Define tools with JSON Schema. When a response stops with stop_reason "tool_use", execute each requested tool server-side, append one tool_result per tool_use block to the message history, and loop until the model produces a final text response.

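    import type Anthropic from "@anthropic-ai/sdk";
    import { anthropic } from "@/lib/anthropic";

    const tools: Anthropic.Tool[] = [{
      name: "get_order_status",
      description: "Look up the current status of a customer order.",
      input_schema: {
        type: "object",
        properties: { orderId: { type: "string" } },
        required: ["orderId"],
      },
    }];

    const MAX_STEPS = 8; // Guard against a model that keeps calling tools.

    // runTool(name, input) is an app-defined dispatcher that executes the named tool.
    async function runAgent(messages: Anthropic.MessageParam[]) {
      for (let step = 0; step < MAX_STEPS; step++) {
        const res = await anthropic.messages.create({
          model: "claude-opus-4-7",
          max_tokens: 1024,
          tools,
          messages,
        });

        if (res.stop_reason !== "tool_use") return res;

        // Echo the assistant turn back, then answer every tool_use block in it.
        messages.push({ role: "assistant", content: res.content });

        const results: Anthropic.ToolResultBlockParam[] = [];
        for (const block of res.content) {
          if (block.type !== "tool_use") continue;
          try {
            const result = await runTool(block.name, block.input);
            results.push({
              type: "tool_result",
              tool_use_id: block.id,
              content: JSON.stringify(result),
            });
          } catch (err) {
            // Report the failure to the model so it can recover.
            results.push({
              type: "tool_result",
              tool_use_id: block.id,
              content: err instanceof Error ? err.message : String(err),
              is_error: true,
            });
          }
        }
        messages.push({ role: "user", content: results });
      }
      throw new Error(`Agent exceeded ${MAX_STEPS} tool-use steps`);
    }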
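
    The loop above calls runTool without defining it. A minimal sketch of such a dispatcher follows, assuming a registry keyed by tool name and a hand-written input check; the handler body returns placeholder data, and every name other than get_order_status is hypothetical.

```typescript
type ToolHandler = (input: unknown) => Promise<unknown>;

// Runtime check for the get_order_status input; model output is untrusted.
function isOrderInput(input: unknown): input is { orderId: string } {
  if (typeof input !== "object" || input === null) return false;
  const orderId = (input as { orderId?: unknown }).orderId;
  return typeof orderId === "string" && orderId.length > 0;
}

// Registry of tool implementations, keyed by the name sent to the model.
const handlers = new Map<string, ToolHandler>([
  ["get_order_status", async (input) => {
    if (!isOrderInput(input)) {
      throw new Error("orderId must be a non-empty string");
    }
    // Placeholder result: a real handler would query the order system here.
    return { orderId: input.orderId, status: "shipped" };
  }],
]);

// Executes a tool by name; throws on unknown tools or invalid input so the
// caller can report the failure back to the model.
export async function runTool(name: string, input: unknown): Promise<unknown> {
  const handler = handlers.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(input);
}
```

    Keeping validation inside each handler means a malformed tool call fails before any production system is touched.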
  6. Handle errors, rate limits, and partial failures

    Catch 429s with exponential backoff, surface 400 validation errors as user-facing messages, and persist partial assistant turns so a network drop mid-stream does not corrupt history.
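
    The backoff part of this step can be sketched as a small wrapper. The names withRetry and RetryOptions and the default delays are assumptions chosen for illustration; the sketch treats any thrown value with a numeric status of 429 or 5xx as retryable. The official SDK also retries some failures on its own (configurable through its maxRetries option), so a wrapper like this matters mainly when you want explicit control over the policy.

```typescript
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  // Injectable so tests can avoid real waiting.
  sleep?: (ms: number) => Promise<void>;
}

// Retry only rate limits and server-side failures; a 400 will not fix itself.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

export async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 4;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const maxDelayMs = opts.maxDelayMs ?? 8000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Exponential backoff with full jitter: wait a random time up to a
      // cap that doubles on each attempt.
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * cap);
    }
  }
}
```

    A call site would look like withRetry(() => anthropic.messages.create({ ... })), with validation errors passing straight through to the caller.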

  7. Instrument cost and latency from day one

    Log input_tokens, output_tokens, model, and request duration to your observability stack. Alert on per-user token spend and tail latency. Enable prompt caching for any repeated system prompt or long context.
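
    A minimal sketch of that logging is shown below. The wrapper name withUsageLog, the UsageRecord shape, and the log callback are assumptions; the only fields it relies on from the response are model, usage.input_tokens, and usage.output_tokens.

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

interface MeteredResponse {
  model: string;
  usage: Usage;
}

export interface UsageRecord {
  userId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  durationMs: number;
}

// Hypothetical wrapper: times a call and emits one structured record per
// request. The log callback stands in for your observability client.
export async function withUsageLog<T extends MeteredResponse>(
  userId: string,
  call: () => Promise<T>,
  log: (record: UsageRecord) => void,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  const res = await call();
  log({
    userId,
    model: res.model,
    inputTokens: res.usage.input_tokens,
    outputTokens: res.usage.output_tokens,
    durationMs: now() - startedAt,
  });
  return res;
}
```

    Emitting one flat record per request keeps per-user spend and latency percentiles easy to aggregate in whichever sink you use.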

Common pitfalls

  • Shipping the API key to the client. Anthropic does not support a public/browser key — any client-side call must proxy through your server.
  • Ignoring the tool_use loop. A single call is rarely enough. Build a bounded while-loop with a max-step guard so you never spin forever on a model that keeps calling tools.
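  • Forgetting prompt caching. A long system prompt resent on every request is a common source of avoidable spend. Mark the static prefix as cacheable with cache_control; cached reads are billed at a fraction of the normal input rate, so input costs for chat workloads with a long shared prefix can plausibly drop by 60-90%, depending on the cache hit rate.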
  • Trusting model output blindly in tools. Validate tool inputs with Zod or the schema layer of your choice before touching production systems.
  • Setting max_tokens too high on user-facing turns. The parameter is required, so choose a ceiling that matches what the UI actually needs. A runaway 4,000-token reply on every chat message compounds quickly.
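
As a sketch of the prompt-caching pitfall above, the system parameter can be passed as an array of text blocks with a cache_control marker on the static part. The helper name cacheableSystem is an assumption, and whether a given prefix is actually cached depends on model-specific minimum lengths, so treat this as a starting point.

```typescript
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: marks the static prefix as cacheable and keeps any
// per-request text in a separate, uncached block after it.
export function cacheableSystem(
  staticPrefix: string,
  dynamicSuffix?: string,
): SystemTextBlock[] {
  const blocks: SystemTextBlock[] = [
    { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } },
  ];
  if (dynamicSuffix) {
    blocks.push({ type: "text", text: dynamicSuffix });
  }
  return blocks;
}
```

The result would be passed as the system field of a messages.create() call; anything that changes per request belongs in the second block so it does not invalidate the cached prefix.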

Next steps

If you want a partner to take this from prototype to production, we can help end-to-end: