Designing UI for Streaming AI Responses

LLM responses stream over time with variable length and timing. A single prompt might return 50 tokens or 500 tokens. Response time varies from 200ms to 10 seconds depending on complexity. Content arrives in chunks as the model generates it.

Your UI can handle this variability gracefully by streaming content to users as it arrives.

TL;DR

Stream responses using Server-Sent Events to show progress immediately. Design state contracts between backend and frontend that handle thinking, streaming, completion, and errors. Use type-safe events to drive UI updates. Show confidence scores when available. Build fallback patterns for failures.

Streaming Variable-Length Responses

LLM responses arrive incrementally. Users can start reading while the model generates, which feels more responsive than waiting for completion.

The pattern: Stream events from your backend as processing happens.

Backend implementation (FastAPI):

python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from anthropic import Anthropic
import json
 
app = FastAPI()
client = Anthropic()
 
@app.post("/api/resolve-ticket")
async def resolve_ticket(ticket_data: dict):
    async def generate():
        # Initial state
        payload = {'type': 'status', 'state': 'thinking'}
        yield f"data: {json.dumps(payload)}\n\n"

        # Stream LLM response
        with client.messages.stream(
            model="claude-sonnet-4-5-20250929",
            messages=[{
                "role": "user",
                "content": ticket_data["description"]
            }],
            max_tokens=1024
        ) as stream:
            for text in stream.text_stream:
                payload = {'type': 'content', 'text': text}
                yield f"data: {json.dumps(payload)}\n\n"

        # Completion state
        payload = {'type': 'complete', 'confidence': 0.87}
        yield f"data: {json.dumps(payload)}\n\n"
 
    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )

This sends three types of events: status updates, streaming content, and completion with metadata.
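
Concretely, one successful request parses into a sequence like this on the client (values illustrative; AIEvent is the type defined in the next section):

typescript
import type { AIEvent } from '@/types/ai';

// Parsed event sequence for a single request
const exampleEvents: AIEvent[] = [
  { type: 'status', state: 'thinking' },
  { type: 'content', text: 'Reset the password ' },
  { type: 'content', text: 'from the admin console.' },
  { type: 'complete', confidence: 0.87 }
];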

Type-Safe State Contracts

Define clear contracts between backend and frontend. TypeScript makes state transitions explicit.

Frontend types (Next.js/TypeScript):

typescript
// types/ai.ts
export type AIState =
  | 'thinking'
  | 'streaming'
  | 'complete'
  | 'error';
 
export interface AIEvent {
  type: 'status' | 'content' | 'complete' | 'error';
  state?: AIState;
  text?: string;
  confidence?: number;
  error?: string;
}
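
A stricter variant of the same contract uses a discriminated union, so the compiler enforces which fields accompany each event type (an optional alternative sketch; the hook below works with either shape):

typescript
// types/ai.ts (stricter alternative)
export type AIEventStrict =
  | { type: 'status'; state: AIState }
  | { type: 'content'; text: string }
  | { type: 'complete'; confidence: number }
  | { type: 'error'; error: string };

Narrowing on type then yields exactly the fields that event carries.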

Custom hook to consume events:

typescript
// hooks/useAIStream.ts
import {
  useState,
  useCallback
} from 'react';
import type { AIState, AIEvent } from '@/types/ai';

export function useAIStream() {
  const [state, setState] =
    useState<AIState>('thinking');
  const [content, setContent] = useState('');
  const [confidence, setConfidence] =
    useState<number | null>(null);

  const processTicket = useCallback(
    async (ticketData: Record<string, unknown>) => {
      const response = await fetch(
        '/api/resolve-ticket',
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json'
          },
          body: JSON.stringify(ticketData)
        }
      );

      const reader = response.body?.getReader();
      if (!reader) return;

      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } =
          await reader.read();
        if (done) break;

        // Buffer chunks: one SSE event can be
        // split across network reads
        buffer += decoder.decode(value, { stream: true });
        const events = buffer.split('\n\n');
        buffer = events.pop() ?? '';

        for (const rawEvent of events) {
          const line = rawEvent
            .split('\n')
            .find(l => l.startsWith('data: '));
          if (!line) continue;

          const data: AIEvent = JSON.parse(
            line.slice('data: '.length)
          );

          if (data.type === 'status' && data.state) {
            setState(data.state);
          } else if (data.type === 'content') {
            setContent(prev => prev + (data.text ?? ''));
            setState('streaming');
          } else if (data.type === 'complete') {
            setState('complete');
            setConfidence(data.confidence ?? null);
          } else if (data.type === 'error') {
            setState('error');
          }
        }
      }
    },
    []
  );

  return {
    state,
    content,
    confidence,
    processTicket
  };
}

UI Patterns for Each State

Different states need different visual feedback.

Thinking state: Show that processing started. Use subtle animation, not aggressive spinners. This might last 200ms or 5 seconds.

Streaming state: Display content as it arrives. Add a cursor or subtle indicator that more is coming. Users can start reading while the model generates.

Complete state: Show confidence if available. Let users know the AI finished and how certain it is about the output.

Error state: Explain what happened and offer recovery options. Errors will happen—make them clear and actionable.

Component example:

typescript
// components/AIResponse.tsx
'use client';

import { useEffect } from 'react';
import { useAIStream } from '@/hooks/useAIStream';

export function AIResponse({
  ticket
}: { ticket: Record<string, unknown> }) {
  const {
    state,
    content,
    confidence,
    processTicket
  } = useAIStream();

  // Kick off processing when the ticket arrives
  useEffect(() => {
    processTicket(ticket);
  }, [ticket, processTicket]);

  return (
    <div className="ai-response">
      {state === 'thinking' && (
        <div className="thinking">
          <span className="animate-pulse">
            Analyzing ticket...
          </span>
        </div>
      )}

      {state === 'streaming' && (
        <div className="streaming">
          {content}
          <span className="cursor"></span>
        </div>
      )}

      {state === 'complete' && (
        <div className="complete">
          <div>{content}</div>
          {confidence !== null && (
            <div className="confidence">
              Confidence: {
                Math.round(confidence * 100)
              }%
            </div>
          )}
        </div>
      )}

      {state === 'error' && (
        <div className="error">
          Something went wrong.
          Try again or contact support.
        </div>
      )}
    </div>
  );
}

Handling Confidence and Uncertainty

LLMs have varying confidence levels. Some responses are near-certain, others are guesses.

Expose confidence scores when your backend can calculate them. Show them to users when confidence crosses important thresholds.

For a helpdesk system:

  • High confidence (above 85%): Auto-resolve the ticket
  • Medium confidence (60-85%): Suggest resolution, require human approval
  • Low confidence (below 60%): Route to human immediately

python
# Backend: Calculate confidence
def calculate_confidence(
    response: str,
    context: dict
) -> float:
    # Your confidence calculation logic goes here.
    # Could use the model's logprobs,
    # retrieval scores, or heuristics.
    score = 0.5  # placeholder value
    return score
 
@app.post("/api/process")
async def process(data: dict):
    response = await generate_response(data)
    confidence = calculate_confidence(
        response,
        data
    )
 
    return {
        'response': response,
        'confidence': confidence,
        'action': (
            'auto_resolve' if confidence > 0.85
            else 'needs_approval'
        )
    }
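
The endpoint above collapses the low and medium tiers into needs_approval. A small helper makes the full three-tier policy from the list explicit (a sketch in TypeScript; names are illustrative):

typescript
type RoutingAction =
  | 'auto_resolve'
  | 'needs_approval'
  | 'route_to_human';

// Map a confidence score onto the three-tier policy
function routeByConfidence(score: number): RoutingAction {
  if (score > 0.85) return 'auto_resolve';
  if (score >= 0.6) return 'needs_approval';
  return 'route_to_human';
}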

Show this in UI with visual indicators:

typescript
export function ConfidenceIndicator({
  score
}: { score: number }) {
  const getColor = (score: number) => {
    if (score > 0.85) return 'bg-green-500';
    if (score > 0.6) return 'bg-yellow-500';
    return 'bg-red-500';
  };
 
  return (
    <div className="flex items-center gap-2">
      <div className="w-full bg-gray-200
                      rounded h-2">
        <div
          className={`h-2 rounded ${
            getColor(score)
          }`}
          style={{
            width: `${score * 100}%`
          }}
        />
      </div>
      <span className="text-sm">
        {Math.round(score * 100)}%
      </span>
    </div>
  );
}

Graceful Degradation

Failures happen. Network cuts out, model times out, context limits hit. Design for these cases up front.

Backend error handling:

python
@app.post("/api/process")
async def process(data: dict):
    try:
        response = await generate_response(data)
        return {'status': 'success', 'data': response}
    except TimeoutError:
        return {
            'status': 'error',
            'error_type': 'timeout',
            'message': 'Request took too long',
            'fallback': 'retry'
        }
    except Exception as e:
        return {
            'status': 'error',
            'error_type': 'unknown',
            'message': str(e),
            'fallback': 'contact_support'
        }
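
The frontend can mirror this payload so the error contract stays explicit. A sketch extending the types/ai.ts file from earlier (field names taken from the backend response above):

typescript
// types/ai.ts
export interface AIError {
  error_type: 'timeout' | 'unknown';
  message: string;
  fallback: 'retry' | 'contact_support';
}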

Frontend error recovery:

typescript
import type { AIError } from '@/types/ai';

export function ErrorRecovery({
  error,
  onRetry,
  onContactSupport
}: {
  error: AIError;
  onRetry: () => void;
  onContactSupport: () => void;
}) {
  return (
    <div className="error-card">
      <p className="text-red-600">
        {error.message}
      </p>

      {error.fallback === 'retry' && (
        <button onClick={onRetry}>
          Try Again
        </button>
      )}

      {error.fallback === 'contact_support' && (
        <button onClick={onContactSupport}>
          Contact Support
        </button>
      )}
    </div>
  );
}

Loading States for Variable Timing

Responses can take 200ms or 5 seconds. Each needs different feedback patterns.

Short responses (under 1s): Simple loading state, no need for detailed progress.
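
One way to keep sub-second responses from flashing loading UI (a sketch, using a hypothetical useDelayedLoading helper): only show the indicator once the request outlasts a small delay.

typescript
// hooks/useDelayedLoading.ts (hypothetical helper)
import { useEffect, useState } from 'react';

export function useDelayedLoading(
  isLoading: boolean,
  delayMs = 300
) {
  const [show, setShow] = useState(false);

  useEffect(() => {
    if (!isLoading) {
      setShow(false);
      return;
    }
    // Surface the spinner only if loading
    // lasts longer than delayMs
    const timer = setTimeout(() => setShow(true), delayMs);
    return () => clearTimeout(timer);
  }, [isLoading, delayMs]);

  return show;
}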

Long responses (1s+): Show what's happening. If you're running multiple steps (categorize → search → generate), show each step.

python
# Multi-step with progress updates.
# categorize, search_kb, and generate are your
# own async helpers.
def event(event_type: str, payload: dict) -> str:
    # Serialize one SSE event frame
    return f"data: {json.dumps({'type': event_type, **payload})}\n\n"

@app.post("/api/auto-resolve")
async def auto_resolve(ticket_id: str):
    async def stream_progress():
        # Step 1
        yield event('step', {
            'name': 'categorizing',
            'status': 'running'
        })
        category = await categorize(ticket_id)
        yield event('step', {
            'name': 'categorizing',
            'status': 'complete'
        })
 
        # Step 2
        yield event('step', {
            'name': 'searching',
            'status': 'running'
        })
        articles = await search_kb(category)
        yield event('step', {
            'name': 'searching',
            'status': 'complete'
        })
 
        # Step 3
        yield event('step', {
            'name': 'generating',
            'status': 'running'
        })
        response = await generate(ticket_id, articles)
        yield event('step', {
            'name': 'generating',
            'status': 'complete'
        })
 
        yield event('complete', {'response': response})
 
    return StreamingResponse(
        stream_progress(),
        media_type="text/event-stream"
    )

Frontend shows each step:

typescript
// Spinner and Check are stand-ins for your
// icon components (e.g., from an icon library)
interface Step {
  name: string;
  status: 'pending' | 'running' | 'complete';
}

export function StepProgress({
  steps
}: { steps: Step[] }) {
  return (
    <div className="space-y-2">
      {steps.map((step, i) => (
        <div key={i}
             className="flex items-center gap-2">
          {step.status === 'running' && (
            <Spinner className="w-4 h-4" />
          )}
          {step.status === 'complete' && (
            <Check className="w-4 h-4
                              text-green-500" />
          )}
          <span className="text-sm">
            {step.name}
          </span>
        </div>
      ))}
    </div>
  );
}
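
The backend emits one event per step transition, so the client folds those events into its steps array. A minimal sketch, assuming the step event shape from the backend above:

typescript
// Fold one incoming step event into existing state.
// Step is the interface defined above.
function applyStepEvent(
  steps: Step[],
  incoming: { name: string; status: 'running' | 'complete' }
): Step[] {
  if (steps.some(s => s.name === incoming.name)) {
    return steps.map(s =>
      s.name === incoming.name
        ? { ...s, status: incoming.status }
        : s
    );
  }
  return [...steps, { name: incoming.name, status: incoming.status }];
}

Call this from the same reader loop used in useAIStream, updating a steps state array as events arrive.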
