Managing State for Streaming AI Responses
LLM responses arrive as chunks, not all at once. Handle loading, streaming, completion, and errors without breaking the user experience.
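As a minimal sketch of one way to handle this, the four phases can be modeled as a discriminated union in TypeScript, so invalid combinations (say, a completed response with no text) are unrepresentable. Every name here is illustrative rather than taken from any particular library.

```ts
type StreamState =
  | { phase: "idle" }
  | { phase: "loading" }                       // request sent, no tokens yet
  | { phase: "streaming"; text: string }       // chunks arriving
  | { phase: "done"; text: string }
  | { phase: "error"; text: string; message: string }; // keep partial text on failure

type StreamEvent =
  | { type: "start" }
  | { type: "chunk"; delta: string }
  | { type: "finish" }
  | { type: "fail"; message: string };

function reduce(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "start":
      return { phase: "loading" };
    case "chunk": {
      const text = state.phase === "streaming" ? state.text + event.delta : event.delta;
      return { phase: "streaming", text };
    }
    case "finish":
      // handles the empty-stream case too: done with whatever text we have
      return { phase: "done", text: state.phase === "streaming" ? state.text : "" };
    case "fail":
      return {
        phase: "error",
        text: state.phase === "streaming" ? state.text : "", // preserve partial output
        message: event.message,
      };
  }
}
```

Because the union is discriminated on `phase`, the rendering layer can be a single switch over it, and the error branch still has the partial text to display.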
AI systems stream responses with variable length and timing. Here's how to design interfaces that show progress immediately and handle uncertainty gracefully.
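A sketch of the wiring, assuming the official @anthropic-ai/sdk streaming helper; the model id is a placeholder, and onDelta/onDone are hypothetical hooks into whatever rendering layer you use.

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function streamWithProgress(
  prompt: string,
  onDelta: (chunk: string) => void,   // paint each chunk as it arrives
  onDone: (fullText: string) => void, // swap the progress state for final text
) {
  const stream = client.messages.stream({
    model: "claude-sonnet-4-5", // placeholder; use whichever model you target
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });

  let full = "";
  stream.on("text", (delta) => {
    full += delta; // the first delta is the cue to replace the loading indicator
    onDelta(delta);
  });

  await stream.finalMessage(); // resolves once the stream completes
  onDone(full);
}
```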
Structure prompts to maximize Anthropic's prompt caching, reducing costs by up to 90% and latency by up to 85% for repeated context.
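A sketch of the shape this takes with the Messages API: put the large, stable context first and mark it with cache_control, so later calls that share the prefix can hit the cache. The document variable and model id are placeholders.

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
declare const LONG_REFERENCE_DOCUMENT: string; // the repeated context worth caching

async function askAboutDocument(question: string) {
  return client.messages.create({
    model: "claude-sonnet-4-5", // placeholder; use whichever model you target
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: LONG_REFERENCE_DOCUMENT,        // stable prefix goes first
        cache_control: { type: "ephemeral" }, // mark the cache breakpoint
      },
    ],
    messages: [{ role: "user", content: question }], // only this part varies
  });
}
```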
How to test LLM outputs with code-based grading, human evaluation, and LLM-as-judge. When to use each method and why statistical rigor matters.
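The cheapest of the three is code-based grading: deterministic checks run over every output. A sketch with two illustrative graders (the check names and criteria are made up for the example):

```ts
type Grader = (output: string) => boolean;

const graders: Record<string, Grader> = {
  // content check: the reply must mention a refund
  mentionsRefund: (out) => /refund/i.test(out),
  // format check: the reply must be valid JSON
  isValidJson: (out) => {
    try { JSON.parse(out); return true; } catch { return false; }
  },
};

// Pass rate per grader across the whole eval set; with enough samples,
// these rates are what statistical tests get applied to.
function score(outputs: string[]): Record<string, number> {
  const rates: Record<string, number> = {};
  for (const [name, grade] of Object.entries(graders)) {
    rates[name] = outputs.filter(grade).length / outputs.length;
  }
  return rates;
}
```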
Error messages consume context and affect LLM decision-making. Structure errors as data, use reference IDs for details, and return actionable recovery paths.
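A sketch of what that can look like as a return type; every field name here is illustrative:

```ts
// A tool error returned as structured data rather than a stack trace:
// a stable code, a one-line message written for the model, a reference
// id so humans can find the full details in server logs, and explicit
// recovery options the model can act on.
interface ToolError {
  ok: false;
  code: "RATE_LIMITED" | "NOT_FOUND" | "INVALID_INPUT";
  message: string;     // short and actionable, not a log dump
  referenceId: string; // full details live in server logs under this id
  recovery: string[];  // concrete next steps for the model
}

const example: ToolError = {
  ok: false,
  code: "RATE_LIMITED",
  message: "Search API limit reached for this minute.",
  referenceId: "err_8f3a2c",
  recovery: ["Wait 60 seconds and retry", "Narrow the query to reduce calls"],
};
```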
Resources represent data or files that an MCP client can read. A case study of the SQLite MCP server shows how resources and tools work together.
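As a sketch of the shapes involved (TypeScript types written from my reading of the MCP spec, with SQLite-flavored URIs as illustrations): the server lists resources by URI, clients read them back as contents, and queries stay on the tool side.

```ts
// What a resources/list entry carries, per the MCP spec.
interface Resource {
  uri: string; // e.g. "schema://users" -- the scheme here is illustrative
  name: string;
  mimeType?: string;
}

// What a resources/read call returns.
interface ReadResourceResult {
  contents: Array<{ uri: string; mimeType?: string; text?: string }>;
}

// A SQLite server might expose each table's schema as a readable resource...
const listExample: Resource[] = [
  { uri: "schema://users", name: "users table schema", mimeType: "text/plain" },
];

// ...so the client can read the DDL without running a query tool.
const readExample: ReadResourceResult = {
  contents: [
    {
      uri: "schema://users",
      mimeType: "text/plain",
      text: "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);",
    },
  ],
};
```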
How to design tool responses that preserve context space for what matters. Filter early, return minimal data, and structure outputs for LLM consumption.
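A sketch of the pattern, with a hypothetical searchOrders backend: cap the result count, project only the fields the model needs, and say explicitly when data was cut.

```ts
interface Order {
  id: string;
  customer: string;
  total: number;
  // ...many more fields the model never needs to see
}

declare function searchOrders(query: string): Promise<Order[]>; // hypothetical backend

async function searchOrdersTool(query: string) {
  const rows = await searchOrders(query);
  return {
    total: rows.length,                       // the model knows what was cut
    results: rows.slice(0, 10).map((r) => ({  // cap the count, drop unused fields
      id: r.id,
      customer: r.customer,
      total: r.total,
    })),
    truncated: rows.length > 10,
  };
}
```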
AI agents work better when they see the full structure upfront, then make targeted requests. How to use progressive disclosure for efficient context management.
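A sketch with hypothetical walkRepo/readFileText helpers: one cheap call exposes the whole structure, and a second call fetches only the items the agent asks for.

```ts
interface FileEntry { path: string; sizeBytes: number }

declare function walkRepo(): Promise<FileEntry[]>;            // hypothetical
declare function readFileText(path: string): Promise<string>; // hypothetical

// Step 1: the agent sees the full tree upfront -- paths only, no contents.
async function listStructure(): Promise<string[]> {
  const entries = await walkRepo();
  return entries.map((e) => e.path);
}

// Step 2: the agent requests only the files it actually needs.
async function readFiles(paths: string[]) {
  return Promise.all(
    paths.map(async (p) => ({ path: p, text: await readFileText(p) })),
  );
}
```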
How to recognize when your conversation has grown too large to be effective, and what to do about it.
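One concrete signal is the token count itself. A sketch using the SDK's countTokens endpoint; the budget is an arbitrary example, not a recommendation, and the model id is a placeholder.

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Count what the conversation would consume and flag when it crosses
// a budget -- a cue to summarize, prune, or start a fresh thread.
async function conversationTooLarge(
  messages: Anthropic.Messages.MessageParam[],
  budget = 100_000, // illustrative threshold
): Promise<boolean> {
  const { input_tokens } = await client.messages.countTokens({
    model: "claude-sonnet-4-5", // placeholder; use whichever model you target
    messages,
  });
  return input_tokens > budget;
}
```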
Understanding what fills an LLM's context window and how it affects model behavior.
What tokens are in large language models, and why they matter.
MCP provides a standardized way for AIs to interact with tools, from Figma to your calendar to custom workflows you build yourself.
Five prompting techniques that improve LLM outputs: few-shot learning, chain-of-thought reasoning, XML structure, output constraints, and prompt chaining.
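A sketch that combines three of the five: XML tags to separate instructions from data, two few-shot examples, and an explicit output constraint. The task and tag names are illustrative.

```ts
const prompt = `
<instructions>
Classify each support ticket as "billing", "bug", or "other".
Respond with only the label, nothing else.
</instructions>

<examples>
<ticket>I was charged twice this month.</ticket>
<label>billing</label>

<ticket>The export button does nothing when I click it.</ticket>
<label>bug</label>
</examples>

<ticket>How do I change my email address?</ticket>
`;
```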
Your prompt's opening sets the context for the entire response.
LLMs generate text one token at a time. Understanding how they convert text to vectors, use attention to weigh context, and predict probabilities explains their behavior.
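A toy version of the final step, to make "predict probabilities" concrete: softmax turns the model's raw scores over the vocabulary into a distribution, and decoding picks the next token from it (greedily, here). Real vocabularies run to roughly 100k entries; this one has four.

```ts
// Convert raw logits into probabilities that sum to 1.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract max for numeric stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const vocab = ["cat", "dog", "sat", "the"];
const logits = [1.2, 0.4, 3.1, 0.9]; // model's scores for the current position
const probs = softmax(logits);

// Greedy decoding: take the highest-probability token ("sat" here).
const next = vocab[probs.indexOf(Math.max(...probs))];
console.log(next, probs);
```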
When models fail or behave unexpectedly, you need to understand why. Practical debugging techniques for tokenization, attention patterns, and context limits.
The architectural pattern that makes Agent Skills scalable: load only what's needed, when it's needed.
Anthropic's Agent Skills let you equip Claude with specialized capabilities through reusable skill packages. Here's how to build them.
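A sketch of the loading pattern in Node-flavored TypeScript, under the assumption that each skill is a folder containing a SKILL.md whose frontmatter carries a name and description: scan only the frontmatter upfront, and read the full file when a skill is actually invoked. The directory layout and the naive regex parsing are simplifying assumptions.

```ts
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface SkillStub { name: string; description: string; dir: string }

// Startup: load only name + description for every skill, so the cost of
// having many skills installed stays small until one is needed.
function loadStubs(skillsRoot: string): SkillStub[] {
  return readdirSync(skillsRoot).map((dir) => {
    const raw = readFileSync(join(skillsRoot, dir, "SKILL.md"), "utf8");
    // naive frontmatter parse -- a real implementation would use YAML
    const name = /name:\s*(.+)/.exec(raw)?.[1] ?? dir;
    const description = /description:\s*(.+)/.exec(raw)?.[1] ?? "";
    return { name, description, dir: join(skillsRoot, dir) };
  });
}

// On demand: the full instructions enter context only after the model
// decides, from the stub, that this skill is relevant.
function loadFullSkill(stub: SkillStub): string {
  return readFileSync(join(stub.dir, "SKILL.md"), "utf8");
}
```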