How Prompt Priming Shapes LLM Responses
What Priming Does
Every token gets converted to a vector (a list of numbers representing meaning). The model uses these vectors to determine what comes next. When you start with "You are an expert Python developer," those tokens create vectors that cluster near Python concepts, best practices, and technical terminology.
The model's attention mechanism weights these priming vectors when generating each new token. This shifts the probability distribution—making Python-related tokens more likely, Java-related tokens less likely.
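To see this shift concretely, here is a minimal sketch, assuming the Hugging Face transformers library and GPT-2 as a small public stand-in model; the prompts and the candidate token are illustrative choices, not taken from any production system. It compares the next-token probability of a code-flavored word with and without a priming prefix:

```python
# Minimal sketch: how a priming prefix shifts the next-token distribution.
# Assumes `pip install transformers torch`; GPT-2 is used only because it is
# small and public. Prompts and the candidate token are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_prob(prompt: str, candidate: str) -> float:
    """Probability assigned to the first token of `candidate` at the next position."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]   # logits over the whole vocabulary
    probs = torch.softmax(logits, dim=-1)
    candidate_id = tokenizer(candidate, add_special_tokens=False).input_ids[0]
    return probs[candidate_id].item()

unprimed = "Write a function that"
primed = "You are an expert Python developer. Write a function that"

for prompt in (unprimed, primed):
    print(f"P(' parses' | {prompt!r}) = {next_token_prob(prompt, ' parses'):.6f}")
```

The exact numbers depend on the model, but the primed prompt should typically assign more probability mass to code-flavored continuations.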
Real-World Example: AI Coding Tools
Claude Code and Cursor feel different from ChatGPT because they use different priming. Every conversation starts with a system prompt you never see.
Claude Code system prompt (simplified):
You are Claude Code, Anthropic's official CLI for Claude.
You are an interactive CLI tool that helps users with software engineering tasks.
Your responses should be short and concise.
Use specialized tools instead of bash commands when possible.

This priming makes Claude Code:
- Use technical, engineer-to-engineer tone
- Prefer file operation tools over bash cat/grep commands
- Keep responses concise rather than conversational
- Focus on code rather than general chat
Cursor uses different priming:
You are an AI coding assistant integrated into a code editor.
You can see the user's current file and cursor position.
Suggest code completions and edits contextually.

This priming makes Cursor:
- Provide inline suggestions rather than full explanations
- Reference the current file context
- Format responses as code edits
Same underlying model architecture, different behavior—because the priming vectors position each conversation in different regions of the embedding space.
When to Use Priming
- Setting technical depth for an entire conversation ("expert" vs "beginner")
- Enforcing consistent code style across multiple requests ("concise" vs "verbose with comments")
- Establishing domain-specific terminology (medical, legal, financial contexts)
- Controlling response format preferences (bullet points, paragraphs, code-only)
- Building AI tools with specific personalities (like Claude Code's concise engineer tone); a sketch of reusable priming presets follows this list
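As a minimal sketch of how those use cases turn into reusable priming strings (the preset names, wordings, and the `build_messages` helper are illustrative assumptions, not prompts or APIs from any real product):

```python
# Illustrative priming presets for the use cases above; the wording is an
# assumption to be tuned for your own tool, not an official prompt.
PRIMING_PRESETS = {
    "expert_depth":  "You are a senior Python developer. Assume the reader knows the language; skip basics.",
    "concise_style": "Keep code minimal and idiomatic. Add comments only where the logic is non-obvious.",
    "legal_domain":  "You assist with contract review. Use precise legal terminology and name the relevant clause.",
    "code_only":     "Respond with a single runnable code block and nothing else.",
}

def build_messages(preset: str, user_request: str) -> list[dict]:
    """Hypothetical helper: prepend the chosen priming as a system message."""
    return [
        {"role": "system", "content": PRIMING_PRESETS[preset]},
        {"role": "user", "content": user_request},
    ]

messages = build_messages("code_only", "Deduplicate a list while preserving order.")
```

Because the priming lives in the system message, every later turn in the same conversation is generated with that context still in place.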
Why This Matters
Priming works because:
- Vector similarity determines token selection
- Early tokens influence attention weights for all subsequent tokens
- The model's embedding space clusters related concepts together
- Your priming establishes which cluster to sample from (the sketch below makes this concrete)
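A rough way to see that clustering, assuming the sentence-transformers library and the public all-MiniLM-L6-v2 model as an accessible stand-in for an LLM's internal representations (which you cannot inspect directly); the candidate phrases are made up for illustration:

```python
# Sketch: priming text sits closer in embedding space to the concepts it
# should activate. Assumes `pip install sentence-transformers`; this external
# embedding model is only a proxy for the LLM's own embedding space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

priming = "You are an expert Python developer."
candidates = [
    "list comprehensions and generator expressions",
    "unit tests with pytest fixtures",
    "sourdough starter hydration ratios",
    "medieval French poetry",
]

vectors = model.encode([priming] + candidates, convert_to_tensor=True)
similarities = util.cos_sim(vectors[0], vectors[1:])[0]

# Expect the Python-related phrases to score noticeably higher than the rest.
for text, score in zip(candidates, similarities):
    print(f"{score.item():.3f}  {text}")
```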
Use priming to set:
- Technical depth ("expert" vs "beginner")
- Response style ("concise" vs "detailed explanations")
- Output format ("code only" vs "code with commentary")
- Domain context ("API documentation" vs "learning tutorial")
The technique works across all transformer-based models. System messages in APIs are essentially pre-applied priming that persists across the conversation.
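For example, here is a hedged sketch using the Anthropic Python SDK; the client and its `system` parameter are real, but the model name is a placeholder and the two prompts are illustrative:

```python
# Sketch: a system message is priming that gets re-applied on every turn.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPTS = {
    "terse_engineer": "You are a senior engineer. Answer in short, technical bullet points.",
    "patient_teacher": "You are a friendly teacher. Explain concepts step by step for a beginner.",
}

question = "What does Python's functools.lru_cache do?"

for name, system_prompt in SYSTEM_PROMPTS.items():
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder: use whichever model you actually run
        max_tokens=300,
        system=system_prompt,              # the priming, applied to this conversation
        messages=[{"role": "user", "content": question}],
    )
    print(f"--- {name} ---")
    print(response.content[0].text)
```

Because the `system` string is sent with every request, the priming persists across the whole conversation even as the user messages change.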
References
- Efficient Estimation of Word Representations in Vector Space - Mikolov et al., Google (Word2Vec paper with famous king-queen vector arithmetic)
- The Illustrated Word2vec - Jay Alammar's visual guide to word embeddings and vector space
- TensorFlow Embedding Projector - Interactive 3D visualization of word embeddings
- Attention Is All You Need - Vaswani et al., Google Research
- How LLMs Think and Respond - Token generation and vectors
- Prompting Techniques That Actually Work - Practical prompting strategies