TL;DR

Advanced prompting makes AI outputs reliable, structured, and production-ready. Use structured prompts with clear sections, request JSON for machine-readable responses, provide examples (few-shot learning) to guide behavior, and ask AI to think step-by-step (chain-of-thought) for complex problems. Build reusable templates, test thoroughly, and version your prompts like code.

Why it matters

Basic prompting works for casual use, but production systems need consistency. When you're building an API, automating workflows, or integrating AI into products, you can't rely on unpredictable free-form responses. Advanced techniques give you the control and reliability you need.

Structured prompts: Format, sections, and clarity

Structure turns messy prompts into reliable instructions. Break your prompt into clear sections:

Basic structure:

[ROLE] You are an expert technical writer.

[CONTEXT] I'm documenting a REST API for developers.

[TASK] Write clear, concise descriptions for each endpoint.

[CONSTRAINTS]
- Use active voice
- Maximum 2 sentences per description
- Include HTTP method and path

[EXAMPLE]
GET /users/{id}
Retrieves a single user by their unique identifier. Returns 404 if user not found.

[INPUT]
POST /orders - Creates a new order

This format makes it easy to modify individual sections without rewriting everything. You can swap roles, add constraints, or update examples independently.

Why it works:

  • Clear boundaries prevent confusion
  • Easy to debug when outputs are wrong
  • Reusable across similar tasks
  • Forces you to think through requirements
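
If you assemble these prompts in code, keeping each section as its own string makes that swapping trivial. A minimal sketch (the section names and contents here are just illustrations, not a fixed convention):

# Keep each labeled section separate so one can change without touching the rest
sections = {
    "ROLE": "You are an expert technical writer.",
    "CONTEXT": "I'm documenting a REST API for developers.",
    "TASK": "Write clear, concise descriptions for each endpoint.",
    "CONSTRAINTS": "- Use active voice\n- Maximum 2 sentences per description",
    "INPUT": "POST /orders - Creates a new order",
}

# Join the labeled sections into the final prompt text
prompt = "\n\n".join(f"[{name}]\n{text}" for name, text in sections.items())

# Swapping the role is a one-line change
sections["ROLE"] = "You are a senior API architect."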

Requesting JSON output

When integrating AI into code, you need machine-readable responses. JSON is perfect for this.

Pattern 1: Direct request

Analyze this customer review and return JSON with sentiment and key themes.

Review: "The product arrived quickly but the quality was disappointing. Customer service was helpful when I contacted them about a refund."

Format:
{
  "sentiment": "positive|negative|mixed",
  "sentiment_score": 0.0 to 1.0,
  "themes": ["shipping", "quality", "support"],
  "actionable": true|false
}

Pattern 2: Schema specification

For complex outputs, provide a detailed schema:

Extract key information from this invoice and return valid JSON matching this schema:

{
  "invoice_number": "string",
  "date": "YYYY-MM-DD",
  "vendor": {
    "name": "string",
    "address": "string"
  },
  "line_items": [
    {
      "description": "string",
      "quantity": number,
      "unit_price": number,
      "total": number
    }
  ],
  "subtotal": number,
  "tax": number,
  "total": number
}

Invoice text: [your invoice content]

Validation tips:

  • Always parse the JSON in your code to catch errors
  • Handle cases where AI returns malformed JSON
  • Use try/except blocks and fallbacks (see the parsing sketch after this list)
  • Consider asking for responses wrapped in markdown code blocks: "Return JSON in a ```json code block"
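
A minimal parsing sketch, assuming the model's reply arrives as a plain string that may or may not be wrapped in a ```json fence:

import json
import re

def parse_json_response(raw_text):
    """Parse a model reply as JSON, tolerating a markdown code fence around it."""
    # Strip a ```json ... ``` wrapper if the model added one
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw_text, re.DOTALL)
    candidate = match.group(1) if match else raw_text.strip()

    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Fall back to a sentinel the caller can check (or use to retry the request)
        return {"error": "malformed_json", "raw": raw_text}

If parsing fails, a common next step is to retry the request once with the parse error quoted back to the model.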

Few-shot learning: Using examples to guide behavior

Examples are one of the most powerful ways to shape AI behavior. Show what you want, and the AI will match the pattern.

Zero-shot (no examples):

Categorize this support ticket: "I can't log in to my account"

Few-shot (with examples):

Categorize support tickets into: auth, billing, technical, or general.

Examples:
"I forgot my password" → auth
"Why was I charged twice?" → billing
"The app crashes on iOS" → technical
"When are you launching new features?" → general

Categorize: "I can't log in to my account"

How many examples?

  • 1-2 examples: Simple tasks, clear patterns
  • 3-5 examples: Most use cases, good balance
  • 5-10 examples: Complex tasks, nuanced distinctions
  • 10+ examples: Rare; consider fine-tuning instead

Example selection matters:

  • Cover edge cases and tricky inputs
  • Show variation in formatting and wording
  • Include both obvious and subtle distinctions
  • Arrange from simple to complex

Practical template:

Task: [What you want done]

Examples:
Input 1 → Output 1
Input 2 → Output 2
Input 3 → Output 3

Now you try:
Input: [Your actual input]
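
The template above is easy to fill programmatically. A minimal sketch, assuming your examples are simple (input, output) pairs; the ticket data is just the example from earlier:

def few_shot_prompt(task, examples, new_input):
    """Build a few-shot prompt from (input, output) example pairs."""
    lines = [f"Task: {task}", "", "Examples:"]
    lines += [f"{inp} → {out}" for inp, out in examples]
    lines += ["", "Now you try:", f"Input: {new_input}"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    task="Categorize support tickets into: auth, billing, technical, or general.",
    examples=[
        ("I forgot my password", "auth"),
        ("Why was I charged twice?", "billing"),
        ("The app crashes on iOS", "technical"),
    ],
    new_input="I can't log in to my account",
)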

Chain-of-thought prompting for complex reasoning

For problems requiring multiple steps, asking AI to "show its work" often improves accuracy dramatically.

Without chain-of-thought:

A store sells apples for $2 each. If you buy 10, you get 20% off. How much do 15 apples cost?

With chain-of-thought:

A store sells apples for $2 each. If you buy 10, you get 20% off. How much do 15 apples cost?

Think step by step:
1. Calculate the base price
2. Determine if a discount applies
3. Calculate the discounted price
4. Show your work
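
With the steps spelled out, a typical response works through something like: 15 apples × $2 = $30 base price; 15 is more than 10, so the 20% discount applies; $30 × 0.80 = $24 (assuming the discount covers the whole order rather than only the first 10 apples). Making the model state that assumption explicitly is a large part of the value.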

Advanced pattern: Show reasoning in JSON

Analyze this code for security issues. Return JSON with your reasoning.

Code:
def login(username, password):
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    return db.execute(query)

Format:
{
  "issues": [
    {
      "type": "SQL injection",
      "severity": "critical",
      "line": "query = f\"SELECT...\"",
      "reasoning": "User input is directly interpolated into SQL without sanitization",
      "fix": "Use parameterized queries: cursor.execute('SELECT * FROM users WHERE username=? AND password=?', (username, password))"
    }
  ]
}

When to use chain-of-thought:

  • Math problems
  • Logic puzzles
  • Code debugging
  • Multi-step analysis
  • Decision-making with trade-offs

Building reusable prompt templates

Templates turn one-off prompts into production assets.

Basic template with variables:

TEMPLATE = """
You are a {role}.

Task: {task}

Context: {context}

Requirements:
{requirements}

Input:
{input_data}
"""

# Usage
prompt = TEMPLATE.format(
    role="customer support agent",
    task="Draft a professional response to this complaint",
    context="Customer has been waiting 3 weeks for a refund",
    requirements="- Empathetic tone\n- Offer specific next steps\n- Under 100 words",
    input_data=customer_message
)

Advanced: Conditional sections

def build_prompt(task, examples=None, output_format=None):
    """Assemble a prompt, including only the sections that were provided."""
    prompt = f"Task: {task}\n\n"

    if examples:
        prompt += "Examples:\n"
        for ex in examples:
            prompt += f"- {ex['input']} → {ex['output']}\n"
        prompt += "\n"

    if output_format == "json":
        prompt += "Return response as valid JSON.\n\n"
    elif output_format == "markdown":
        prompt += "Return response in markdown format.\n\n"

    # Left as a literal placeholder so the caller can fill it per request
    prompt += "Input: {input}"
    return prompt
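
A quick usage sketch (reusing the ticket example from earlier): the returned string still contains a literal {input} placeholder, which you fill in when a real request arrives.

template = build_prompt(
    task="Categorize support tickets into: auth, billing, technical, or general.",
    examples=[
        {"input": "I forgot my password", "output": "auth"},
        {"input": "Why was I charged twice?", "output": "billing"},
    ],
    output_format="json",
)

# The {input} placeholder is still literal at this point; fill it per request
prompt = template.format(input="I can't log in to my account")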

Template best practices:

  • Use clear variable names
  • Document expected formats
  • Include defaults for optional sections
  • Version your templates
  • Test with edge cases

Handling edge cases and error-prone inputs

Production systems need to handle the unexpected.

Common edge cases:

  • Empty inputs
  • Extremely long inputs
  • Malformed data
  • Unexpected languages
  • Special characters and encoding issues
  • Ambiguous requests

Defensive prompting:

Extract email addresses from this text.

Rules:
- If no emails found, return empty array: []
- Validate email format (must contain @ and domain)
- Remove duplicates
- Handle multiple emails separated by commas, spaces, or newlines

Return JSON:
{
  "emails": ["list", "of", "emails"],
  "count": number
}

Text: {input_text}
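
Even with those rules in the prompt, re-check the output in code; the model's idea of a valid email may not match yours. A minimal sketch, assuming the reply has already been parsed into a dict:

import re

# Deliberately simple format check; tighten it to match your own rules
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def clean_email_result(result):
    """Re-validate and deduplicate the emails the model returned."""
    emails = result.get("emails", []) if isinstance(result, dict) else []
    valid = []
    for email in emails:
        if isinstance(email, str) and EMAIL_RE.match(email) and email not in valid:
            valid.append(email)
    return {"emails": valid, "count": len(valid)}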

Input validation in code:

def safe_prompt(user_input, max_length=5000):
    """Build a prompt from untrusted input; returns an error dict if the input is unusable."""
    # Reject empty or whitespace-only input before spending an API call
    if not user_input or not user_input.strip():
        return {"error": "Empty input"}

    # Truncate overly long input
    if len(user_input) > max_length:
        user_input = user_input[:max_length] + "..."

    # Escape quotes so the embedded input doesn't break the surrounding prompt
    user_input = user_input.replace('"', '\\"')

    # Build the prompt
    prompt = f'Analyze: "{user_input}"'
    return prompt

Testing and versioning prompts

Treat prompts like code: test, version, and iterate.

Testing approach:

  1. Create a test suite of diverse inputs
  2. Define success criteria (accuracy, format, tone)
  3. Run tests against each prompt version
  4. Track metrics (success rate, error types)
  5. Iterate based on failures

Example test:

test_cases = [
    {
        "input": "I love this product!",
        "expected": {"sentiment": "positive"}
    },
    {
        "input": "It's okay, nothing special.",
        "expected": {"sentiment": "neutral"}
    },
    {
        "input": "",
        "expected": {"error": "empty_input"}
    }
]

for test in test_cases:
    result = run_prompt(SENTIMENT_PROMPT, test["input"])
    # Compare only the fields you specified; the model may add extras such as confidence
    for key, value in test["expected"].items():
        assert result.get(key) == value, f"Failed on: {test['input']}"
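
Here run_prompt stands for whatever helper you use to call your model. A minimal sketch of one, assuming a hypothetical call_model(prompt) function that sends the text to your provider and returns the raw reply:

import json

def run_prompt(template, user_input):
    """Fill the template, call the model, and parse the JSON reply."""
    if not user_input or not user_input.strip():
        return {"error": "empty_input"}  # mirrors the edge case in the test suite

    prompt = template.format(input=user_input)
    raw = call_model(prompt)  # hypothetical helper: your provider's API call goes here

    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "malformed_json"}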

Versioning:

PROMPTS = {
    "sentiment_v1": "Analyze sentiment: {input}",
    "sentiment_v2": "Analyze sentiment (positive/negative/neutral): {input}",
    "sentiment_v3": """Analyze sentiment and return JSON:
    {{"sentiment": "positive|negative|neutral", "confidence": 0.0-1.0}}
    Input: {input}"""
}

# Use specific versions in production
current_version = "sentiment_v3"
prompt = PROMPTS[current_version].format(input=user_text)

Tools and libraries for prompt management

As your prompts grow, consider dedicated tools:

LangChain: Framework for building LLM applications with reusable prompt templates

Promptfoo: Testing and evaluation tool for prompts

Weights & Biases: Track prompt performance and experiments

Version control: Store prompts in Git, track changes, review like code

Simple approach: Start with a prompts directory in your repo:

/prompts
  /sentiment
    v1.txt
    v2.txt
    v3.txt
  /summarization
    v1.txt
  /extraction
    v1.txt
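
If you go the directory route, a small loader keeps version selection explicit. A minimal sketch, assuming the layout above and plain-text template files:

from pathlib import Path

def load_prompt(name, version="v1"):
    """Read a prompt template from prompts/<name>/<version>.txt."""
    path = Path("prompts") / name / f"{version}.txt"
    return path.read_text(encoding="utf-8")

# Pin the version explicitly in production code
SENTIMENT_PROMPT = load_prompt("sentiment", version="v3")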

What's next?

  • Evaluating AI Answers: Learn systematic approaches to measuring AI output quality
  • RAG (Retrieval-Augmented Generation): Combine AI with your own data sources
  • Fine-tuning Basics: When to train custom models vs. using prompts
  • AI Safety & Guardrails: Preventing harmful outputs in production systems