TL;DR

Temperature and sampling parameters control randomness in AI outputs. Lower temperature = more predictable; higher = more creative. Tune these to balance creativity and consistency.

What is temperature?

Definition:
A parameter that controls randomness in text generation. It typically ranges from 0 to 2, though the exact limits depend on the model provider.

Low temperature (0-0.3):

  • Predictable, consistent
  • Almost always picks the most likely next token (a word or word piece)
  • Good for: Factual answers, code, translations

Medium temperature (0.7-1.0):

  • Balanced creativity
  • Good for: Writing, brainstorming, chat

High temperature (above 1.0, up to 2.0):

  • Very creative, unpredictable
  • Can be incoherent
  • Good for: Poetry, creative fiction, wild ideas

How temperature works

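  1. Model assigns a score (logit) to each possible next token; a softmax turns these scores into probabilities
  2. Temperature divides the scores before the softmax, which rescales the probabilities
  3. Higher temp = flatter distribution (more randomness)
  4. Lower temp = sharper distribution (more deterministic); at 0, the most likely token is always picked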
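
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular model's implementation: the function name `softmax_with_temperature` and the example scores are made up for the demonstration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled as greedy (argmax) decoding instead.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share of the probability shrinking as the temperature rises, which is the flattening described in step 3.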

Other sampling parameters

Top-p (nucleus sampling):

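  • Limits choices to the smallest set of tokens whose combined probability reaches p
  • 0.9 = sample only from the most likely tokens that together make up 90% of the probability mass
  • Can be used instead of, or together with, temperature; a common recommendation is to adjust one at a time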
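
A minimal sketch of top-p filtering, assuming the probabilities have already been computed. The function name `top_p_filter` and the example distribution are illustrative only.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    then renormalize so the kept probabilities sum to 1."""
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Hypothetical distribution over four tokens.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.9))
```

The number of tokens kept is not fixed: a confident distribution may keep one or two tokens, while a flat one keeps many.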

Top-k:

  • Limits choices to the K most likely tokens, regardless of their probabilities
  • K=40 = choose from the 40 most likely tokens
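
For comparison, a minimal top-k sketch under the same assumptions (the name `top_k_filter` and the example values are illustrative):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

# Hypothetical distribution over four tokens; keep the best two.
probs = [0.1, 0.6, 0.25, 0.05]
print(top_k_filter(probs, 2))
```

Unlike top-p, the cutoff here is a fixed count, so it keeps the same number of candidates whether the model is confident or not.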

Frequency penalty:

  • Reduces repetition by penalizing tokens in proportion to how often they have already appeared
  • Higher = less likely to repeat the same words

Presence penalty:

  • Encourages new topics by penalizing any token that has already appeared, no matter how many times
  • Higher = more diversity
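
The difference between the two penalties is easiest to see in code. The sketch below assumes a simple additive formulation in which both penalties are subtracted from a token's score before sampling; providers may implement these penalties differently, and the function name `apply_penalties` and the example tokens are hypothetical.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appear in the generated text.

    Assumed formulation (not any specific provider's implementation):
      score -= frequency_penalty * count   (grows with each repeat)
      score -= presence_penalty            (flat, applied once if count > 0)
    """
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= frequency_penalty * count + presence_penalty
    return adjusted

# Hypothetical scores: all three tokens start out equally likely.
logits = {"cat": 2.0, "dog": 2.0, "fish": 2.0}
history = ["cat", "cat", "dog"]
print(apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.2))
```

Here "cat" is penalized more than "dog" because it appeared twice, while "fish" is untouched because it has not appeared yet.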

When to use each setting

Factual tasks (temp 0-0.3):

  • Data extraction
  • Translations
  • Code generation
  • Structured outputs

Creative tasks (temp 0.7-1.2):

  • Writing stories
  • Brainstorming
  • Marketing copy

Exploration (temp 1.5+):

  • Generating many diverse options
  • Experimental creative writing

Combining parameters

Deterministic + focused:

  • Temperature: 0
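  • Top-p: 0.1 (adds little once temperature is 0, because the most likely token is already always chosen)
  • Result: Very consistent outputs, though not necessarily identical on every run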

Creative + coherent:

  • Temperature: 0.9
  • Top-p: 0.9
  • Frequency penalty: 0.5
  • Result: Creative but readable
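
To show how temperature and top-p interact, here is a minimal sketch of a single sampling step that applies temperature first and top-p second. It covers only those two parameters (not the frequency penalty), and the function name `sample_token`, the example scores, and the order of operations are assumptions for illustration; real inference stacks may differ.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick a token index using temperature scaling followed by top-p filtering."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, so top_p is irrelevant.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the most likely tokens until their combined probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices normalizes the weights, so no explicit renormalization is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.0, 0.1, -1.0]
rng = random.Random(0)  # seeded so the demo is repeatable
greedy = [sample_token(logits, temperature=0, top_p=0.1) for _ in range(5)]
creative = [sample_token(logits, temperature=0.9, top_p=0.9, rng=rng) for _ in range(5)]
print("deterministic:", greedy)
print("creative:", creative)
```

With these example scores, the creative setting varies its choice between runs but never picks the least likely token, because top-p removes it from the candidate set.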

Common mistakes

  • Using high temp for code (can produce invalid syntax)
  • Using temp 0 for creative writing (flat, repetitive output)
  • Not testing different settings
  • Assuming default is always best

Best practices

  1. Start with defaults
  2. Adjust based on output quality
  3. Test systematically: change one parameter at a time and compare several runs per setting
  4. Document what works for each use case
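
One way to test systematically is a small parameter sweep. The sketch below assumes you supply a `generate(prompt, temperature, top_p)` callable that wraps your model; both that signature and the `fake_generate` stand-in are hypothetical and exist only so the example runs on its own.

```python
import itertools

def sweep(generate, prompt, temperatures, top_ps, runs_per_setting=3):
    """Run the same prompt across a grid of settings and record the outputs."""
    results = []
    for temperature, top_p in itertools.product(temperatures, top_ps):
        outputs = [
            generate(prompt, temperature=temperature, top_p=top_p)
            for _ in range(runs_per_setting)
        ]
        results.append({
            "temperature": temperature,
            "top_p": top_p,
            "outputs": outputs,
            # A rough consistency signal: how many different outputs came back.
            "distinct_outputs": len(set(outputs)),
        })
    return results

def fake_generate(prompt, temperature, top_p):
    # Stand-in for a real model call; replace with your own client code.
    return f"{prompt} [t={temperature}, p={top_p}]"

for row in sweep(fake_generate, "Summarize this", [0.0, 0.7], [0.9, 1.0]):
    print(row["temperature"], row["top_p"], row["distinct_outputs"])
```

Saving the returned rows alongside notes on output quality gives you the per-use-case record that step 4 calls for.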

What's next

  • Prompt Engineering
  • Model Selection
  • Output Quality Optimization