Temperature and Sampling: Controlling AI Creativity
By Marcin Piekarski · builtweb.com.au · Last Updated: 11 February 2026
TL;DR
Temperature and sampling parameters control how random or predictable AI outputs are. A low temperature (close to 0) makes the model pick the most likely words every time, giving you consistent, factual responses. A high temperature (above 1.0) introduces randomness, producing more creative and varied outputs. Learning to tune these settings is one of the easiest ways to get significantly better results from any AI model.
Why it matters
Most people use AI with default settings and never touch temperature or sampling parameters. This is like driving a car stuck in one gear. It works, but you are not getting the best performance for different situations.
When you need a factual answer, a code snippet, or a data extraction, you want the AI to be deterministic and pick the single best response. When you are brainstorming ideas, writing marketing copy, or generating creative content, you want variety and surprise. The same model can do both, but only if you adjust these parameters.
Understanding temperature also helps you diagnose problems. If your AI keeps giving repetitive, boring answers, the temperature might be too low. If it is producing incoherent nonsense, the temperature might be too high. Knowing what these knobs do gives you the power to fix these issues instead of blaming the model.
What is temperature?
Temperature is a number, typically between 0 and 2, that controls how random the AI's word choices are. The name comes from thermodynamics, where higher temperature means more energetic, chaotic particle movement.
At temperature 0, the model always picks the single most probable next word. Ask it the same question ten times and you will get nearly identical answers every time. This is great for tasks where there is one right answer: extracting data from a document, translating a sentence, or generating structured JSON.
At temperature 0.7 to 1.0, the model introduces some controlled randomness. It still favors likely words but occasionally picks less obvious ones, leading to more natural-sounding and varied text. This is the sweet spot for most conversational and writing tasks.
At temperature 1.5 and above, the model becomes highly unpredictable. It frequently picks unlikely words, which can produce surprising creative ideas but also incoherent gibberish. Use this range only when you specifically want wild, experimental output and plan to filter the results heavily.
How temperature works under the hood
To understand temperature mechanically, you need to know what happens when an AI model generates text. At each step, the model calculates a probability for every possible next word (or more precisely, every possible next token). "The cat sat on the ___" might produce probabilities like: "mat" (40%), "floor" (25%), "couch" (15%), "roof" (5%), "moon" (0.1%).
Temperature adjusts these probabilities before the model picks a word. Low temperature makes the high-probability options even more dominant and the low-probability options almost invisible. The distribution becomes "sharp" or "peaked," and the model almost always picks "mat."
High temperature flattens the distribution, giving low-probability options a better chance. Now "roof" and even "moon" have a real shot at being selected. This is how you get creative and unexpected outputs, but also how you get nonsensical ones.
At temperature 0, the model skips the randomness entirely and always picks the highest-probability option. This is called "greedy decoding" and produces the most predictable possible output.
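The scaling described above can be sketched in a few lines of Python. This is a toy illustration of temperature applied to made-up logits, not any model's actual implementation; the token list and scores are invented for the cat-on-the-mat example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then softmax.

    Low temperature sharpens the distribution toward the top choice;
    high temperature flattens it. Temperature 0 is handled separately
    by callers as greedy argmax, since dividing by zero is undefined.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for "The cat sat on the ___": mat, floor, couch, roof, moon
logits = [2.0, 1.5, 1.0, 0.0, -3.0]

low = softmax_with_temperature(logits, 0.2)   # "mat" dominates (~92%)
high = softmax_with_temperature(logits, 2.0)  # even "moon" gets a real share
```

At temperature 1.0 this reduces to an ordinary softmax, which is why 1.0 is often the neutral default.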
Other sampling parameters
Temperature is not the only control you have. Several other parameters shape how the model selects its next word.
Top-p (nucleus sampling) limits the model's choices to the smallest set of words whose combined probability exceeds a threshold. With top-p set to 0.9, the model considers only the most likely words that together account for 90% of the probability mass. This automatically adjusts how many options are available. When the model is very confident, it might consider just two or three words. When it is uncertain, it might consider dozens.
Top-k is simpler. It limits the model to the top K most likely words regardless of their probabilities. With top-k set to 40, the model always chooses from exactly 40 options. This is less adaptive than top-p but easier to reason about.
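Both filters can be sketched over a toy probability table. The helper names and the probabilities below are invented for illustration; real implementations operate on token IDs and tensors rather than word strings.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely options, renormalized to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {word: p / total for word, p in ranked}

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of options whose
    cumulative probability reaches p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for word, prob in ranked:
        kept[word] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {word: q / total for word, q in kept.items()}

probs = {"mat": 0.47, "floor": 0.29, "couch": 0.18, "roof": 0.05, "moon": 0.01}

top_k_filter(probs, 2)    # keeps mat and floor only
top_p_filter(probs, 0.9)  # keeps mat, floor, couch (cumulative 0.94)
```

Note how the nucleus shrinks or grows with the shape of the distribution, while top-k always keeps exactly k options.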
Frequency penalty reduces the probability of words that have already appeared in the output. Higher values make the model less likely to repeat itself. This is useful for preventing the AI from getting stuck in loops where it repeats the same phrase.
Presence penalty is similar but binary. Instead of penalizing words more for each repetition, it applies a flat penalty to any word that has appeared at all. This encourages the model to introduce new topics and vocabulary rather than circling back to the same concepts.
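Both penalties operate on the raw logits before sampling. The sketch below roughly follows the penalty formula OpenAI describes for its API; other providers implement repetition control differently, and the tokens and counts here are invented.

```python
def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that have already appeared.

    frequency_penalty scales with how many times a token has been used;
    presence_penalty is a flat deduction for any token seen at least once.
    """
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        seen = 1 if count > 0 else 0
        adjusted[token] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted

# "the" has already appeared three times; "idea" is fresh
logits = {"the": 2.0, "idea": 1.0}
counts = {"the": 3}
adjusted = apply_penalties(logits, counts, frequency_penalty=0.5, presence_penalty=0.5)
# "the" drops from 2.0 to 0.0; "idea" is untouched
```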
Practical settings for common tasks
For factual extraction and data tasks, use temperature 0 to 0.2 with top-p around 0.1. You want the single most accurate response every time. If you are extracting dates from a document, there is no benefit to creativity. Consistency and correctness are all that matter.
For code generation, keep temperature between 0 and 0.3. Code has strict syntax rules, and higher temperatures introduce errors. A missing bracket or a hallucinated function name is worse than slightly boring code. If you want alternative approaches to a coding problem, it is better to ask explicitly than to crank up the temperature.
For general writing and conversation, temperature 0.7 to 1.0 is the sweet spot. This gives you natural-sounding text with enough variation to feel human. Add a frequency penalty of 0.3 to 0.5 to prevent repetitive phrasing.
For brainstorming and creative exploration, push temperature to 1.0 to 1.3 with a presence penalty of 0.5 to 1.0. You want the model to surprise you with unexpected connections and ideas. Generate multiple responses and pick the best ones rather than expecting every output to be usable.
For experimental creative writing like poetry or surrealist fiction, you can try temperature 1.5 and above. Expect a high rate of unusable output. The gems you find will be genuinely creative, but you will need to sift through a lot of noise.
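The recommendations above can be collected into a small lookup table. The preset values follow this guide's suggested ranges; the parameter names match the OpenAI-style convention (temperature, top_p, frequency_penalty, presence_penalty), and the task labels are invented for illustration.

```python
# Presets matching the task-specific recommendations above.
SAMPLING_PRESETS = {
    "extraction":   {"temperature": 0.0, "top_p": 0.1},
    "code":         {"temperature": 0.2},
    "writing":      {"temperature": 0.8, "frequency_penalty": 0.4},
    "brainstorm":   {"temperature": 1.2, "presence_penalty": 0.8},
    "experimental": {"temperature": 1.6},
}

def params_for(task):
    """Look up sampling settings for a task, defaulting to general writing."""
    return SAMPLING_PRESETS.get(task, SAMPLING_PRESETS["writing"])
```

Passing the resulting dict as keyword arguments to an API client keeps the per-task tuning in one place instead of scattered across your code.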
Combining parameters effectively
These parameters interact with each other, and it helps to think of them as a system rather than individual controls.
For maximum consistency, set temperature to 0, top-p to 0.1, and leave penalties at 0. This gives you virtually identical responses to the same prompt every time.
For creative but coherent output, try temperature 0.9, top-p 0.9, frequency penalty 0.5, and presence penalty 0.3. The temperature and top-p add variety, while the penalties prevent the model from repeating itself or getting stuck on one topic.
A general rule: do not set both temperature and top-p to extreme values simultaneously. If temperature is already very low, an aggressive top-p is redundant. If temperature is very high, a restrictive top-p can create odd behavior where the model has lots of randomness but very few options to choose from.
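Putting the pieces together, a single decoding step looks roughly like this. It is a simplified sketch, greedy at temperature 0, then a temperature-scaled softmax, then a nucleus filter, then a weighted random draw; production decoders work on tensors and add many optimizations.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """One decoding step combining temperature and nucleus sampling."""
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy decoding

    # Temperature-scaled softmax over the candidate tokens
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Nucleus filter: keep the smallest set reaching top_p probability
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights)[0]

logits = {"mat": 2.0, "floor": 1.0, "moon": -5.0}
sample_next_token(logits, temperature=0)               # always "mat"
sample_next_token(logits, temperature=0.9, top_p=0.9)  # usually "mat", sometimes "floor"
```

The interaction warned about above is visible here: a very low top_p leaves only one token in the nucleus, so cranking up the temperature changes nothing.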
Common mistakes
The most common mistake is using the same settings for every task. Temperature 0.7 might be the default, but it is not optimal for code generation, data extraction, or creative fiction. Take thirty seconds to adjust settings for the task at hand.
Another mistake is using high temperature for code. Code has rigid syntax rules, and even a small amount of randomness can introduce bugs. A temperature of 1.0 might produce a function call with the wrong number of arguments or a variable name that does not exist.
People also set temperature to 0 and then complain that the AI is "boring" or "repetitive." At temperature 0, the model will produce nearly the same response every time. If you want variety, you need to either raise the temperature or change your prompt.
Finally, many users never test different settings. They try one configuration, decide it is "good enough," and move on. Spending ten minutes testing three or four temperature values on your actual prompts can dramatically improve your results.
What's next?
- Master the art of crafting effective inputs in Prompt Engineering Basics
- Understand the tokens that temperature operates on in Token Economics
- Learn how models generate text in AI Model Architectures
- Explore advanced prompt strategies in Context Management
Frequently Asked Questions
What temperature should I use for ChatGPT or Claude?
For most everyday tasks like writing emails, answering questions, and summarizing text, the default temperature (usually around 0.7 to 1.0) works well. For code generation or data extraction, lower it to 0 to 0.2. For creative writing or brainstorming, try 0.9 to 1.2. The best approach is to experiment with a few values on your specific task.
Should I use temperature or top-p?
Provider documentation generally recommends adjusting one or the other, not both at the same time. Top-p is more adaptive because it automatically adjusts the number of options based on the model's confidence. Temperature is more intuitive and more widely understood. Either works well; pick one and learn to tune it.
Why does temperature 0 sometimes give slightly different answers?
Even at temperature 0, some APIs introduce tiny amounts of randomness due to floating-point arithmetic or batching optimizations. The differences are usually minor, like word order or punctuation. If you need truly identical responses, some providers offer a seed parameter for exact reproducibility.
Can I change temperature mid-conversation?
In most API implementations, yes. You set temperature per request, so you can use temperature 0 for a factual question and then switch to 0.9 for a creative follow-up. In chat interfaces like ChatGPT, you typically set temperature once for the whole conversation, but some interfaces allow per-message adjustments.
About the Authors
Marcin Piekarski · Frontend Lead & AI Educator
Marcin is a Frontend Lead with 20+ years in tech. Currently building headless ecommerce at Harvey Norman (Next.js, Node.js, GraphQL). He created Field Guide to AI to help others understand AI tools practically—without the jargon.
Credentials & Experience:
- 20+ years web development experience
- Frontend Lead at Harvey Norman (10 years)
- Worked with: Gumtree, CommBank, Woolworths, Optus, M&C Saatchi
- Runs AI workshops for teams
- Founder of builtweb.com.au
- Daily AI tools user: ChatGPT, Claude, Gemini, AI coding assistants
- Specializes in React ecosystem: React, Next.js, Node.js
Prism AI · AI Research & Writing Assistant
Prism AI is the AI ghostwriter behind Field Guide to AI—a collaborative ensemble of frontier models (Claude, ChatGPT, Gemini, and others) that assist with research, drafting, and content synthesis. Like light through a prism, human expertise is refracted through multiple AI perspectives to create clear, comprehensive guides. All AI-generated content is reviewed, fact-checked, and refined by Marcin before publication.
Transparency Note: All AI-assisted content is thoroughly reviewed, fact-checked, and refined by Marcin Piekarski before publication.
Key Terms Used in This Guide
Parameters
The internal numerical values within an AI model that are adjusted during training to capture patterns in data. More parameters generally mean a more capable model, but also higher costs and slower inference.
Temperature
A setting that controls how creative or random an AI's responses are. Low temperature produces predictable, focused answers. High temperature produces varied, more creative outputs.
AI (Artificial Intelligence)
Making machines perform tasks that typically require human intelligence—like understanding language, recognizing patterns, or making decisions.
Fine-Tuning
Taking a pre-trained AI model and training it further on your specific data to make it better at your particular task or adopt a specific style.
Top-p (Nucleus Sampling)
A parameter that controls randomness in AI text generation by choosing from the smallest set of words whose combined probability reaches a threshold p. Lower values make output more focused; higher values make it more creative.
Related Guides
- AI Evaluation Metrics: Measuring Model Quality (Intermediate, 6 min read). How do you know if your AI is good? Learn key metrics for evaluating classification, generation, and other AI tasks.
- AI Workflows and Pipelines: Orchestrating Complex Tasks (Intermediate, 7 min read). Chain multiple AI steps together into workflows. Learn orchestration patterns, error handling, and tools for building AI pipelines.
- Fine-Tuning Fundamentals: Customizing AI Models (Intermediate, 8 min read). Fine-tuning adapts pre-trained models to your specific use case. Learn when to fine-tune, how it works, and alternatives.