TL;DR

Hyperparameters are settings that control model training but aren't learned from data. Tuning them well can significantly improve model performance. Start with established defaults, then systematically search for better values using random search or Bayesian optimization.

Why it matters

The same model architecture can perform vastly differently depending on its hyperparameters. Good tuning can improve performance by 10-50% without changing anything else, and it is often the difference between a model that works and one that doesn't.

What are hyperparameters?

Parameters vs hyperparameters

Parameters: Learned during training (weights, biases)
Hyperparameters: Set before training (learning rate, batch size)
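
To make the distinction concrete, here is a minimal sketch using scikit-learn (one possible library; any framework works the same way): C and max_iter are hyperparameters chosen before training, while coef_ and intercept_ are parameters learned by fit.

# Parameters vs hyperparameters sketch (assumes scikit-learn is installed)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters: chosen before training
model = LogisticRegression(C=0.1, max_iter=200)

# Parameters: learned from the data during training
model.fit(X, y)
print(model.coef_, model.intercept_)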

Common hyperparameters

Hyperparameter | What it controls | Typical range
Learning rate | How fast the model learns | 0.0001 - 0.1
Batch size | Examples per update | 16 - 512
Epochs | Full passes over the training data | 3 - 100
Dropout | Regularization strength | 0.1 - 0.5
Hidden layers | Model complexity | 1 - 10
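
As one illustrative starting point drawn from the ranges above (not a universal recommendation), a configuration might look like this:

# Illustrative starting configuration; good values depend on the task, model, and data.
config = {
    "learning_rate": 1e-3,   # typical range: 0.0001 - 0.1
    "batch_size": 64,        # typical range: 16 - 512
    "epochs": 20,            # typical range: 3 - 100
    "dropout": 0.2,          # typical range: 0.1 - 0.5
    "hidden_layers": 2,      # typical range: 1 - 10
}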

Tuning approaches

Manual tuning

Adjust based on intuition and experience:

Process:

  1. Start with defaults
  2. Train and evaluate
  3. Adjust based on results
  4. Repeat

Best for: Quick experiments and building intuition

Grid search

Try every combination in a predefined grid:

Example:

Learning rate: [0.01, 0.001, 0.0001]
Batch size: [32, 64, 128]
= 3 × 3 = 9 combinations to try

Pros: Thorough and reproducible
Cons: Cost grows exponentially with the number of parameters; misses values between grid points
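
A minimal grid-search sketch over the example grid above; train_and_evaluate is a hypothetical placeholder for your actual training and validation code.

# Grid search sketch: try every combination of the grid above.
from itertools import product

learning_rates = [0.01, 0.001, 0.0001]
batch_sizes = [32, 64, 128]

def train_and_evaluate(lr, batch_size):
    # Placeholder: train a model with these settings and return a validation score.
    return 0.0

best_score, best_params = float("-inf"), None
for lr, bs in product(learning_rates, batch_sizes):  # 3 x 3 = 9 runs
    score = train_and_evaluate(lr, bs)
    if score > best_score:
        best_score, best_params = score, {"learning_rate": lr, "batch_size": bs}

print(best_params, best_score)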

Random search

Sample random combinations from defined ranges:

Process:

  1. Define parameter ranges
  2. Sample random combinations
  3. Train and evaluate each
  4. Select best

Pros: More efficient than grid search; often finds good values with fewer runs
Cons: May miss the optimum; needs a large enough trial budget
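
A minimal random-search sketch; the learning rate is sampled log-uniformly (a common choice because useful values span several orders of magnitude), and train_and_evaluate is again a hypothetical placeholder.

# Random search sketch: sample combinations instead of enumerating a grid.
import math
import random

random.seed(0)

def sample_params():
    # Log-uniform sample for learning rate (1e-4 to 1e-1), uniform choice for batch size.
    lr = 10 ** random.uniform(math.log10(1e-4), math.log10(1e-1))
    batch_size = random.choice([16, 32, 64, 128, 256, 512])
    return {"learning_rate": lr, "batch_size": batch_size}

def train_and_evaluate(params):
    # Placeholder: train a model with these settings and return a validation score.
    return 0.0

best_score, best_params = float("-inf"), None
for _ in range(20):  # budget of 20 trials
    params = sample_params()
    score = train_and_evaluate(params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)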

Bayesian optimization

Use past results to guide search:

Process:

  1. Try initial random points
  2. Build a surrogate model of the parameter-performance relationship
  3. Select the next point that maximizes expected improvement
  4. Update model with new result
  5. Repeat

Pros: Efficient, especially for expensive evaluations
Cons: More complex to implement
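
A short sketch using Optuna, one popular library whose default sampler (TPE) follows this model-guided idea; the body of objective is a hypothetical placeholder for real training and validation.

# Bayesian-style optimization sketch using Optuna (pip install optuna).
# Optuna's default TPE sampler uses past trials to propose the next point.
import optuna

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    # Placeholder: train with these settings and return a validation score.
    return 0.0

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params, study.best_value)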

Best practices

Start with defaults

Don't tune blindly:

  • Use published defaults
  • Research what works for similar tasks
  • Often defaults are already good

Prioritize impactful parameters

Not all hyperparameters matter equally:

  • Learning rate often most important
  • Architecture choices second
  • Minor parameters last

Use validation set

Never tune on test data:

  • Separate validation for tuning
  • Test only for final evaluation
  • Avoid overfitting to the validation set
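
A minimal sketch of a 60/20/20 train/validation/test split using scikit-learn's train_test_split (one possible tool): tune on the validation split and touch the test split only once, at the end.

# Three-way split sketch: tune on the validation set, hold out the test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First carve off the test set, then split the rest into train / validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# Result: 60% train, 20% validation, 20% test.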

Log everything

Track all experiments:

  • Parameter values
  • Performance metrics
  • Training curves
  • Random seeds
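
A minimal sketch of experiment logging that appends one JSON line per run; dedicated trackers (for example MLflow or Weights & Biases) implement the same idea with more features. The log_run helper and the values passed to it are illustrative.

# Minimal experiment logging: one JSON line per run.
import json
import time

def log_run(params, metrics, seed, path="experiments.jsonl"):
    record = {
        "timestamp": time.time(),
        "params": params,      # hyperparameter values
        "metrics": metrics,    # e.g. validation accuracy, loss, training curves
        "seed": seed,          # random seed for reproducibility
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Illustrative values only.
log_run({"learning_rate": 1e-3, "batch_size": 64}, {"val_accuracy": 0.91}, seed=42)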

Common mistakes

Mistake | Problem | Prevention
Tuning on the test set | Overfitting to the test data | Use a separate validation set
Grid search only | Inefficient; misses values between grid points | Use random or Bayesian search
Tuning too early | Wasted effort | Get a baseline working first
Ignoring defaults | Reinventing the wheel | Start from established settings
Too many parameters | Combinatorial explosion | Prioritize key parameters

What's next

Continue optimizing AI: