1D demo: GD vs momentum on f(x) = x⁴ - 0.5x² + 0.1x

This demo compares plain gradient descent (GD) and heavy-ball momentum on the tilted double-well function f(x) = x⁴ - 0.5x² + 0.1x, starting from x₀ = 1, with step size η and momentum coefficient γ. The derivative is f'(x) = 4x³ - x + 0.1. For small step sizes, GD slides into the nearest basin and settles in the shallow right local minimum, while momentum can carry the iterate over the barrier and into the deeper left minimum.
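
The comparison can be sketched in a few lines of Python. This is a minimal sketch, not the demo's own code: it assumes the common heavy-ball form v ← γv - ηf'(x), x ← x + v with zero initial velocity, and the names `grad`, `run_gd`, and `run_momentum` are illustrative.

```python
def f(x):
    """Tilted double-well objective from the demo."""
    return x**4 - 0.5 * x**2 + 0.1 * x


def grad(x):
    """Derivative f'(x) = 4x^3 - x + 0.1."""
    return 4 * x**3 - x + 0.1


def run_gd(x0=1.0, eta=0.02, steps=2000):
    """Plain gradient descent: x <- x - eta * f'(x)."""
    x = x0
    for _ in range(steps):
        x -= eta * grad(x)
    return x


def run_momentum(x0=1.0, eta=0.02, gamma=0.95, steps=2000):
    """Heavy-ball momentum (assumed form): v <- gamma*v - eta*f'(x), x <- x + v."""
    x, v = x0, 0.0  # assumption: the velocity starts at zero
    for _ in range(steps):
        v = gamma * v - eta * grad(x)
        x += v
    return x


if __name__ == "__main__":
    # Print where each method ends up and the objective value there.
    for name, x in [("GD", run_gd()), ("momentum", run_momentum())]:
        print(f"{name:9s} x = {x:+.4f}  f(x) = {f(x):+.4f}")
```

Comparing the two printed f(x) values shows which basin each method ended in; the step count of 2000 is an arbitrary choice that is comfortably long for these settings.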

[Interactive plot: GD and momentum trajectories on f(x), with the current iterate marked. The starting point x₀ = 1 is fixed.]

What to try
Start with η ≈ 0.02 and γ ≈ 0.95. GD should settle into the right local minimum near x ≈ 0.44, while momentum often crosses the barrier (the local maximum near x ≈ 0.10) and ends in the deeper left minimum near x ≈ -0.54.
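
The three locations quoted above are the roots of f'(x) = 4x³ - x + 0.1, and they can be checked numerically. The sketch below uses plain bisection; the bracketing intervals in `BRACKETS` are hand-picked assumptions (each one straddles a sign change of f'), and the helper names are illustrative.

```python
def f(x):
    return x**4 - 0.5 * x**2 + 0.1 * x


def grad(x):
    return 4 * x**3 - x + 0.1


def hess(x):
    """Second derivative f''(x) = 12x^2 - 1, used to classify each root."""
    return 12 * x**2 - 1


def bisect(lo, hi, tol=1e-10):
    """Root of grad in [lo, hi], assuming grad changes sign on the bracket."""
    g_lo = grad(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = grad(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


# Assumption: hand-picked brackets, one per stationary point.
BRACKETS = [(-1.0, -0.3), (0.0, 0.3), (0.3, 1.0)]


def stationary_points():
    return [bisect(lo, hi) for lo, hi in BRACKETS]


if __name__ == "__main__":
    for x in stationary_points():
        kind = "minimum" if hess(x) > 0 else "maximum"
        print(f"x = {x:+.4f}  f(x) = {f(x):+.4f}  local {kind}")
```

The printed objective values make the asymmetry explicit: the left minimum sits lower than the right one, and the middle root is the barrier between them.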

Lower γ to make momentum behave more like GD; at γ = 0 the heavy-ball update reduces to plain GD. If η is too large, both methods can oscillate or diverge.
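
Both effects can be explored offline with a small parameter sweep. This is a sketch under assumptions: it uses the heavy-ball form v ← γv - ηf'(x), x ← x + v with zero initial velocity, treats |x| > 10⁶ as divergence (the `blowup` threshold is arbitrary), and the swept values of γ and η are illustrative choices.

```python
def grad(x):
    return 4 * x**3 - x + 0.1


def gd(x0=1.0, eta=0.02, steps=500, blowup=1e6):
    """GD trajectory; stops early once |x| exceeds `blowup`."""
    xs, x = [x0], x0
    for _ in range(steps):
        x -= eta * grad(x)
        xs.append(x)
        if abs(x) > blowup:
            break
    return xs


def heavy_ball(x0=1.0, eta=0.02, gamma=0.95, steps=500, blowup=1e6):
    """Heavy-ball trajectory (assumed form); stops early once |x| exceeds `blowup`."""
    xs, x, v = [x0], x0, 0.0
    for _ in range(steps):
        v = gamma * v - eta * grad(x)
        x += v
        xs.append(x)
        if abs(x) > blowup:
            break
    return xs


def diverged(xs, blowup=1e6):
    return abs(xs[-1]) > blowup


if __name__ == "__main__":
    # Sweep gamma at a fixed small step size.
    for gamma in (0.0, 0.5, 0.9, 0.95):
        xs = heavy_ball(gamma=gamma)
        print(f"gamma = {gamma:4.2f}  final x = {xs[-1]:+.4f}")
    # Sweep eta for plain GD and report whether the run blew up.
    for eta in (0.02, 0.1, 1.0):
        xs = gd(eta=eta)
        status = "diverged" if diverged(xs) else f"final x = {xs[-1]:+.4f}"
        print(f"eta = {eta:4.2f}  {status}")
```

With γ = 0 the velocity is just the negative gradient step, so `heavy_ball(gamma=0.0)` traces the same trajectory as `gd()`. The early exit on `blowup` keeps a diverging run from overflowing.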