Quadratic ravine demo: GD, momentum, Nesterov, Newton, and noisy SGD
Click on the contour plot to set a new start point. The new SGD option performs
\(x_{t+1} = x_t - \eta\,(\nabla f(x_t) + \sigma \xi_t)\), where \(\xi_t \sim \mathcal N(0, I)\) is drawn independently at each step.
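The noisy update can be sketched in a few lines of NumPy. This is only an illustration, not the demo's own code: the objective \(f(x) = \tfrac12 (x_1^2 + \kappa x_2^2)\), the function name `noisy_sgd`, and all default values are assumptions chosen to give a simple ravine with condition number κ.

```python
import numpy as np

def noisy_sgd(x0, eta, sigma, kappa=10.0, steps=200, seed=0):
    """Iterate x_{t+1} = x_t - eta * (grad f(x_t) + sigma * xi_t).

    Assumed objective (not taken from the demo):
        f(x) = 0.5 * (x[0]**2 + kappa * x[1]**2)
    """
    rng = np.random.default_rng(seed)
    h = np.array([1.0, kappa])        # diagonal of the assumed Hessian
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        grad = h * x                  # exact gradient of the quadratic
        xi = rng.standard_normal(2)   # fresh xi_t ~ N(0, I) each step
        x = x - eta * (grad + sigma * xi)
        path.append(x.copy())
    return np.array(path)             # shape (steps + 1, 2)

if __name__ == "__main__":
    path = noisy_sgd(x0=[2.0, 1.0], eta=0.1, sigma=0.3)
    print("final iterate:", path[-1])
```

With `sigma=0` this reduces to plain gradient descent. With `sigma > 0` the iterates keep jittering around the minimizer instead of settling on it, because the step size is constant and the noise is never annealed.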
Plot legend: trajectory, current point, optimum.
Suggested settings
GD: η ≈ 1/κ (small and stable, assuming the curvatures are scaled to 1 and κ, in which case any η < 2/κ converges).
Momentum/Nesterov: try η slightly larger and γ around 0.8–0.95.
Noisy SGD: start with σ ≈ 0.2–0.5 to see visible jitter while still drifting toward the minimizer.
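
To try these settings outside the browser, the deterministic methods can be compared with a short sketch. Everything here is an assumption made for illustration: the quadratic \(f(x) = \tfrac12 (x_1^2 + \kappa x_2^2)\), the helper name `run_method`, and the particular heavy-ball and Nesterov formulations, which are common textbook conventions and may differ from the ones the demo uses.

```python
import numpy as np

def run_method(method, x0, eta, gamma=0.9, kappa=10.0, steps=300):
    """Return the final iterate of GD, heavy-ball momentum, or Nesterov
    on the assumed quadratic f(x) = 0.5 * (x[0]**2 + kappa * x[1]**2)."""
    h = np.array([1.0, kappa])        # diagonal of the assumed Hessian
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)              # velocity, unused by plain GD
    for _ in range(steps):
        if method == "gd":
            x = x - eta * h * x
        elif method == "momentum":
            # heavy ball: gradient evaluated at the current point
            v = gamma * v - eta * h * x
            x = x + v
        elif method == "nesterov":
            # Nesterov: gradient evaluated at the look-ahead point
            v = gamma * v - eta * h * (x + gamma * v)
            x = x + v
        else:
            raise ValueError(f"unknown method: {method}")
    return x

if __name__ == "__main__":
    kappa = 10.0
    for name in ("gd", "momentum", "nesterov"):
        x = run_method(name, [2.0, 1.0], eta=1.0 / kappa, kappa=kappa)
        print(name, "distance to optimum:", np.linalg.norm(x))
```

On this quadratic, plain GD contracts each coordinate by \(|1 - \eta h_i|\) per step, so raising η past 2/κ makes the steep direction diverge. That is the reason for the conservative η ≈ 1/κ suggestion under this scaling.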