A neural network with two hidden layers learns to fit a noisy sinusoid from scratch, trained with the Adam optimizer. Watch the predicted curve evolve from a flat line into the underlying function as training steps accumulate.
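The training loop behind a demo like this can be sketched in plain Python with no libraries. This is a minimal sketch, not the demo's actual code. The hidden width of 8, the tanh activations, the Glorot-style initialization, the learning rate of 0.01, and the synthetic dataset (200 points of sin(x) on [-pi, pi] with Gaussian noise, sigma = 0.1) are all assumptions. Only the two hidden layers, the Adam optimizer, the MSE loss, and the mini-batch size of 20 come from the text.

```python
import math
import random


def init_params(sizes, rng):
    """Return [W1, b1, W2, b2, W3, b3] as flat lists; W[j * n_in + i] links input i to unit j."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = math.sqrt(6.0 / (n_in + n_out))  # Glorot/Xavier uniform range (assumed init)
        params.append([rng.uniform(-limit, limit) for _ in range(n_in * n_out)])
        params.append([0.0] * n_out)
    return params


def forward(params, sizes, x):
    """Return every layer's activations for one input list x (tanh hidden layers, linear output)."""
    acts = [x]
    h = x
    n_layers = len(sizes) - 1
    for l in range(n_layers):
        W, b = params[2 * l], params[2 * l + 1]
        n_in, n_out = sizes[l], sizes[l + 1]
        z = [b[j] + sum(W[j * n_in + i] * h[i] for i in range(n_in)) for j in range(n_out)]
        h = z if l == n_layers - 1 else [math.tanh(v) for v in z]
        acts.append(h)
    return acts


def backward(params, sizes, acts, y, scale, grads):
    """Backpropagate scale * (prediction - y)**2 and accumulate its gradients into grads."""
    delta = [2.0 * (acts[-1][0] - y) * scale]  # dLoss/d(output)
    for l in reversed(range(len(sizes) - 1)):
        W = params[2 * l]
        n_in, n_out = sizes[l], sizes[l + 1]
        h_in = acts[l]
        gW, gb = grads[2 * l], grads[2 * l + 1]
        for j in range(n_out):
            gb[j] += delta[j]
            for i in range(n_in):
                gW[j * n_in + i] += delta[j] * h_in[i]
        if l > 0:
            # Push the error back through W, then through layer l's tanh (derivative 1 - tanh^2).
            delta = [
                sum(W[j * n_in + i] * delta[j] for j in range(n_out)) * (1.0 - h_in[i] ** 2)
                for i in range(n_in)
            ]


def adam_step(params, grads, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first and second moment estimates."""
    for p, g, mp, vp in zip(params, grads, m, v):
        for k in range(len(p)):
            mp[k] = beta1 * mp[k] + (1 - beta1) * g[k]
            vp[k] = beta2 * vp[k] + (1 - beta2) * g[k] ** 2
            m_hat = mp[k] / (1 - beta1 ** t)
            v_hat = vp[k] / (1 - beta2 ** t)
            p[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(steps=2000, batch_size=20, hidden=8, seed=0):
    rng = random.Random(seed)
    sizes = [1, hidden, hidden, 1]  # input -> two hidden layers -> output
    # Hypothetical dataset: noisy samples of sin(x) on [-pi, pi].
    data = []
    for _ in range(200):
        x = rng.uniform(-math.pi, math.pi)
        data.append((x, math.sin(x) + rng.gauss(0.0, 0.1)))
    params = init_params(sizes, rng)
    m = [[0.0] * len(p) for p in params]  # Adam first-moment estimates
    v = [[0.0] * len(p) for p in params]  # Adam second-moment estimates
    losses = []
    for t in range(1, steps + 1):
        batch = rng.sample(data, batch_size)  # random mini-batch, drawn without replacement
        grads = [[0.0] * len(p) for p in params]
        loss = 0.0
        for x, y in batch:
            acts = forward(params, sizes, [x])
            loss += (acts[-1][0] - y) ** 2 / batch_size
            backward(params, sizes, acts, y, 1.0 / batch_size, grads)
        adam_step(params, grads, m, v, t)
        losses.append(loss)  # mini-batch MSE measured before this step's update
    return params, sizes, losses
```

Calling `train()` returns the trained parameters and the per-step mini-batch loss. Plotting `losses` gives a curve of the same kind as the training-loss panel, and evaluating `forward(params, sizes, [x])[-1][0]` over a grid of x values traces the prediction curve. Storing each layer as a flat list keeps Adam's per-parameter moment estimates easy to index, at the cost of speed; a NumPy version would replace the inner loops with one matrix product per layer.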

Curve fitting

Blue dots are the training data; the orange line is the network's current prediction.

Training loss

Mean squared error (MSE) plotted against training step. Each step processes a random mini-batch of 20 samples.
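Assuming the value plotted at step t is the loss on that step's mini-batch (the text does not say whether a full-dataset loss is shown instead), the quantity is:

```latex
\mathcal{L}_t = \frac{1}{20} \sum_{i \in B_t} \bigl( f_{\theta_t}(x_i) - y_i \bigr)^2, \qquad |B_t| = 20
```

Here B_t is the batch drawn at step t, f_theta_t is the network with its parameters at that step, and (x_i, y_i) are training pairs. Because each step sees a different random batch, this curve typically fluctuates from step to step even while the overall fit improves.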