A two-hidden-layer network fits a noisy sinusoid from scratch using Adam. Watch the predicted curve evolve from a flat line to the underlying function as training steps accumulate.
Blue dots are training data. The orange line is the network's current prediction.
Mean squared error over training steps. Each step processes a random mini-batch of 20 samples.