A fully-connected network "paints" a target image by learning to map pixel coordinates (x, y) → RGB color. Low-frequency blobs appear first; fine detail fills in as training continues.
Left: target. Right: current network output. Both update every frame.
Mean squared error between the target and the network's pixel predictions.