A deeper network with two convolutional layers trains on synthetic 12×12 RGB patches. Its extra filters and second conv stage can capture richer spatial structure than a single-layer CNN.

Predictions

Each tile shows a color patch with its true class (top) and predicted class (bottom). A green border marks a correct prediction.

Training accuracy

Network: Conv(3×3,8)→ReLU→Conv(3×3,8)→ReLU→MaxPool→Dense(32)→Dense(6).
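
To make the architecture line concrete, here is a minimal sketch that traces output shapes and parameter counts through the stack for a 12×12×3 input. The layer sizes come from the line above; the padding mode ("valid"), stride 1, and 2×2 pooling window are assumptions, since the description does not state them, and the names `conv_out` and `trace_network` are hypothetical helpers used only for illustration.

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution (stride 1 is an assumption)."""
    return size if padding == "same" else size - kernel + 1


def trace_network(size=12, channels=3, padding="valid", pool=2):
    """Return (layer name, output shape, parameter count) for each stage.

    Assumed, not stated in the description: padding mode and pool window.
    """
    rows = []
    for name, filters in (("conv1", 8), ("conv2", 8)):
        params = 3 * 3 * channels * filters + filters  # weights + biases
        size, channels = conv_out(size, 3, padding), filters
        rows.append((name, (size, size, channels), params))

    size //= pool  # max pooling has no trainable parameters
    rows.append(("maxpool", (size, size, channels), 0))

    features = size * size * channels  # flattened input to the dense layers
    for name, units in (("dense1", 32), ("dense2", 6)):
        rows.append((name, (units,), features * units + units))
        features = units
    return rows


if __name__ == "__main__":
    for name, shape, params in trace_network():
        print(f"{name:8s} {str(shape):12s} {params:5d}")
    print("total parameters:", sum(p for _, _, p in trace_network()))
```

Under these assumptions the feature map shrinks from 12×12 to 10×10 to 8×8, pooling leaves a 4×4×8 volume (128 features), and the network has 5,134 trainable parameters in total. Calling `trace_network(padding="same")` instead keeps the conv outputs at 12×12 and roughly doubles the count, which shows how much the first dense layer depends on the unstated padding choice.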