A deeper network with two convolutional layers is trained on synthetic 12×12 RGB patches. The added filters and the second conv stage let it capture richer spatial structure than a single-layer CNN can.
Each tile shows a color patch with its true class (top) and predicted class (bottom). Green border = correct.
Network: Conv(3×3,8)→ReLU→Conv(3×3,8)→ReLU→MaxPool→Dense(32)→Dense(6).
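To make the layer stack concrete, here is a minimal NumPy sketch of the forward pass that traces tensor shapes through the network. Several details are assumptions not stated above: stride-1 convolutions with no padding, a 2×2 max pool with stride 2, a ReLU after Dense(32), and raw logits at the Dense(6) output. The helper names (`conv2d_valid`, `maxpool2x2`, `init_params`, `forward`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(x, w, b):
    """x: (H, W, Cin); w: (kh, kw, Cin, Cout); b: (Cout,). Stride 1, no padding (assumed)."""
    kh, kw, _, cout = w.shape
    H, W, _ = x.shape
    out = np.empty((H - kh + 1, W - kw + 1, cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]                 # (kh, kw, Cin)
            out[i, j] = np.tensordot(patch, w, axes=3) + b   # (Cout,)
    return out

def maxpool2x2(x):
    """2x2 max pool with stride 2 (assumed); H and W must be even."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def init_params(seed=0):
    """Small random weights; shapes follow the assumed no-padding layout."""
    rng = np.random.default_rng(seed)
    shapes = {
        "w1": (3, 3, 3, 8), "b1": (8,),     # Conv(3x3, 8) on RGB input
        "w2": (3, 3, 8, 8), "b2": (8,),     # Conv(3x3, 8)
        "w3": (4 * 4 * 8, 32), "b3": (32,), # Dense(32) on the flattened pool output
        "w4": (32, 6), "b4": (6,),          # Dense(6) class logits
    }
    return {k: 0.1 * rng.standard_normal(s) for k, s in shapes.items()}

def forward(x, p):
    h = relu(conv2d_valid(x, p["w1"], p["b1"]))   # 12x12x3 -> 10x10x8
    h = relu(conv2d_valid(h, p["w2"], p["b2"]))   # 10x10x8 -> 8x8x8
    h = maxpool2x2(h).reshape(-1)                 # 8x8x8 -> 4x4x8 -> 128
    h = relu(h @ p["w3"] + p["b3"])               # 128 -> 32
    return h @ p["w4"] + p["b4"]                  # 32 -> 6 logits

params = init_params()
patch = np.random.default_rng(1).random((12, 12, 3))  # one synthetic RGB patch
logits = forward(patch, params)
predicted_class = int(np.argmax(logits))
n_params = sum(v.size for v in params.values())
print(logits.shape, predicted_class, n_params)
```

Under these assumptions the network has 5,134 trainable parameters, most of them (4,128) in the Dense(32) layer. With "same" padding instead, the flattened feature size and the parameter count would both be larger.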