Inverted Pendulum Trainer

This standalone trainer implements a lightweight cross-entropy method (CEM) agent that learns to balance a cart-pole from scratch. It runs entirely in browser-side JavaScript, with no server round-trips or external dependencies. Click Start training to watch the policy evolve and the pendulum animation stabilize as the agent improves.
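To make the task concrete, here is a minimal sketch of how one candidate policy could be scored by its survival time. It is not the trainer's actual code. The physics constants are the commonly used textbook cart-pole values, and the function names (`stepCartPole`, `survivalTime`), the linear policy, the fixed starting tilt, and the 500-step cap are all assumptions made for illustration.

```javascript
// Assumed textbook cart-pole constants (not taken from this page).
const GRAVITY = 9.8, CART_MASS = 1.0, POLE_MASS = 0.1;
const POLE_HALF_LENGTH = 0.5, FORCE = 10.0, DT = 0.02;
const X_LIMIT = 2.4;                      // cart leaves the track beyond this
const THETA_LIMIT = (12 * Math.PI) / 180; // pole counts as fallen beyond this

// Advance the state [x, xDot, theta, thetaDot] by one Euler step.
function stepCartPole([x, xDot, theta, thetaDot], force) {
  const totalMass = CART_MASS + POLE_MASS;
  const poleMassLength = POLE_MASS * POLE_HALF_LENGTH;
  const sin = Math.sin(theta);
  const cos = Math.cos(theta);
  const temp = (force + poleMassLength * thetaDot * thetaDot * sin) / totalMass;
  const thetaAcc = (GRAVITY * sin - cos * temp) /
    (POLE_HALF_LENGTH * (4 / 3 - (POLE_MASS * cos * cos) / totalMass));
  const xAcc = temp - (poleMassLength * thetaAcc * cos) / totalMass;
  return [x + DT * xDot, xDot + DT * xAcc, theta + DT * thetaDot, thetaDot + DT * thetaAcc];
}

// Hypothetical linear policy: push right if weights . state > 0, else push left.
// Returns the number of steps survived, capped at maxSteps.
function survivalTime(weights, maxSteps = 500) {
  let state = [0, 0, 0.05, 0]; // start with a small, fixed tilt
  for (let t = 0; t < maxSteps; t++) {
    const activation = weights.reduce((sum, w, i) => sum + w * state[i], 0);
    state = stepCartPole(state, activation > 0 ? FORCE : -FORCE);
    if (Math.abs(state[0]) > X_LIMIT || Math.abs(state[2]) > THETA_LIMIT) {
      return t + 1;
    }
  }
  return maxSteps;
}
```

Under these assumptions, `survivalTime([0, 0, 0, 0])` always pushes left and fails within a few steps, which gives the search a low baseline to improve on.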

Visualization

The canvas shows the best policy found so far. Once training is running, the animated pendulum snaps upright as soon as the agent discovers a stabilizing policy.

Learning curve

Each iteration keeps the top 20% of sampled policies (the elites), refits the Gaussian search distribution to those elites, and tracks the best survival time.
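The iteration described above can be sketched roughly as follows. This is an illustrative version and not the page's own implementation. The names `cemIteration` and `gaussian`, the population size of 50, the diagonal Gaussian, and the `minStd` floor are assumptions; only the 20% elite fraction comes from the text.

```javascript
// Draw one standard-normal sample via the Box-Muller transform.
function gaussian(random) {
  const u1 = 1 - random(); // in (0, 1], avoids log(0)
  const u2 = random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// One CEM iteration over a diagonal Gaussian (mean[j], std[j]).
// evaluate(params) returns a score to maximize, e.g. survival time.
function cemIteration(mean, std, evaluate, options = {}) {
  const { popSize = 50, eliteFrac = 0.2, minStd = 1e-3, random = Math.random } = options;

  // Sample candidate parameter vectors and score each one.
  const population = [];
  for (let i = 0; i < popSize; i++) {
    const params = mean.map((m, j) => m + std[j] * gaussian(random));
    population.push({ params, score: evaluate(params) });
  }

  // Keep the top fraction (20% by default) as elites.
  population.sort((a, b) => b.score - a.score);
  const eliteCount = Math.max(1, Math.round(eliteFrac * popSize));
  const elites = population.slice(0, eliteCount);

  // Refit the Gaussian to the elites; minStd keeps the search from collapsing.
  const newMean = mean.map((_, j) =>
    elites.reduce((sum, e) => sum + e.params[j], 0) / eliteCount);
  const newStd = mean.map((_, j) => {
    const variance =
      elites.reduce((sum, e) => sum + (e.params[j] - newMean[j]) ** 2, 0) / eliteCount;
    return Math.max(minStd, Math.sqrt(variance));
  });

  return { mean: newMean, std: newStd, bestScore: population[0].score, eliteCount };
}
```

A training loop would call `cemIteration` repeatedly, feeding each returned `mean` and `std` into the next call and keeping the running maximum of `bestScore` for the learning curve. The `minStd` floor is one simple guard against premature convergence; adding decaying noise to the standard deviation is a common alternative.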