This standalone trainer implements a lightweight cross-entropy method (CEM) agent that learns to balance a cart-pole from scratch using only browser-side JavaScript, with no server round-trips or external dependencies. Click "Start training" to watch the policy evolve and the pendulum animation stabilize as the agent improves.
The canvas shows the best policy found so far. Once training starts, the animation settles upright as soon as the agent finds a stabilizing policy.
Each iteration keeps the top 20% of sampled policies, updates the Gaussian search distribution, and tracks the best survival time.
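As a rough illustration of one such iteration, here is a minimal sketch in plain JavaScript. It is not the trainer's actual code: the names `gaussian` and `cemIteration`, the population size of 50, and the floor on the standard deviation are all assumptions made for the example. Only the 20% elite fraction comes from the description above. The `evaluate` callback stands in for whatever returns a policy's survival time; a toy score function is used here so the snippet runs on its own.

```javascript
// Sketch only: function names, population size, and the std floor are
// assumptions, not the trainer's real implementation.

// Standard normal sample via the Box-Muller transform.
function gaussian(rng) {
  const u = 1 - rng(); // in (0, 1], so Math.log(u) stays finite
  const v = rng();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// One CEM iteration: sample, score, keep the elites, refit the Gaussian.
function cemIteration(mean, std, evaluate, options = {}) {
  const {
    populationSize = 50, // assumed value
    eliteFraction = 0.2, // "top 20%" from the description
    minStd = 1e-3,       // assumed floor so the search never fully collapses
    rng = Math.random,
  } = options;

  // Sample candidate parameter vectors from the current distribution.
  const scored = [];
  for (let i = 0; i < populationSize; i++) {
    const params = mean.map((m, j) => m + std[j] * gaussian(rng));
    scored.push({ params, score: evaluate(params) });
  }

  // Higher score is better (for cart-pole: longer survival time).
  scored.sort((a, b) => b.score - a.score);
  const eliteCount = Math.max(1, Math.round(populationSize * eliteFraction));
  const elites = scored.slice(0, eliteCount);

  // Refit the per-dimension mean and standard deviation to the elites.
  const newMean = mean.map(
    (_, j) => elites.reduce((sum, e) => sum + e.params[j], 0) / eliteCount
  );
  const newStd = mean.map((_, j) => {
    const variance =
      elites.reduce((sum, e) => sum + (e.params[j] - newMean[j]) ** 2, 0) /
      eliteCount;
    return Math.max(minStd, Math.sqrt(variance));
  });

  return { mean: newMean, std: newStd, best: scored[0] };
}

// Toy stand-in for a survival-time score: negative squared distance to a target.
const target = [1, -2];
const toyScore = (p) => -p.reduce((sum, x, j) => sum + (x - target[j]) ** 2, 0);

let search = { mean: [0, 0], std: [1, 1] };
let bestSoFar = null;
for (let i = 0; i < 30; i++) {
  const result = cemIteration(search.mean, search.std, toyScore);
  search = { mean: result.mean, std: result.std };
  // Track the best candidate seen across all iterations.
  if (bestSoFar === null || result.best.score > bestSoFar.score) {
    bestSoFar = result.best;
  }
}
console.log(bestSoFar.params, bestSoFar.score);
```

In a cart-pole setting, `evaluate` would presumably run a simulated episode with the candidate policy parameters and return the number of steps survived. The best candidate is tracked separately from the distribution mean because a single strong sample can outperform the mean of the elites.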