The resident will identify failure modes in frontier models and develop rigorous benchmarks for long-horizon AI agents. They will also train autonomous agents capable of reasoning, planning, and acting over extended time horizons.
Polymath
2 Remote Job Openings at Polymath
Build simulation environments, tasks, and verifiers to challenge and measure frontier AI models. Collaborate with the research team to identify failure modes and develop high-fidelity training environments for autonomous agents.