Agent Quality / Evals Engineer 1754

 Posted 6 hours ago
     
2-5 years of experience

Please mention DailyRemote when applying

AI Summary

The role involves building and maintaining an MVP evaluation harness, including golden tasks and scorecard metrics. It also requires integrating evaluations into the CI pipeline so that quality regressions fail builds and releases.

This is a remote position.

This role owns the eval harness and quality gate from day one. It replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

• Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
• Wire evals into CI so quality regressions fail builds and releases.
• Define and maintain release-gate thresholds with Product and the Tech Lead.
• Lay the groundwork for later adversarial and drift-testing expansion without overbuilding the MVP scope.


Requirements

Must-Have Qualifications

• Experience evaluating ML, LLM, or non-deterministic systems.
• Strong test and benchmark design capability.
• Comfort working with noisy metrics, thresholds, and probabilistic behavior.
• Good scripting and automation skills.

AI-First Expectations

• Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.
• Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

• The first reference agent has a published scorecard and a gated eval path.
• Golden and exception tests run automatically.
• The team can explain what “good enough to ship” means in measurable terms.
