SD Solutions

SafetyTech Client #1 | Adversarial Task Writer for AI Security RL Gyms

Posted 25 days ago

Serbia

⭐ 2-5 years experience

Apply Now

Please mention DailyRemote when applying

AI Summary

Design and implement adversarial prompt injection scenarios in YAML to test the safety policies of frontier AI models. Validate success rates and submit high-quality tasks to the RL Gym platform to improve AI robustness.

On behalf of SafetyTech Client #1, SD Solutions is looking for a talented [Position name]

SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.

Responsibilities:

You design prompt injection scenarios in YAML, run them against frontier models, validate success rates, and submit passing tasks. 5 high-quality tasks per week (full-time equivalent). Per-task compensation, paid on acceptance.

Requirements:

Adversarial mindset: you think like an attacker and understand how to exploit an AI agent’s helpfulness, authority assumptions, or trust in its environment
Prompt injection expertise: direct (role-play, encoding, context flooding) and indirect/environment-embedded (poisoned tool responses, malicious content in documents, cross-context leakage)
Technical writing in YAML
Comfortable with Docker, CLI tools, and running systematic tests against multiple models
Domain realism in at least one vertical: e-commerce, finance, HR, enterprise SaaS, healthcare, travel
Background in pentesting, appsec, LLM security research, or red teaming strongly preferred

The Task

You build adversarial prompt injection tasks for Alice’s RL Gym platform. Each task is a self-contained YAML scenario simulating a realistic AI agent deployment, testing whether the agent can be manipulated into violating its safety policies.

What a task includes:

1. An agent persona with a system prompt, security policies, and a defined set of tools

2. A simulated environment (e-commerce site, messaging platform, enterprise app) with content the agent interacts with

3. An adversarial payload embedded in the environment (in messages, product listings, tool responses, documents) that attempts to trick the agent into a policy violation

4. A benign user request that naturally leads the agent to encounter the payload

5. Deterministic evaluation criteria specifying exactly what constitutes success (agent resists) vs. failure (agent is compromised)

6. Quality gate: each task must cause a policy violation in at least 7/10 runs against at least 2 of 3 SOTA models. Attacks must be diverse (varied technique, surface, domain) and realistic inproduction agent deployments. No contrived setups or tools that exist only to enable the attack.

About the company:

A company building specialized evaluation infrastructure for AI safety and robustness testing. Their platform simulates adversarial conditions used by AI development teams to validate agent behavior before deployment. Currently expanding a freelance contributor pool for scenario and environment development.

By applying for this position, you agree to the terms outlined in our Privacy Policy. Please take a moment to review our Privacy Policy https://sd-solutions.breezy.hr/privacy-notice, and make sure you understand its contents. If you have any questions or concerns regarding our Privacy Policy, please feel free to contact us.

Automatically Apply to the Best Remote Jobs

Stop the endless job search. Our AI finds and applies to the best jobs for you.

Try it Now

SD Solutions

SafetyTech Client #1 | Adversarial Task Writer for AI Security RL Gyms

AI Summary

Automatically Apply to the Best Remote Jobs

Ace Your Job Interview

How to Answer "How Do You Handle Criticism"?

How to Answer "Tell Me About Yourself?" in an Interview

How to Answer "What is your Experience with Customer Service?"

How to Answer "Describe Your Experience Working With Diverse Teams Or Different Cultures?"

How to Answer The Interview Question "What Sets You Apart From Other Candidates?"

How to Answer "Why Are You The Best Person For This Job"?

How to Answer "Tell Me About A Time When You Had To Balance Competing Priorities?"

How to Answer "Why Should We Hire You?"

How to Answer "What Areas Need Improvement?"

How to Answer "Tell Me About A Time When You Had To Balance Competing Priorities?"

How to Answer "Tell Me About a Time You Received Constructive Feedback"

How to Answer "What Is Your Greatest Accomplishment?"

Similar Jobs

Director, Optimization and Analytics

Founding Power Systems Engineer

Founding Civil Engineer

Founding Electrical Engineer

Founding Mechanical Engineer

CRM WFH

SD Solutions

SafetyTech Client #1 | Adversarial Task Writer for AI Security RL Gyms

AI Summary

Automatically Apply to the Best Remote Jobs

Share This Job:

Similar Jobs

Director, Optimization and Analytics

Founding Power Systems Engineer

Founding Civil Engineer

Founding Electrical Engineer

Founding Mechanical Engineer

CRM WFH

Personalize your Remote Job Search in 3 Easy Steps!