Design and implement an autonomous learning system and RL loop to improve trading agent strategies based on live outcomes. Own the model and inference infrastructure, including the build-vs-buy decisions for model hosting and optimization.
Staff AI Engineer | AI/Crypto Fintech | Seed Stage | Confidential Client | Backed by Tier-1 Crypto VCs
Location: Americas preferred (US > Canada/LatAm) | Remote
Compensation: $175,000–$250,000 USD base + 1% equity + team bonuses + pro-rata 2026 token launch participation
Stage: Seed | ~20 employees
About the Company
A well-funded seed-stage startup building the next generation of autonomous trading technology. Backed by leading crypto-focused venture capital, the company has driven significant trading volume with zero paid acquisition and strong retention metrics. The founding team are crypto and onchain veterans with a prior unicorn venture. The platform is a purpose-built execution system for AI agents operating with real capital around the clock — the infrastructure, data pipelines, and runtime are already in production. You are building the intelligence layer on top of it.
The Problem You're Solving
A fleet of autonomous trading agents runs 24/7, generating a continuous stream of decisions and measurable outcomes. Right now those agents are effective but isolated — when one finds a winning pattern, a human has to carry that insight to the rest. Your job is to make that process autonomous: build the system where the fleet learns from itself and improves continuously without human intervention.
What You'll Own
Learning System & RL Loop (~70%)
- Design and implement the pipeline that connects live trade outcomes back to strategy improvement — signal quality, position sizing, timing, risk parameters
- Build the evaluation framework that separates genuine predictive signal from noise across agents, market conditions, and configurations
- Automate the strategy generation and testing cycle — the system should explore new configurations, validate them against real fleet data, and surface deployment candidates
- Detect regime shifts in market conditions and adapt fleet behavior accordingly
- Decompose every trade into its component drivers — signal quality, execution efficiency, exit timing — and wire those attributions back into strategy design
- Manage fleet-level coordination: concentration risk, capital allocation, and the exploration vs. exploitation balance
- Build the telemetry and data capture layer that makes all of the above possible
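To make the exploration vs. exploitation balance above concrete, here is a minimal sketch of one common approach, Thompson sampling over candidate strategy configurations. It is an illustration only, not a description of the company's system: the class names, the win/loss outcome model, and the simulated win rates are all hypothetical, and a real loop would learn from richer signals such as risk-adjusted returns.

```python
import random
from dataclasses import dataclass


@dataclass
class ConfigStats:
    """Beta posterior over one configuration's win rate (hypothetical model)."""
    wins: int = 0
    losses: int = 0

    def sample(self, rng: random.Random) -> float:
        # Beta(1 + wins, 1 + losses): a uniform prior updated by observed outcomes.
        return rng.betavariate(1 + self.wins, 1 + self.losses)


class ThompsonAllocator:
    """Picks which strategy configuration the next agent should run."""

    def __init__(self, config_ids, seed=0):
        self.rng = random.Random(seed)
        self.stats = {cid: ConfigStats() for cid in config_ids}

    def choose(self) -> str:
        # Draw a plausible win rate for each config and run the highest draw.
        # Uncertain configs sometimes draw high, which is the exploration.
        draws = {cid: s.sample(self.rng) for cid, s in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, config_id: str, profitable: bool) -> None:
        # Feed the observed outcome back into that config's posterior.
        s = self.stats[config_id]
        if profitable:
            s.wins += 1
        else:
            s.losses += 1


def simulate(true_win_rates, rounds=2000, seed=0):
    """Toy loop: simulated outcomes stand in for live trade results."""
    allocator = ThompsonAllocator(true_win_rates.keys(), seed=seed)
    market = random.Random(seed + 1)
    for _ in range(rounds):
        cid = allocator.choose()
        allocator.record(cid, market.random() < true_win_rates[cid])
    # Number of times each configuration was run.
    return {cid: s.wins + s.losses for cid, s in allocator.stats.items()}


if __name__ == "__main__":
    print(simulate({"cfg_a": 0.45, "cfg_b": 0.50, "cfg_c": 0.60}))
```

In this toy run the allocator shifts most trials toward the configuration with the highest simulated win rate while still occasionally sampling the others. Handling regime shifts would additionally require discounting or resetting old evidence, which this sketch does not attempt.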
Model & Inference Infrastructure (~30%)
- Own the build-vs-buy decision on model hosting — evaluate proxied external APIs versus fine-tuned models on owned infrastructure and execute the chosen path
- Determine whether domain-specific training on trading data meaningfully outperforms prompted general-purpose models — then build the pipeline to act on that answer
- Optimize inference for the specific demands of a large autonomous agent fleet: concurrent agents, structured outputs, cost efficiency at scale
- Build the agent telemetry layer capturing every decision, signal score, and evaluation across the fleet
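As a rough illustration of what a telemetry record for the last item might contain, the sketch below defines one decision event and round-trips it through JSON. Every field name here is an assumption made for the example; the posting does not specify a schema or a storage format.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DecisionEvent:
    """One agent decision, captured for later evaluation (hypothetical schema)."""
    agent_id: str
    config_id: str
    action: str                      # e.g. "buy", "sell", "hold"
    signal_score: float              # score the agent assigned to the signal
    evaluation: dict = field(default_factory=dict)   # post-hoc evaluation results
    ts: float = field(default_factory=time.time)     # Unix timestamp of the decision

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable for diffing and hashing.
        return json.dumps(asdict(self), sort_keys=True)


def parse_event(line: str) -> DecisionEvent:
    """Rebuild an event from one JSON line, e.g. when replaying a log."""
    return DecisionEvent(**json.loads(line))


if __name__ == "__main__":
    event = DecisionEvent("agent-1", "cfg_a", "buy", 0.73, {"risk_check": "pass"})
    print(event.to_json())
```

An append-only stream of records like this is one simple way to make each decision replayable for attribution and evaluation.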
What You Need
- Experience shipping a production closed-loop system: model outputs drove real-world actions, outcomes were measured, and that feedback automatically improved the next decision. Not a batch retrain. Not a dashboard with manual follow-through. A live, wired loop.
- Practical RL or online learning experience — you understand the challenges of learning from real-world feedback rather than static datasets
- Full-stack ML ownership — you build the pipeline, deploy the model, and own the outcome; Python primary, comfortable with Go or TypeScript in production services
- High-stakes sequential decision-making domain experience — finance preferred but not required; robotics, autonomous vehicles, game AI, ad bidding, and supply chain all transfer
Nice to Have
- LLM fine-tuning and open-source model serving in production (vLLM, TGI, PEFT/LoRA)
- Multi-agent system design
- Financial ML — signal generation, execution optimization, portfolio construction
- Onchain or DeFi experience
Interview Process
Fast — target first call to offer within two weeks
- Intro call with founders (60 min) — fit, motivation, your closed-loop experience
- Technical deep-dive (60 min) — open-ended system design, no right answer, evaluating how you think
- Paid trial project (1 week, part-time) if needed — real problem, compensated