ML Infrastructure Engineer, Forward-Deployed

Posted a month ago
     
5-10 years experience

AI Summary

Collaborate with strategic GPU customers to optimize training and inference workloads on the Verda platform. Contribute to the development of internal ML platform features on Kubernetes, including job scheduling and workflow orchestration.

Imagine a future where anyone can train and run large-scale AI workloads instantly, without worrying about infrastructure bottlenecks.

At Verda, we’re building a fully featured European cloud computing platform designed for high-performance AI workloads. Our mission is to make powerful compute accessible, scalable, and efficient for the teams building the future of AI.

We’re ambitious, curious, and pragmatic builders. We operate with low hierarchy, high ownership, and a strong bias for action. We’ve already achieved a lot, but we’re just getting started.

Now it’s your chance to join the ride. Join Verda while it’s still being built, not once it’s finished!

Your responsibilities

In this role, you will work closely with strategic GPU customers, embedding directly with their teams to help get training and inference workloads running efficiently on Verda. You will collaborate with ML engineers and researchers to troubleshoot, optimize, and guide them in getting the most out of our infrastructure.

At the same time, you will contribute to building and improving our internal ML platform on Kubernetes, including job scheduling, workflow orchestration, and training infrastructure. You will also help evolve our inference stack, working on model packaging, serving frameworks, and performance optimization.

A key part of your role will be translating customer needs into scalable platform features, helping prioritize what we build to serve the broadest set of users. You will work closely with infrastructure and engineering teams to continuously improve performance, reliability, and developer experience across our platform.

Your key competencies

  • Strong ML engineering background with hands-on experience training, fine-tuning, or optimizing models at scale

  • Proficiency with PyTorch (JAX is a plus)

  • Experience with software or infrastructure engineering, including CI/CD or GitOps workflows

  • Strong programming skills in Python (additional languages such as Rust are a plus)

  • Comfortable working in Linux environments, including debugging GPU performance issues (CUDA, drivers, networking, filesystems)

  • Experience working directly with customers or stakeholders, with the ability to guide, collaborate, and challenge when needed

  • Ability and willingness to travel to customer sites when needed

Nice to have

  • Experience with Kubernetes (operators, CRDs, job scheduling, GPU scheduling)

  • Familiarity with systems such as Kueue, Flyte, Ray, or Slurm

  • Experience deploying inference workloads using vLLM, SGLang, TensorRT-LLM, or Triton

  • Knowledge of GPU networking and performance tuning (e.g., InfiniBand, NVLink, NCCL)

  • Research background (PhD or equivalent)

  • Experience in forward-deployed, solutions engineering, or consulting roles

Why Verda

  • Cash + equity compensation along with various fringe benefits

  • Profitable operations with rapid, sustained growth

  • 31 nationalities, 6 of them represented on the management team

  • An opportunity to work at the intersection of infrastructure and cutting-edge AI workloads, collaborating directly with leading ML teams

Practicalities

Location: Helsinki (hybrid) or remote in Europe

Employment type: Full-time and permanent

What's next

We’re building fast and this role needs the right person behind it. There’s no artificial deadline, but when we find who we’re looking for, we move.

If this sounds like your next move, apply now.

Please submit your application through our Careers page. We don’t accept applications sent by email.
