The Next Chapter W&S

Solutions Architect - AI / ML - Training & GPU infra

Posted 3 months ago

Netherlands

⭐ 5-10 years experience

AI Summary

Design and validate production-grade distributed training and large-scale inference architectures on massive GPU clusters. Collaborate with customers to debug, optimize, and scale ML workloads while influencing product roadmaps based on real-world performance requirements.

AI/ML Solutions Architect – Distributed Training & GPU Infrastructure

Company

Join a fast-moving AI infrastructure team working on the cutting edge of large-scale ML workloads. This role is ideal for engineers who enjoy solving deep technical challenges in distributed training, multi-GPU systems, and scalable AI inference infrastructure. You will work directly with AI-focused clients, helping them get the most out of modern GPUs (H100, B200, etc.) and ML frameworks such as PyTorch (and JAX in some environments).

Team & Responsibilities

Work alongside senior AI and infrastructure engineers building large-scale GPU platforms. As part of the customer solutions team, you will:

Design and validate production-grade distributed training (primary) and large-scale inference architectures on large GPU clusters, typically tens to thousands of GPUs
Work hands-on with customers to debug, optimize, and scale ML workloads across multi-node GPU environments
Act as a technical authority on GPU performance, networking, and schedulers, making trade-offs at scale and translating customer needs into concrete platform requirements
Collaborate closely with engineering, product, and R&D to influence roadmap decisions based on real-world ML workloads

This is a hands-on, technical role; you are expected to work directly in customer environments, not only advise at a high level.

Required skills and experience

Hands-on experience designing and operating enterprise-scale, production-grade, multi-node GPU workloads for training (7B+ model size) or inference
Strong background in distributed deep learning (PyTorch Distributed, DeepSpeed, ...) on GPU clusters
Deep understanding of GPU architecture and interconnects (H100/A100 class, NVLink, InfiniBand)
Experience with Kubernetes or Slurm
Experience with performance tuning using GPU profiling and monitoring tools

This role is not a fit if your experience is limited to single-node training, high-level AI strategy, or non-production research environments. We are looking for engineers and architects who thrive at the intersection of AI workloads and large-scale infrastructure.

What's offered

Location: Remote from anywhere in Europe

Total compensation up to EUR 250k (base + variable / OTE), depending on level and experience

The Next Chapter W&S

Employees: 2-10
Industry: Staffing and Recruiting
