Please mention DailyRemote when applying
Our client is a rapidly growing organization at the forefront of the AI revolution, specializing in high-performance computing infrastructure for running large language models (LLMs) and AI products. They operate a global network of data centers with capacity designed specifically for extreme-scale computational workloads.
We are seeking an experienced HPC Engineer to join a dedicated high-performance computing optimization team. This team sits at the intersection of R&D, hardware engineering, and distributed systems, focusing on maximizing computational throughput and efficiency rather than traditional system administration. You'll work with cutting-edge HPC technology to optimize parallel computing environments, GPU clusters, and interconnect systems, meeting the demanding requirements of AI and machine learning workloads.
You will focus on optimizing the performance of large-scale GPU clusters, targeting latency reduction, computational efficiency, and enhanced parallel processing capabilities. Working with InfiniBand networks and high-performance computing infrastructure, you'll collaborate with cross-functional teams to deliver scalable HPC solutions for client needs.
The role requires balancing operational optimization and troubleshooting (50%) with HPC architecture design and performance tuning projects (50%). You'll maintain and optimize distributed computing systems, managing over 100,000 GPUs across 10+ InfiniBand networks, while ensuring the optimal performance of global HPC infrastructure and driving continuous computational improvements.
5+ years of experience in HPC environments and parallel computing systems.
Strong proficiency in Linux Kernel optimization for HPC workloads.
Proficiency with kernel-space profiling and tuning tools (for example perf, ftrace, eBPF).
Strong proficiency in C++ or C development for high-performance applications.
Experience with Golang and/or Python for HPC tooling and automation.
Experience with InfiniBand networking and high-speed interconnects.
Experience with distributed computing architectures and cluster management.
Salary: up to 160k + 25% bonus (200k OTE).
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.
Location: Amsterdam or fully remote from anywhere within the EU/EEA.