High Performance Computing Engineer - Linux kernel

 Posted 2 months ago
     
⭐ 5-10 years experience

AI Summary

You will optimize large-scale GPU clusters and high-performance computing infrastructure to maximize computational throughput for AI workloads. The role balances operational troubleshooting with architectural design projects in support of global HPC systems.
The organization

Our client is a rapidly growing organization at the forefront of the AI revolution, specializing in high-performance computing infrastructure for running large language models and AI products. They operate a global network of data centers with capacity designed specifically for extreme-scale computational workloads.

The role

We are seeking an experienced HPC Engineer to join a dedicated high-performance computing optimization team. This team sits at the intersection of R&D, hardware engineering, and distributed systems, focusing on maximizing computational throughput and efficiency rather than traditional system administration. You'll work with cutting-edge HPC technology to optimize parallel computing environments, GPU clusters, and interconnect systems, meeting the demanding requirements of AI and machine learning workloads.

You will focus on optimizing the performance of large-scale GPU clusters, targeting latency reduction, computational efficiency, and enhanced parallel processing capabilities. Working with InfiniBand networks and high-performance computing infrastructure, you'll collaborate with cross-functional teams to deliver scalable HPC solutions for client needs.

The role requires balancing operational optimization and troubleshooting (50%) with HPC architecture design and performance tuning projects (50%). You'll maintain and optimize distributed computing systems, managing over 100,000 GPUs across 10+ InfiniBand networks, while ensuring the optimal performance of global HPC infrastructure and driving continuous computational improvements.

What we're looking for
  • 5+ years of experience in HPC environments and parallel computing systems.

  • Strong proficiency in Linux kernel optimization for HPC workloads.

  • Proficiency with kernel-space profiling and tuning tools (for example perf, ftrace, eBPF).

  • Strong proficiency in C++ or C development for high-performance applications.

  • Experience with Golang and/or Python for HPC tooling and automation.

  • Experience with InfiniBand networking and high-speed interconnects.

  • Experience with distributed computing architectures and cluster management.

What's offered
  • Salary: up to 160k + 25% bonus (200k OTE).

  • Flexible working arrangements.

  • A dynamic and collaborative work environment that values initiative and innovation.

  • Location: Amsterdam or fully remote from anywhere within the EU/EEA.
