Senior Platform Engineer

Posted 2 hours ago
5-10 years experience
Platform and software · shared across customers

Reports to: Director, Platform Engineering (or Chief Architect)

Location: Remote (US) or Pleasanton, CA (hybrid)

Department: Cloud Platform Engineering / GPU Platform Engineering

Position summary

The Senior Platform Engineer builds and operates the multi-tenant orchestration, scheduling, and customer-facing platform layer that turns raw GPU infrastructure into a usable cloud service. This role owns the software backbone of GPU One, the GPU-as-a-Service (GPUaaS) offering.

Key responsibilities

  • Design and build the orchestration layer (Kubernetes, Slurm, Run:ai, or comparable)

  • Manage multi-tenant isolation including namespaces, networking, storage, and quotas

  • Build customer-facing platform APIs, CLIs, web portals, and SDKs

  • Implement and operate image management, GPU Operator deployment, and node provisioning automation

  • Drive infrastructure-as-code and automation across the platform stack

  • Partner with SRE on platform reliability, SLO definition, and observability

  • Assist technical account managers (TAMs) and Support engineers with customer-impacting platform issues

  • Maintain customer environment templates, configuration management, and rollout tooling

  • Participate in architecture reviews, design discussions, and technical roadmap planning

  • Drive continuous platform improvement and reduce operational toil

Required qualifications

  • 6+ years in platform engineering, SRE, or cloud engineering at scale

  • Deep Kubernetes expertise including CRDs, operators, and multi-tenant patterns

  • Strong programming skills in Go, Python, or both

  • Experience operating GPU clusters or AI infrastructure at production scale

  • Bachelor's degree in computer science or equivalent experience

Preferred qualifications

  • Experience with NVIDIA GPU Operator, MIG, MPS, and NCCL operator patterns

  • Familiarity with the Slurm operator, Run:ai, KubeRay, or comparable AI orchestration tools

  • Service mesh experience (Istio, Linkerd) and multi-cluster networking

  • Open source contributions in the cloud-native or AI infrastructure ecosystem
