Principal Operations Engineer, Hardware — Data Center Operations

Posted an hour ago

$150K - $250K per year

10+ years experience

AI Summary

Serve as the senior technical authority for the operational hardware fleet across hyperscale AI data centers to ensure reliability and continuous improvement. Lead site assessments, operational audits, and root cause investigations while coordinating between hardware engineering and deployment teams.

About Fluidstack

We exist to make humanity more free. For most of human history, you farmed or you starved. Technology gave people more time for the things they wanted to do, instead of things they had to do. Powerful AI will be the biggest lever for human choice we've ever built - but only if models are aligned with what humanity actually wants. There are groups building AI who don't share these goals. Whoever deploys frontier compute infrastructure fastest will decide whether AI expands human freedom or shrinks it.

We're singularly focused on delivering 10 to 100s of GWs of compute faster than anyone else, rethinking every layer of the stack. We acquire power, design and build data centers, and operate them - with teams spanning hardware and software. Speed and scale are our key differentiators. Come be a part of building civilization-scale infrastructure for AI.


We hire people who care deeply about this problem space. If that is you, please apply!

About the Role

We are seeking a Principal Operations Engineer, Hardware to serve as the most senior technical authority for the operational hardware fleet across our hyperscale AI data center portfolio. AI infrastructure lives and dies on the reliability of the compute itself — this role exists to ensure that the GPU systems, servers, and supporting hardware we deploy at scale are operated, maintained, and continuously improved at the standard the workload demands.

You will operate as the technical arm of senior operations leadership in the field — leading site assessments and operational audits, driving the technical readiness of teams ahead of site activation, reviewing hardware platforms and integration designs from an operational lens, and feeding operational learnings back into the hardware engineering, deployment, and supply chain organizations as we shift toward a productized, repeatable build model. You will be a force multiplier across our site hardware leads, deployment teams, and reliability engineers, and the connective tissue between hardware operations, hardware engineering, network, facilities, and customer-facing teams.

The ideal candidate has spent a career operating hardware at scale — in hyperscale data centers, large HPC environments, or comparable 24/7 infrastructure — and is equally comfortable diagnosing a stubborn boot failure on the floor, leading a fleet-wide root cause investigation, and pushing back on a vendor on a flawed RMA process. Formal engineering credentials are valued but not required — practical depth, judgment under pressure, the ability to teach, and the discipline to keep critical infrastructure running through change are what define this role.

Qualifications

  • 10+ years of hands-on experience operating mission-critical hardware infrastructure, with at least 5 years as the senior technical voice on a site, campus, or fleet.

  • Data center operations experience strongly preferred; hyperscale, large HPC, cloud, or other mission-critical compute infrastructure experience considered.

  • Deep working command of GPU systems, server platforms, storage infrastructure, firmware lifecycle management, and hardware diagnostics — earned in the field, not from a textbook.

  • Demonstrated ability to author, approve, and execute high-risk MOPs and change records in live production environments.

  • A track record of leading root cause analysis on significant hardware events and driving corrective actions to closure.

  • A track record of holding OEMs, ODMs, service vendors, and deployment partners accountable — you know how to enforce a standard without burning the relationship.

  • Strong written communication: operational health assessments, RCAs, procedure reviews, and design review feedback are second nature.

  • Comfort operating as the senior technical voice across operations, hardware engineering, network, facilities, supply chain, and customer-facing teams.

  • Willingness to travel extensively across the fleet (50-75% of the time).

Preferred Qualifications

  • Bachelor's degree in Computer Engineering, Electrical Engineering, Computer Science, or related field.

  • Hyperscale or large-scale compute operational experience supporting thousands of servers and accelerator systems.

  • Direct experience operating modern GPU platforms at production scale.

  • Strong working knowledge of Linux administration, hardware management tooling, and production troubleshooting workflows.

  • Experience supporting liquid-cooled compute infrastructure and the operational practices required to maintain it.

  • Experience operating across multiple sites or as part of a global fleet operations function.

  • Experience standing up new sites from deployment handover through steady-state.

  • Experience contributing operational requirements into hardware platform decisions, reference architectures, or productized data center builds.

  • Scripting and automation experience in support of fleet-scale hardware operations.

Salary & Benefits

  • Competitive total compensation package (salary + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

    The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.

We are committed to pay equity and transparency.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

You will receive a confirmation email once your application has been successfully received. If there is an error with your submission and you do not receive a confirmation email, please email careers@fluidstack.io with your resume/CV, the role you applied for, and the date you submitted your application, and someone from our recruiting team will be in touch.
