5C Data Centers USA, Inc.

Senior Manager, AI Cluster Deployment

Posted 6 hours ago

United States

$145K - $175K per year

10+ years of experience

AI Summary

Lead the planning, deployment, and operational readiness of large-scale AI infrastructure and GPU clusters. Oversee end-to-end integration including hardware rack-and-stack, networking fabrics, storage optimization, and provisioning automation.

Industry: Hyperscale and AI Data Centers, Cloud Computing

Location: Remote (US, Pacific Time Zone)

Employment Type: Full-Time

Reporting to: VP, Operations

POSITION SUMMARY

We are seeking an experienced Senior Manager of AI Cluster Deployment to lead the planning, deployment, integration, and operational readiness of large-scale AI infrastructure environments. This role is responsible for delivering production-grade GPU clusters that support AI training, inference, and high-performance computing workloads across cloud, hybrid, and on-premises environments.

The ideal candidate brings deep technical expertise in GPU infrastructure, networking, storage, automation, and datacenter deployment, combined with strong program leadership and cross-functional execution skills. This leader will oversee end-to-end AI cluster deployment initiatives, including hardware integration, rack-and-stack operations, provisioning automation, performance validation, and operational handoff.

The role requires hands-on familiarity with modern AI infrastructure tooling and architectures, including Canonical MaaS, VAST Data storage platforms, and both InfiniBand and Ethernet-based GPU networking fabrics.

KEY RESPONSIBILITIES

AI and GPU Cluster Deployment & Delivery

· Oversee and participate in the deployment and integration of GPU-based compute platforms from NVIDIA and other accelerator vendors.

· Lead and participate in the end-to-end logical deployment of large-scale AI and GPU clusters in state-of-the-art datacenters.

· Manage deployment programs spanning compute, storage, networking, power, cooling, and automation layers.

· Participate in cluster architecture reviews for AI training, inference, and distributed compute workloads.

· Coordinate rack-and-stack and cabling sequencing, network deployment, burn-in testing, and cluster validation.

· Validate deployment readiness, topology consistency, GPU fabric performance, acceptance testing, and operational turnover processes.

· Establish repeatable and documented deployment methodologies and scalable operational standards.

Networking & Fabric Management

· Lead deployment and operational validation of high-performance GPU interconnects using InfiniBand and Ethernet GPU fabric architectures.

· Ensure proper implementation of spine-leaf architectures, RDMA, network telemetry, and performance tuning.

· Coordinate closely with network engineering teams on topology implementation and performance optimization.

Storage & Data Infrastructure

· Coordinate with storage engineering teams on deployment and integration of high-performance storage environments supporting AI workloads.

· Ensure successful implementation and operational optimization of data storage platforms.

· Validate storage throughput, latency, and GPU data delivery performance.

Automation & Provisioning

· Lead infrastructure automation initiatives for cluster provisioning and lifecycle management.

· Manage deployment tooling and orchestration platforms including:

o Infrastructure-as-Code frameworks

o Automated imaging and provisioning systems (e.g. Canonical MaaS)

o Cluster monitoring and observability tools

· Drive standardization and deployment automation to improve speed, reliability, and repeatability.

Leadership & Program Management

· Build and lead high-performing technical deployment and infrastructure engineering teams.

· Partner with datacenter operations, hardware vendors, networking teams, and AI platform engineering groups.

· Establish a strong partnership with the Project Management Office (PMO) while driving consistent, accurate project updates across teams and systems (e.g., Jira).

· Develop operational procedures, documentation, and deployment best practices.

· Mentor engineers and technical leads across infrastructure domains.

QUALIFICATIONS REQUIRED

· Bachelor's degree in Computer Science, Engineering, Information Technology, or related field (or equivalent experience).

· 10+ years of infrastructure engineering or datacenter deployment experience.

· 5+ years leading deployment or operations teams supporting large-scale AI, HPC, or GPU infrastructure.

· Hands-on experience deploying and operating large GPU clusters in enterprise or hyperscale environments.

· Strong expertise with:

o Canonical MaaS

o Data storage platforms

o InfiniBand and Ethernet GPU fabrics

o Network architecture

o Linux systems administration

o GPU server architectures

· Strong understanding of:

o RDMA and RoCE networking

o High-performance storage architectures

o Cluster automation and provisioning

o Datacenter infrastructure operations

· Proven ability to manage complex cross-functional infrastructure deployment programs.

Preferred Qualifications

· Experience deploying NVIDIA DGX SuperPOD or similar AI infrastructure solutions.

· Familiarity with:

o NVIDIA networking technologies

o Spectrum-X or Quantum platforms

o AI model training infrastructure

o Liquid cooling environments

o DCIM and observability platforms

· Experience in hyperscale, cloud, or AI infrastructure environments.

· Certifications in networking, Linux, Kubernetes, or cloud infrastructure are a plus.

Key Competencies

· Technical leadership

· Infrastructure architecture

· Program execution

· Cross-functional collaboration

· Vendor and stakeholder management

· Problem-solving under operational pressure

· Process improvement and automation

· Excellent communication and documentation skills

Example Titles

Depending on organizational structure, this role may also align with:

· Senior Manager, AI Infrastructure Deployment

· Senior Manager, GPU Cluster Operations

· Director, AI Infrastructure Engineering

· Senior Manager, HPC & AI Platforms

· AI Datacenter Deployment Manager

5C Data Centers is an equal opportunity employer. We evaluate all qualified applicants without regard to race, religion, gender, age, national origin, disability, sexual orientation, veteran status, or other protected status.
