Own and build the control plane and APIs for an accelerator cloud, managing the lifecycle from rack bring-up to node retirement. Drive reliability and observability while operating the scheduling layer for training and inference workloads.
Material Group
3 Remote Job Openings at Material Group
Design and architect network fabrics and rack layouts to power large-scale AI training and inference. Drive performance benchmarking, telemetry-based observability, and incident response across thousands of compute nodes.
Technical Program Manager - Data Center Operations
Material Group · Full Time · 7 hours ago
Lead end-to-end program management for AI Factories, overseeing the transition from initial setup to steady-state operations. Coordinate across architectural and technology disciplines to manage schedules, governance, and vendor relationships.