Design and manage BMS, DCIM, and telemetry pipelines to monitor power, cooling, and environmental systems for high-density GPU data centers. Develop control strategies for liquid cooling systems and manage technical integrations with colocation providers.
Lambda
3 Remote Job Openings at Lambda
Serve as the central Incident Commander, leading the end-to-end lifecycle of critical incidents affecting AI infrastructure and GPU clusters. Coordinate cross-functional engineering teams to ensure rapid resolution and conduct post-incident reviews to improve systemic reliability.
Serve as the primary technical liaison and strategic advisor for key accounts to drive multi-year technology roadmaps and AI/ML transformation. Lead architectural reviews and orchestrate cross-functional teams to deliver business outcomes and technical success for high-value customers.