Manage full-cycle recruiting for AI software engineering, infrastructure, and product teams. Focus on building a diverse candidate pipeline and improving scalable, equitable interview processes.
Lightning AI
5 Remote Job Openings at Lightning AI
Design and operate large-scale GPU infrastructure platforms to minimize incidents and enable customer features. Collaborate across engineering teams to automate operational workflows and participate in an on-call rotation.
Partner with ML engineers to diagnose and resolve complex distributed systems and infrastructure failures in production environments. Improve platform reliability by identifying recurring patterns and building internal automation and documentation.
This posting is a general talent-community expression of interest, not an opening for a specific role. Candidates can submit their information to be considered for future opportunities across the company's global hubs.
Own and evolve a scalable observability platform for metrics, logs, and traces across GPU-enabled bare-metal infrastructure. Design multi-tenant telemetry pipelines and noise-resistant alerting systems to support both internal operations and external customers.