Lead the team building the optimization platform for LLM inference on Modular Cloud to achieve state-of-the-art performance. Partner with GTM and Engineering teams to translate customer workloads into a technical roadmap for full-stack optimizations.
Modular
9 Remote Job Openings at Modular
Build and scale an optimization platform to drive state-of-the-art LLM inference performance across GPU and ASIC architectures. Collaborate with GTM and engineering teams to tune inference for specific customer use cases and publish industry best practices.
Implement and extend core driver abstractions across diverse hardware backends to ensure efficient execution of MAX and Mojo. Develop multi-accelerator communication primitives and improve diagnostics for asynchronous execution stacks.
Design and develop compiler optimizations to improve inference efficiency across CPUs, GPUs, and ML accelerators. Collaborate with cross-functional teams to implement core technologies for end-to-end performance on heterogeneous hardware platforms.
The role involves defining and scaling the hardware ecosystem strategy by cultivating relationships with major hardware vendors and architects. You will drive partnership models, negotiate commercial agreements, and ensure alignment across engineering, product, and legal teams.
Design and develop Modular's Quality Strategy and implement an end-to-end, full-stack quality system. Collaborate with teams to develop testing strategies and monitor the effectiveness of quality-related processes.
The role involves building and shipping an LLM-focused inference platform utilizing advanced techniques like disaggregated inference and multi-node deployment of large models. Responsibilities also include pushing operational excellence through observability, multi-cloud deployments, and clever autoscaling.
The role involves leading the design and optimization of high-performance kernels for large-scale AI inference on GPUs and custom accelerators, owning performance-critical paths and driving architectural decisions. Responsibilities include designing, implementing, and optimizing kernels for various AI workloads, leading optimization efforts across hardware environments, and collaborating with compiler and runtime teams.
The role involves owning the product vision and strategy for Modular's Cloud Platform, focusing on delivering a high-performance, transparent AI serving system for advanced GenAI developers. This includes defining roadmaps, driving data-informed prioritization, and collaborating across engineering and GTM teams to ensure customer value realization.