The role involves writing kernels and low-level optimizations to enhance the performance of vLLM as an inference engine. The engineer will collaborate with hardware vendors to maximize performance across various accelerator types.
Inferact
5 Remote Job Openings at Inferact
The role involves optimizing how models execute across diverse hardware and architectures, directly impacting AI inference. The engineer will work at the core of vLLM to push the boundaries of LLM and diffusion model serving.
The cloud orchestration engineer will build the operational backbone for vLLM, focusing on cluster management, deployment automation, and production monitoring. The role involves ensuring that vLLM deployments are observable, debuggable, and recoverable.
The role involves building distributed systems that power inference at global scale. The engineer will design and implement the foundational layers that enable vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability.
Member of Technical Staff, Exceptional Generalist (Remote)
Inferact · Full Time · 4 months ago
You will work across the entire vLLM stack, optimizing CUDA kernels, designing distributed orchestration systems, and implementing new model architectures. Your work will directly impact how the world runs AI inference.