Inferact

Member of Technical Staff, Inference

Posted 21 days ago

United States

$200K - $400K per year

5-10 years of experience


AI Summary

Optimize the vLLM inference engine to improve the speed and cost of running LLMs and diffusion models. Develop innovations for diverse hardware and architectures, including mixture-of-experts and multimodal models.

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware—a position that took years to build.

About the Role

We're looking for an inference runtime engineer to push the boundaries of what's possible in LLM and diffusion model serving. Models grow larger. Architectures shift: mixture-of-experts, multimodal, agentic. Every breakthrough demands innovation in the inference engine itself. You'll work at the core of vLLM, optimizing how models execute across diverse hardware and architectures. Your work will directly impact how the world runs AI inference.

Skills and Qualifications

Minimum qualifications:

Bachelor's degree or equivalent experience in computer science, engineering, or similar.
Deep understanding of transformer architectures and their variants.
Strong programming skills in Python with experience in PyTorch internals.
Experience with LLM inference systems (vLLM, TensorRT-LLM, SGLang, TGI).
Ability to read and implement model architectures and inference techniques from research papers.
Demonstrated ability to contribute performant, maintainable code and to debug complex ML codebases.

Preferred qualifications:

Deep understanding of KV-cache memory management, prefix caching, and hybrid model serving.
Familiarity with RL frameworks and algorithms for LLMs.
Experience with multimodal inference (audio/image/video/text).
Contributions to open-source ML or system infrastructure projects.

Bonus points if you have:

Implemented core features in vLLM or other inference engine projects.
Contributed to vLLM integrations (verl, OpenRLHF, Unsloth, LlamaFactory, etc.).
Written widely-shared technical blogs or side projects on vLLM or LLM inference.

Logistics

Location: This role is based in San Francisco, California. We will consider remote work within the US for exceptional candidates.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $400,000 USD + equity.
Visa sponsorship: We sponsor visas on a case-by-case basis.
Benefits: Inferact offers generous health, dental, and vision benefits, as well as a 401(k) company match.


Inferact

Employees: 11-50. Industry: Software Development.
