Rime Labs

Machine Learning Engineer, Inference

Posted a month ago

United States

5-10 years of experience

AI Summary

Own the production inference stack for voice AI, focusing on model compilation, kernel optimization, and latency reduction. Build reliable, high-throughput speech systems across cloud, on-prem, and heterogeneous environments.

Machine Learning Engineer, Inference

Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand.

We started from a different premise than the rest of the field: voice AI isn’t bottlenecked by model architecture. It’s bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That’s our moat. It’s also why enterprises pick Rime when pilots need to convert into production.

We’re backed by top-tier investors including Unusual Ventures, and we’ve built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it.

Role Overview

We’re hiring a Machine Learning Engineer to own inference for Rime’s models in production. Voice is unforgiving because every millisecond shows up in the conversation. You’ll build the systems that turn our models into the lowest-latency, highest-throughput, most reliable speech systems in the industry.

What You’ll Own

In-house, real-time, speech-first inference stack: model compilation, kernel optimization, batching strategy, streaming output, and the full path from checkpoint to first audio byte.
Latency systems: TTFB targets across regions, KV cache management, speculative decoding, and scheduler design.
Deployment flexibility: cloud, on-prem, and BYOC (SageMaker, Connect), plus the packaging and runtime story across heterogeneous environments.
Inference for full- and half-duplex models, including streaming codec encoding and decoding.

What We’re Looking For

Strong software engineering fundamentals: Rust, Python, C++/CUDA welcome, distributed systems, comfort across the stack.
Hands-on experience serving ML models at scale in production, ideally for low-latency or streaming workloads.
Deep familiarity with inference engines (such as vLLM and SGLang) and related SDKs and tooling (such as TensorRT, ONNX, CUDA Graphs, and Triton).
Working knowledge of speech synthesis and/or speech recognition techniques.
Familiarity with multiple speech representations (neural codecs, semantic tokens, mel/STFT) and how they shape inference cost.
Experience optimizing transformer or autoregressive model inference: KV caching, quantization, paged attention, speculative decoding.
Willing to roll up your sleeves on unglamorous performance work (flame graphs, Nsight traces, kernel tuning), paired with the agency to build the abstractions so the team doesn’t stay stuck doing it by hand.
Bias toward shipping.

Nice to Have

CUDA kernel authoring or Triton experience.
GPU profiling and microarchitecture intuition (H100, A100, L40S, Blackwell).
Experience with parallel model training infrastructure.
Multi-tenant inference scheduling and fairness.
Comfort working close to research teams and influencing model architecture choices for inference-friendliness.

Why Join Rime

Build the inference stack behind a category-defining voice AI company.
Direct collaboration with founders, including a CEO with a Stanford computational linguistics PhD who takes latency as seriously as you do.
The systems you build determine what experiences our customers can deploy.
Meaningful equity upside.
High ownership, high standards, low bureaucracy.

What We Offer

Competitive base + meaningful early-stage equity
Remote-friendly
Visa sponsorship available
Access to a proprietary, full-duplex, studio-quality conversational speech corpus
Compute and tooling to do the work
Direct influence on the future of voice AI

At Rime, we...

Are outliers
Cut through the hype to focus on the craft
Move fast with agency and freedom
Maintain a growth mindset, finding joy in the struggle
Do the right things, knowing that it'll lead to making money

If that sounds like you too, you'll be a great fit for Rime!
