Senior MLOps Engineer (Vietnam)

Posted 2 hours ago

5-10 years experience

AI Summary

Design and manage GPU-accelerated CI/CD pipelines and release engineering for open-source AI infrastructure. Focus on performance gating, security provenance, and optimizing the contributor experience for high-performance AI frameworks.

JOIN US – BUILD THE FUTURE OF AI WITH TENSORMESH.AI FROM VIETNAM!

Tensormesh.ai, a high-profile US AI startup spun off from the open-source project LMCache and now reshaping how the world understands and deploys high-performance AI, is officially expanding and building its Core Engineering team in Vietnam! We believe Vietnam deserves to be the core R&D hub for Southeast Asia, and you can be an important part of that journey.

We are looking for: MLOps Engineer — LMCache (Open-Source Infrastructure)

1. What You'll Own

- Pipeline architecture: GitHub Actions workflows + self-hosted GPU runner fleet (H100/A100); multi-stage pipeline from lint → unit → GPU integration → cross-framework compatibility (vLLM/SGLang) → performance regression

- Release engineering: semantic versioning, PyPI publishing, multi-arch container images, Helm charts, Sigstore/cosign signing, coordination with downstream integrators

- Performance gates: continuous benchmarking that blocks regressions in cache hit rate, TTFT, throughput, and memory usage before merge

- Contributor experience: fast PR feedback, eliminate flakiness, dev containers that don't require expensive GPUs

- Security & IaC: SBOM/SLSA provenance, secret rotation, runner fleet via Terraform with cost-optimized autoscaling

2. Required

- 4+ years MLOps/DevOps/SRE; 2+ years CI/CD for GPU or ML workloads

- Deep GitHub Actions expertise (workflows, composite actions, self-hosted runners at scale)

- Python packaging & PyPI release flow (incl. wheels with native extensions)

- Docker multi-stage/multi-arch; NVIDIA Container Toolkit

- Terraform/Ansible for cloud GPU infrastructure

- Track record building CI that contributors trust — fast, non-flaky, clear failures

3. Strongly Preferred

- Maintainer/contributor experience on a popular OSS project

- Familiarity with vLLM, SGLang, NVIDIA Dynamo, KServe, or Triton

- Kubernetes in CI (Kind/k3s, multi-node integration tests)

- Continuous benchmarking tools + time-series perf tracking

- Supply chain security (Sigstore, SLSA, syft/grype)

- RDMA / high-perf networking / P2P system testing

Why choose Tensormesh.ai?

* Work directly with the engineering teams in the US and Vietnam; what you build will be used by the world's leading AI companies.

* Globally competitive compensation.

* An engineering-first culture with no barriers and no bureaucracy: just code, impact, and learning.

* Flexible remote/hybrid work.

Apply now! Send your CV + GitHub/LinkedIn to: Hiring@tensormesh.ai or tuan@tensormesh.ai

Or tag someone you think "deserves to be a core engineer at a global AI startup!"
