Optura

Sr. Site Reliability Engineer

Posted a month ago

United States

⭐ 10+ years experience

AI Summary

Architect and operate multi-cloud infrastructure across AWS, GCP, and Azure, focusing on Kubernetes platforms for SaaS and on-prem deployments. Own the end-to-end reliability, observability, and security of the AI platform, including HIPAA-aware services and deployment frameworks.

Optura is healthcare’s AI orchestration platform. We help healthcare organizations transform disconnected AI pilots into a unified, enterprise-scale program that delivers measurable value. Our platform enables teams to design, execute, and monitor intelligent agents that drive automation, insights, and action, while providing the control and observability needed to scale safely. Built for real-world complexity, Optura supports multiple model providers, integrates seamlessly with existing infrastructure, and offers both SaaS and self-hosted options. Our mission: revolutionize how healthcare deploys and operationalizes AI in production.

We’re looking for a Senior Platform Engineer to design, build, and operate the core services that power Optura’s AI Platform. In this role, you will own systems end-to-end, from model and agent orchestration to routing, reliability, and observability. You will partner closely with product and application teams to deliver secure, scalable, HIPAA-aware services, and you will play a critical role in shaping the foundation that enables customers to safely deploy AI in real-world healthcare environments.

Location:

Remote, or based in the San Francisco Bay Area, the Nashville metro area, or the Raleigh, NC area

What you'll do

Architect and own Optura's multi-cloud infrastructure across AWS, GCP, and Azure — provisioning, networking, identity, observability, and cost governance
Design and operate Kubernetes platforms that run consistently across our cloud environments and inside customer environments, including BYOC and on-prem (potentially air-gapped) deployments
Build a unified deployment framework so Optura ships the same product to SaaS, BYOC, and on-prem customers without bespoke per-customer engineering — Helm charts, operators, install/upgrade tooling, and release pipelines
Own SLOs, capacity planning, incident response, and postmortems across the entire infrastructure stack; set the bar for operational readiness
Drive reliability and performance through error budgets, chaos testing, latency optimization, and disciplined runbook quality
Harden the platform for regulated deployments — HIPAA controls, tenant isolation, audit logging, RBAC, KMS, and secrets rotation
Lead the build-out of IaC, GitOps, and progressive delivery (Terraform, Argo CD, Crossplane) as the team's standard
Partner with engineering and security to set opinionated guardrails: golden paths, base images, policy-as-code, and CI/CD that the rest of the org adopts by default

What we're looking for

8+ years operating production infrastructure, including 3+ years in a senior SRE, platform, or staff infrastructure role
Deep Kubernetes expertise across managed (EKS, GKE, AKS) and self-managed/on-prem distributions — not just running it, but operating it at scale across heterogeneous environments
Multi-cloud fluency across AWS, GCP, and Azure, with informed opinions on when to abstract vs. embrace cloud-native primitives
Expert with Terraform (or Pulumi/Crossplane) and GitOps tooling
Experience shipping infrastructure that runs in customer environments — packaging, install/upgrade UX, air-gapped artifacts, support escalation paths
Strong networking, identity, and security fundamentals: VPC design, service mesh, mTLS, OIDC, KMS, secrets management
Production observability ownership (Prometheus, Grafana, OpenTelemetry, distributed tracing) and on-call leadership
A track record of writing real code — Go, Python, or similar — to extend the platform, not just configure it

What we would like to see

Experience shipping HIPAA-regulated workloads, including BYOC or air-gapped customer deployments
Background with enterprise software delivery tooling (Replicated, Cluster API, Talos, Rancher, OpenShift)
Built internal developer platforms (Backstage, golden paths) that measurably reduced lead time for an engineering org
FinOps experience — driving meaningful cloud spend reductions through architecture, not just rightsizing
AI/ML infrastructure exposure: GPU scheduling, model-serving stacks, inference autoscaling
OSS contributions to infrastructure projects, or strong opinions formed running them at scale

Benefits at Optura:

We offer a competitive compensation and benefits package, including:

Health, dental, and vision insurance
Generous paid time off
Opportunities for professional growth and development

Equal Employment Opportunity:

At Optura.AI, we’re not just building a product; we are intentionally building the team, culture, and equity we want to see in the tech world. That starts with recognizing that innovation thrives when diverse perspectives come together. Optura is an Equal Employment Opportunity Employer, period. We actively welcome and celebrate every candidate regardless of their race, color, religion, age, marital status, sex (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity or gender expression, national origin, veteran or military status, disability (physical or mental), genetic information, or any other protected characteristic.

More than compliance, we are deeply committed to diversity and inclusion because it’s a non-negotiable part of our foundation. We believe a truly diverse and inclusive workplace is the engine for long-term professional growth and competitive business success, directly fueling our mission to innovate. As part of the Optura team, your voice will be heard, your contributions will directly matter to our trajectory, and your unique background and experiences won't just be celebrated—they will be a vital part of our success. Let's build something exceptional, together.
