Senior Site Reliability Engineer

Posted 4 hours ago

5-10 years experience

AI Summary

Manage AWS infrastructure, Kubernetes platforms, and internal tooling while focusing on automation to reduce manual runbooks. Handle production incidents, observability, and the on-call rotation using tools like Terraform and Go.

Simple Life is the #1 AI-powered health coaching app for adults who want to lose weight and enjoy a healthier lifestyle—without the stress or extremes. Our mission is to empower people to feel their best every day. By challenging traditional, restrictive approaches, Simple offers a more sustainable method grounded in ease, personalization, and real-life support.

Simple has had over 17 million downloads and more than 300,000 5-star reviews, having helped millions lose weight successfully and sustainably. Simple has earned recognition as Best Virtual Coach and one of the Top 100 AI companies — all thanks to a dedicated global team driving real impact.

With SIMPLE as a partner in their pocket, users feel cared for and empowered to embrace — and stick to — new healthy habits. To learn more, visit simple.life.

Simple is looking for a Senior SRE to join our Platform team responsible for the AWS infrastructure, the Kubernetes platform and the internal tooling that the rest of engineering relies on.

Push the pace of innovation and build a future of a healthier world with us!

About the role:

This is an operations-led position. You will be working day to day on AWS, our infrastructure-as-code, our CI/CD setup, observability, and the on-call rotation.

A meaningful part of the work is automation: when we find ourselves doing the same thing twice, we usually invest in tooling rather than writing another runbook. Most of that tooling is written in Go.

To give a sense of the environment: infrastructure is defined with Terraform and Terramate, with Atlantis running plan and apply on pull requests. Workloads run on EKS with Karpenter and Fargate, deployed through ArgoCD. Observability is built on Grafana, Loki, Tempo, and Prometheus-compatible metrics.

We’re looking for:

  • The most important quality for this role is how you handle problems that are not yet understood.

  • Production incidents rarely present cleanly: logs can be incomplete, metrics can mislead, and the first plausible theory is often wrong.

  • The right candidate stays focused under that kind of pressure, works through ambiguity in a structured way, and arrives at a real root cause rather than a convenient one.

  • Strong investigation, debugging, and problem-solving instincts are essential.

  • We also expect candidates to learn quickly. The stack and the company both move, and you will regularly be the first person on the team to take on something new.

  • Several years of hands-on experience operating production systems on AWS and Kubernetes, including genuine on-call ownership.

  • A solid working knowledge of AWS fundamentals, including VPC, IAM, EKS, and RDS.

  • Practical experience with Terraform and a GitOps-style delivery workflow (ArgoCD, Atlantis, Flux, or similar).

  • Comfort writing code, with some prior experience in Go or a willingness to pick it up (writing small services and tools is a regular part of the work).

  • Strong written and spoken English, and the communication skills to drive design discussions across engineering, product and security.

Perks and Benefits:

  • Open-minded teams, a welcoming and inclusive company culture, plus the opportunity to make a real difference with a game-changing health tech product.

  • A competitive salary package based on your unique expertise, skillset, and impact on the product, plus stock options.

  • In-office, remote and hybrid work opportunities.

  • Whatever equipment you need to be happy and productive.

  • A premium SIMPLE subscription.

  • 21 days annual leave, plus bank holidays (those observed where you live).

  • Flexible hours. We focus on your results, not how long you spend at your desk.

About our values:

  • Think deeper: We understand that in order to grow we need to make all our decisions reality-based and change our opinion based on what we learn. We appreciate data coming in various forms – quantitative and qualitative, feedback from users and colleagues, and strong and weak signals. We treat data as the main source for leveraging insights and expect people at every level to have conversations that start with data.

  • Focus on impact: Results and speed matter. When we are competing to become an A-player in the digital health market, we don’t have the luxury of deliberation. We need to make decisions and changes quickly and learn swiftly from our mistakes. We prioritize what will have the greatest impact and aren’t distracted by anything else. We create products that benefit users while we are meeting our metrics.

  • Take ownership: We seek to improve all facets of our company even in ways beyond our job description. We seek and take responsibility for our actions and their impact. We value and set high expectations for our own work so that it can add to the overall quality and innovation results of the team. Each one of us is empowered to make this company a success and to take the lead in resolving disagreements and systemic issues.

  • Push the limits: We encourage our team to explore new ideas, challenge conventional thinking, and continuously improve work. This mindset can lead to breakthroughs in product development, improved operational efficiency, and increased competitiveness in the market. We believe that a culture of constantly striving to exceed existing standards, boundaries, and expectations, through innovation, experimentation, and a willingness to take risks, can bring us success. We don't accept what someone says as truth if we disagree with it, no matter what authority that person has in the company, and we express ourselves directly, not through back channels. We challenge ideas, from policy to product decisions, and always seek to understand the reason behind what we do.

  • Be a Championship Team: As part of a championship team, you must constantly improve your own performance, know your teammates and their talents and skills, and stay focused on a common goal and how to achieve it together. We hold each other accountable for our contribution to the shared success or failure, and we constantly look for ways to help our colleagues to improve and for us to perform better as a team. We collaborate within the team in order to compete with challengers in the outside world. We build relationships of trust. We provide our teammates with the autonomy and support they need to deliver their part of the goal.
