Site Reliability Engineer, IaaS

 Published 13 days ago

Algolia is set to enable every company to create world-class Search and Discovery experiences with an API-first approach. Performance and scalability are at the heart of our mission: we power 1.5 trillion searches a year for 10K+ customers all over the world.

If you're a problem solver, able to think outside the box and eager to nurture others and learn from them, then this is your challenge!

The Team

The Infrastructure as a Service (IaaS) team aims to uphold the reliability and scalability we expect from Algolia’s infrastructure for its critical systems and products. Our focus is on enabling teams across Algolia to leverage this infrastructure while keeping it under control through an ever-increasing level of automation.

The Opportunity

The Site Reliability Engineer position within the Infrastructure as a Service team provides a dynamic opportunity for a professional with foundational experience in maintaining and optimizing scalable infrastructures. This role concentrates on three key areas: server and container hosting, cloud and network expertise, and flawless observability.

As a member of the Infrastructure as a Service team, you will play a key role in supporting the reliability and scalability of Algolia’s Search products and core internal services. Your responsibilities will include operating components or features, ensuring proper monitoring and alerting are in place, and assisting in the transition from legacy systems. You will work on planning and accountability for the next quarter, demonstrating independence in problem-solving and minimal reliance on managers and senior team members.

Your role will consist of: 

  • Kubernetes and Cloud Services Management: Help maintain and optimize a fleet of Kubernetes-based architectures and cloud services, enhancing fault tolerance and resource utilization.

  • System Management and Configuration: Continuously improve and refine the infrastructure code and automation that manage our fleet of several thousand servers, keeping it safe, efficient and reliable.

  • Maintain and Extend our Control Plane: Go beyond our current control plane and turn it into a platform that everyone at Algolia can leverage to build performant, reliable and scalable products.

  • Observability Implementation: Support the development and deployment of observability solutions, providing your team and others with actionable insights to track and enhance system reliability.

  • Collaboration and Problem Solving: Work collaboratively with team members to identify and solve problems, reducing dependence on senior staff for guidance.

  • Process Improvement: Contribute to establishing engineering processes and best practices to ensure high-quality, reliable, and scalable systems.

You might be a fit if you have:

  • Programming Skills: Basic to intermediate knowledge of programming languages such as Python, Ruby or Golang, with an understanding of software craftsmanship.

  • Experience with Linux and Kubernetes: Experience in setting up and managing fleets of Linux servers and Kubernetes-based architectures.

  • Knowledge of Distributed Systems: Exposure to operating distributed systems and understanding their challenges at a basic level.

  • Public Cloud Experience: Familiarity with public cloud providers such as Microsoft Azure, AWS, or GCP.

  • Problem-Solving Skills: Ability to independently identify and solve problems, demonstrating initiative and minimal reliance on senior team members.

  • Communication and Organization Skills: Strong communication and organizational skills to effectively collaborate with team members and stakeholders.

  • 3 years or more of related work experience.

We’re looking for someone who can live our values:

  • GRIT - Problem-solving and perseverance capability in an ever-changing and growing environment.

  • TRUST - Willingness to trust our co-workers and to take ownership.

  • CANDOR - Ability to receive and give constructive feedback.

  • CARE - Genuine care about other team members, our clients and the decisions we make in the company.

  • HUMILITY - Aptitude for learning from others, putting ego aside.

