Site Reliability Engineer (Remote)

Posted 12 days ago United States Salary undisclosed

Job Description

Site Reliability Engineer (SRE)
Number of Resources: 1
Job Type: Contract
Estimated Duration: 6 month contract to start
Desired Experience: 5-7+ years of experience
Location: 100% remote

About The Role

The individual in this role will build, maintain, and support the IR (Information Retrieval) Platform infrastructure for our client's ecommerce platform, performing the following:
  • Evaluate service mesh solutions and create an adoption plan for Search Platform (this project will be the primary focus for this role).
  • Design, build and support the core infrastructure of our search platform.
  • Work cross-functionally with various platform teams, ML teams and product partners to build the next generation of our high availability search platform in the cloud.
  • Build and maintain observability and test tooling - logging, monitoring, distributed tracing, alerting and offline test tools needed for search.
  • Practice continuous learning and follow an agile delivery model to stay informed and focused on our deliverables.
  • Support GKE services and maintenance, including software upgrades, performance tuning, and GKE cluster tuning and optimization.
  • Build GKE tooling for the IR Platform's test environment and automate deployments.
  • Plan and test search disaster recovery for zonal failures.


  1. Advanced development experience in Python
    • Another programming language (Java, Scala, Go) is also acceptable; however, the role will mostly utilize Python.
    • The engineer should have full command of the language they choose, and experience developing in that language (beyond only scripting).
    • The engineer will ideally have experience working with CI/CD pipelines, including technologies similar to Jenkins/BuildKite/GitHub Actions.

  2. Advanced experience with Kubernetes/Docker.
    • This engineer will lead the team in Kubernetes best practices and technical understanding, and in managing infrastructure at scale
    • This skill should include hands-on cloud provider experience (AWS, GCP, or Azure)

  3. Advanced (Hands-On) Infrastructure Experience at Scale
    • Examples include experience with monitoring, logging, alerting [Grafana/Prometheus], distributed tracing, or security
    • Must include experience with Unix/Linux operating systems and the networking stack (e.g., TCP/IP, routing, network topologies and hardware, SDN)
    • Must include infrastructure at scale (deployed to production with significant traffic)
    • An engineer with multiple consulting experiences in the infrastructure realm is more likely to bring the expertise this team is seeking.

  4. Communication and Collaboration Skills
    • Be able to confidently recommend alternative solutions and explain the reasoning behind them
    • Be a team player who loves to collaborate
    • Be highly self-driven and able to take on work independently
    • Be a strong collaborator and communicator who leads the engineers around you to grow and learn

The states of California and Colorado are ineligible.