Senior Site Reliability Engineer

Posted 14 days ago | Worldwide | $120,000 - $180,000

Job Description


As a member of the technical operations SRE team within the Accounts & Identity group, you will be responsible for keeping services and applications on the platform available, resilient, and secure while continually enabling our feature teams to deliver exciting features and experiences to our millions of users worldwide. You will drive and lead technical initiatives, and identify and contribute to process and technology improvements that amplify experiences for players, operators, and content creators.


  • Support application delivery and operations of internal and public-facing services within the AWS cloud environment, ensuring availability, resiliency, scalability, and performance.
  • Facilitate delivery and releases of new services and features to customers while ensuring operational readiness.
  • Pursue operational improvements and toil reduction through automation and tooling.
  • Improve observability on our platform by implementing robust monitoring and alerting patterns. Develop rich, informative dashboards and reports on applications and services that provide relevant insights and meaningful alerting to reduce MTTD and MTTR.
  • Collaborate with development, platform, and security teams to inspire, implement, and deliver end-to-end system performance, resiliency, and security across all services and tools within the platform.
  • Evaluate hosting resource usage and spend, and optimize them by applying standard methodologies, patterns, and technologies such as spot instances and auto-scaling.
  • Participate in rotational on-call support to triage and resolve production incidents, and conduct root cause analysis to identify and drive improvements.

Key Qualifications:

  • Build, deploy, operate and support services at scale
  • PASSIONATE(!) desire to automate and improve everything, including improving processes and standardizing tools and technologies
  • Excellent problem-solving skills that span user experience, system, infrastructure, and networking (TCP/IP).
  • Drive operational and infrastructure requirements that promote availability, reliability, performance, and security at global scale
  • Focused on customer and peer relationships, with strong interpersonal and communication skills; able to inspire change across teams and mentor others.

Required Skills:

  • Operating distributed, critical customer-facing services or applications in production at a global scale.
  • In-depth understanding of Unix/Linux systems internals and networking
  • Source code management tools (GitHub preferred)
  • Software development experience in one or more of the following: Python, Go, Node.js, Java.
  • Building and deploying Infrastructure as Code: CloudFormation/Terraform
  • Building continuous integration and continuous delivery (CI/CD) pipelines in Jenkins or similar
  • Operating and running web applications/APIs in AWS cloud infrastructure, including managed services such as Lambda, RDS, DynamoDB, and ElastiCache.
  • AWS systems and network protocols (e.g., ALB, Route 53, API Gateway, TCP/IP, HTTP/HTTPS, DNS)
  • Delivering production content using CDN technologies (Akamai, CloudFront)
  • Container technologies and orchestration (e.g., Docker, Kubernetes, EKS)
  • Application monitoring tools: Datadog, CloudWatch, Splunk, Grafana
  • Data Reporting & Analytics: SQL, MySQL, Oracle.

Preferred Experience:

  • BS degree in Computer Science, Software Engineering, or related technical area
  • 7+ years professional experience
  • 2+ years with AWS Cloud: deploying, tuning, and operating web/API services at scale.