Mrsool is seeking a qualified Site Reliability Engineer to join our team. The successful candidate will be responsible for ensuring the stability and reliability of our platform, as well as supporting our development teams in creating and deploying new features. Ideal candidates will have a strong understanding of cloud infrastructure, automation, and monitoring tools, as well as experience working in a fast-paced, high-availability environment.
- Collaborate with development teams to design and implement automated deployment and testing pipelines.
- Develop and maintain monitoring and alerting systems to proactively identify and address issues.
- Troubleshoot production incidents, escalating when necessary, to minimize downtime and improve system reliability.
- Continuously improve our infrastructure and processes to optimize scalability and efficiency.
- Participate in on-call rotations as needed to ensure 24/7 support for our platform.
- Perform routine maintenance and upgrades as needed to keep our systems up to date.
- Contribute to ongoing efforts to improve our security posture and compliance with industry standards.
- Bachelor's degree in Computer Engineering, Computer Science, or a related field.
- 3+ years of experience in a similar role, preferably in a high-traffic, high-availability environment.
- Proficiency in at least one programming language (Python, Ruby, Java, Go, etc.).
- Strong understanding of cloud infrastructure and related technologies (AWS, GCP, Azure, Kubernetes, Docker, etc.).
- Excellent troubleshooting and problem-solving skills.
- Experience with automation and configuration management tools (Chef, Ansible, Puppet, Terraform, etc.).
- Familiarity with monitoring and alerting tools (Prometheus, Grafana, Nagios, etc.).