Manager, Reliability Engineering (Remote)

 Posted 3 hours ago
     
5-10 years experience

AI Summary

Lead and mentor a team of reliability engineers to drive operational excellence and embed SRE best practices across the software development lifecycle. Oversee critical incident triage, automation initiatives, and the implementation of robust monitoring and auto-healing practices.

Role Specific Information

Job Description

About the Role

In this role you will lead and mentor a team of reliability engineers to drive operational excellence across Kohl’s distributed systems. You will develop and implement strategies, collaborate closely with engineering teams and ensure SRE best practices are embedded throughout the software development lifecycle. 


What You’ll Do

  • Conduct design reviews, implement robust monitoring and alerting, and establish auto-healing practices

  • Provide leadership and guidance during critical incidents to triage, troubleshoot and resolve complex issues

  • Drive comprehensive root cause analysis and follow through on preventative measures

  • Manage the software lifecycle, driving reliability, observability and efficiency in collaboration with peers across Design, Product Management, and Engineering

  • Lead major automation and toil reduction initiatives, simplifying the ecosystem and reducing risks

  • Set the vision and drive cultural transformation within the team

  • Lead technical initiatives within the team

  • Coach the team with empathy and hands-on mentoring

  • Develop and deliver training programs to upskill the team and broaden SRE adoption across the organization

  • Hire, mentor, cultivate and lead a high-performing SRE team aligned with business priorities

  • Additional tasks may be assigned


What Skills You Have

Required

  • Bachelor's Degree or equivalent in MIS, Computer Science or related field

  • 6+ years of experience in software development and 2+ years of progressive leadership experience, mentoring diverse teams

  • Successful transition from technical leadership to people leadership

  • Advanced in-depth knowledge of application design patterns, event-driven architecture, database schemas and testing strategies

  • Demonstrated knowledge of systems architecture, operating system internals and networking

  • Proven experience with multi-region application troubleshooting and performance tuning

  • Demonstrated experience working with at least one cloud platform (GCP, AWS, or Azure) and with hybrid cloud environments

  • Advanced in-depth knowledge and experience with continuous integration, continuous deployment and test-driven development

  • Strong programming skills in one or more languages (Java, Python, Go or Node.js)

  • Strong leadership skills

Preferred

  • In-depth experience with containerization and container orchestration (e.g., Docker, Kubernetes, Rancher)

  • Demonstrated experience with one or more configuration management systems (e.g., Chef, Ansible, Puppet)

  • Demonstrated experience with monitoring techniques and tools (e.g., CloudWatch, Grafana, Prometheus, OpenTelemetry, tracing)

  • Strong understanding of systems architecture, UNIX internals, networking topologies, multi-cluster applications, multi-tenant platforms and systems/network security

  • Passion for and experience with AI and ML methodologies (MLOps), including leveraging solutions such as LLMs for automation

Essential Functions

The requirements listed below are representative of functions you will be required to perform; however, you may be required to perform additional functions. Kohl’s may revise this job description at any time. To perform this job successfully, you must be able to perform each essential function satisfactorily. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions, absent undue hardship.

  • Ability to perform the accountabilities listed in the “What You’ll Do” section

  • Ability to comply with dress code requirements

  • Basic math and reading skills, legible handwriting, and basic computer operation

  • Ability to maintain prompt and regular attendance and meet scheduling requirements as set by the company

  • Ability to learn and comply with all company policies, procedures, standards and guidelines

  • Ability to give direction and to receive, understand and proactively respond to direction from leadership and other company personnel

  • Ability to work as part of a team and interact effectively and appropriately with others

  • Ability to maintain composure and work in a fast-paced environment while accomplishing multiple tasks within established timeframes

  • Ability to satisfactorily complete company training programs

  • Ability to use a personal computer for tasks such as communicating, preparing reports, etc.

  • Ability to plan, prioritize and monitor activities across business units

  • Ability to complete or oversee the completion of assigned projects in a timely manner
