Role Specific Information
Job Description
About the Role
As a Senior Reliability Engineer, you will ensure the resilience and availability of Kohl’s systems and applications. You will collaborate closely with development teams, contribute to architectural designs, conduct risk assessments, design for failure, and implement robust monitoring and failover mechanisms.
What You’ll Do
Drive error budget and Service Level Objective (SLO) adoption across products
Drive incident response efforts, perform root cause analysis and implement preventative measures to enhance system reliability
Establish consistent practices that elevate Kohl’s operational excellence through automation and process improvements
Follow the software lifecycle and drive reliability, observability, and efficiency across product teams within an assigned domain
Identify repeated toil and find opportunities for automation and risk reduction
Participate in an on-call rotation to respond to production incidents, and conduct blameless retros and root-cause analyses (RCAs) to drive a culture of continuous improvement
Proactively identify potential failures before they cause outages, using chaos engineering techniques along with analysis of edge cases, failure modes, and design reviews
Advise on capacity planning and provide continuous assessments on systems behavior and consumption
Work with product managers to identify and prioritize work on reliability best practices (e.g., leveraging SLIs, SLOs, and error budgets)
Mentor and assist engineers on the team
Additional tasks may be assigned
What Skills You Have
Required
Bachelor's Degree or equivalent in MIS, Computer Science or related field
4+ years of experience in software development
Strong programming skills in one or more languages (Java, Python, Go or Node.js)
In-depth knowledge of systems architecture, operating system internals and network fundamentals
In-depth knowledge of application design patterns, event-driven architecture, database schemas, and testing strategies
Experience with multi-region application troubleshooting and performance tuning
Working experience with at least one cloud platform (GCP, AWS, or Azure)
Working experience with monitoring techniques and tools (e.g., CloudWatch, Grafana, Prometheus, OpenTelemetry, distributed tracing)
Preferred
In-depth knowledge of containerization and container orchestration (e.g., Docker, Kubernetes, Rancher)
Experience with one or more configuration management systems (e.g., Chef, Ansible, Puppet)
Passion for and experience with AI and ML methodologies (MLOps)
Experience writing infrastructure as code (e.g., Terraform, OpenTofu)
Essential Functions
The requirements listed below are representative of functions you will be required to perform; however, you may be required to perform additional functions. Kohl’s may revise this job description at any time. To perform this job successfully, you must be able to perform each essential function satisfactorily. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions, absent undue hardship.
Ability to perform the accountabilities listed in the “What You’ll Do” Section
Ability to comply with dress code requirements
Basic math and reading skills, legible handwriting, and basic computer operation
Ability to maintain prompt and regular attendance and meet scheduling requirements as set by the company
Ability to learn and comply with all company policies, procedures, standards and guidelines
Ability to give direction and to receive, understand and proactively respond to direction from leadership and other company personnel
Ability to work as part of a team and interact effectively and appropriately with others
Ability to maintain composure and work in a fast-paced environment while accomplishing multiple tasks within established timeframes
Ability to satisfactorily complete company training programs
Ability to use a personal computer for tasks such as communicating, preparing reports, etc.
Ability to plan, prioritize and monitor activities across business units
Ability to complete or oversee the completion of assigned projects in a timely manner