Target is an iconic brand, a Fortune 50 company and one of America's leading retailers.
Target as a tech company? Absolutely. We're the behind-the-scenes powerhouse that fuels Target's passion and commitment to cutting-edge innovation. We anchor every aspect of being America's most loved retailer with cutting-edge technology and the smartest engineers in retail technology. Site Reliability Engineering (SRE) at Target is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
In today's world, guests' technology expectations are very high. When they log in to Target.com, they expect it to be available, performant, and quite simply to work. When they choose order pickup, they expect their order to be ready for pickup in store the same day, within a couple of hours. To meet these expectations, Target needs confidence that our technologies are meeting guests' needs.
As a Sr. Engineering Manager, you will lead a site reliability engineering team in an agile environment, building solutions, cultivating a learning culture, and developing a diverse and inclusive team of engineers to solve problems that often have not been solved anywhere. You are someone who has demonstrated experience building highly scalable platforms and fault-tolerant systems across a range of technologies, including Linux, Apache, MongoDB, Python, Oracle RDBMS, Redis, Postgres and Hadoop. We use a combination of Google Cloud Platform and our own server farms operating out of Target Data Centers, so experience managing application stacks in a hybrid cloud is preferred. You are a thought leader and mentor for internal and external technical talent and actively contribute to the external technical community. The key to success in this position is a strong and innovative approach to problem solving, great technical leadership, excellent communication (written and verbal, formal and informal), flexibility, and a self-motivated working style with attention to detail.
Job Summary:
As a senior member of the SRE team, you will contribute to technical architecture, product prioritization, scalable automation, capacity planning, adoption of supporting technologies, and all other aspects of maintaining a world-class cloud-based service. You will work closely with software engineering teams developing infrastructure and applications, and focus on driving scalability, stability, reliability, operability of services, and security for Omnichannel retail experiences.
Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the availability of Target's e-commerce/Retail and Enterprise platforms. Our goal is to build, scale and guard the systems that delight our guests. To do so, you will:
- Design, write and build tools to improve the reliability, latency, availability and scalability of Target's e-commerce/Retail and Enterprise products
- Instrument systems for reliability, performance and efficiency of Target.com experiences
- Define service level objectives, and drive their adoption and enforcement, at both service and experience levels
- Influence, design and create new architectures, standards and methods for large-scale enterprise systems
- Root-cause complex problems involving multiple parties, networks, hardware and software that relate to scaling and performance
- Champion high availability for critical systems, and systematically root out single points of failure
- Manage Target's 24x7, always-available infrastructure, strive to eliminate downtime and improve the manageability of services
- Set standards for deployments at scale, infrastructure reliability and scalability. Iterate, revisit and optimize service availability, scalability, and performance
- Influence engineering teams across Target through customer focus, world-class quality, effective communication, decisive and fast-moving solutions, and quick, constructive resolution of conflicts
- Manage service availability and scalability through process, tools, and automation. Perform post-mortems and optimize incident response processes
- Lead incident response for production incidents; drive investigation, analysis and troubleshooting to resolve production incidents and systematically drive down detection and mitigation times
- Build and lead a team of high caliber site reliability engineers
- Manage and execute against project/agile plans and set deadlines
- Manage cross-product technical dependencies and drive resolutions to conflicts
- Advocate for technologies, frameworks, design patterns, and processes aimed at achieving three nines or higher availability
Requirements:
- 8+ years of engineering (software development) experience. Experience with at least one full cycle implementation from requirement to production. Experience in building/implementing high performance & scalable server-side applications
- 1+ years of managing software development teams with a strong track record of project delivery for large, cross-functional projects
- Experience operating medium to large scale systems
- Experience with test-driven development and software test automation
- Strong sense of ownership
- Strong written and verbal communication skills with the ability to present complex technical information in a clear and concise manner to a variety of audiences
- BS or MS in Computer Science or equivalent experience
- Experience building and scaling distributed systems leveraging web scale technologies like Linux, Apache, MongoDB, Python, Oracle RDBMS, Redis, Postgres and Hadoop.
- Experience with Linux/Unix internals and system services like DNS, DHCP, TFTP, iptables, and SMTP, as well as networking protocols such as TCP, UDP and HTTP.
- Experience with monitoring systems, tracing and observability to manage large scale systems and 24x7 availability.
- Experience building and maintaining application stacks in a hybrid cloud environment; expertise with Google Cloud Platform (GCP) is a plus.
- Programming experience in one or more of the following languages: Go, Java, Python, Ruby, or Shell, along with CI/CD tools such as Travis, Drone, or Jenkins.
Americans with Disabilities Act (ADA)
Target will provide reasonable accommodations (such as a qualified sign language interpreter or other personal assistance) with the application process upon your request as required to comply with applicable laws. If you have a disability and require assistance in this application process, please visit your nearest Target store or Distribution Center or reach out to Guest Services at 1- for additional information.
Provided by Dice