Site Reliability Engineer II

 Posted 2 hours ago
     
 $80,000 - $124,000 per year
  
5-10 years experience

AI Summary

Provide reliability engineering services through observability, performance engineering, and the definition of service level objectives. Collaborate with cross-functional teams to automate scalable infrastructure and ensure high system availability.

Description

As a Site Reliability Engineer II, you will provide reliability engineering services through observability and performance engineering techniques, using monitoring and performance tools to deliver detailed feedback to product owners and development teams. You will partner with Product Owners to define service level objectives and develop service level indicators, and you will collaborate with cross-functional teams to design, build, automate, and maintain scalable infrastructure. Your responsibilities will include ensuring high availability, monitoring system performance, and helping support staff resolve incidents. This role requires a strong background in scripting and cloud platforms, and a passion for optimizing operational efficiency. You will use Site Reliability Engineering practices to deliver a seamless user experience.

Responsibilities:

System Monitoring and Analysis:

• Implement and maintain robust observability solutions to monitor system performance, identify bottlenecks, and ensure optimal operation.

• Utilize tools to gather, analyze, and visualize key performance metrics.

Performance Optimization:

• Proactively identify and address performance bottlenecks through in-depth analysis and optimization strategies.

• Work closely with development teams to implement performance improvements and enhance overall system efficiency.

Capacity Planning:

• Conduct capacity planning exercises based on observed patterns and future growth projections.

• Collaborate with infrastructure and development teams to ensure adequate resources are available to meet system demands.

Automation and Scripting:

• Develop and maintain automation scripts for routine tasks, enabling efficient monitoring and response procedures.

• Implement automated processes for scaling and provisioning resources based on observed workload patterns.

Documentation:

• Document system architecture, configurations, and observability best practices to facilitate knowledge transfer and onboarding for team members.

• Keep documentation up-to-date to reflect changes in the system and its monitoring setup.

Collaboration with Development Teams:

• Work closely with software engineers to integrate observability tools into the development lifecycle.

• Provide guidance on building observable systems and assist in instrumenting applications for effective monitoring.

Continuous Improvement:

• Stay informed about industry best practices and emerging technologies related to observability and performance engineering.

• Drive continuous improvement initiatives to enhance the reliability and performance of systems.

Security and Compliance:

• Collaborate with security teams to implement monitoring and observability measures that align with security requirements and compliance standards.

• Participate in security incident response activities and contribute to ongoing security assessments.

Training and Knowledge Sharing:

• Conduct training sessions for team members and other stakeholders on observability tools, best practices, and performance engineering concepts.

• Foster a culture of knowledge sharing within the organization.

• And other duties as assigned.

For all roles, we prefer candidates who can work hybrid from either our Schaumburg, IL or Secaucus, NJ office, but we are also open to candidates who would be 100% remote if they do not live near either of those locations.

For the Site Reliability Engineer II:

$88,750 - $124,300

NYC; Long Island, NY; Bakersfield, CA; Los Angeles, CA; Sacramento, CA; San Diego, CA; San Juan Cap, CA; West Hills, CA; Alaska; Boston, MA; Fairfield County, CT; Clifton, NJ; Secaucus, NJ; Washington, DC; Worcester, MA; Portland, OR; Denver, CO; Austin, TX; Las Vegas, NV

$80,700 - $113,000

Baltimore, MD; Chantilly, VA; Chicago, IL; Collegeville, PA; Horsham, PA; Norristown, PA; Philadelphia, PA; Portland, ME; Wallingford, CT; Minneapolis, MN; Albany, NY; Atlanta, GA; Dallas, TX; Houston, TX; St. Louis, MO; Florida

$76,650 - $107,350

Buffalo, NY; Cin. & Clev., OH; Indianapolis, IN; Kansas City, MO; Lenexa, KS; Maine (excl. Portland); New Orleans, LA; Pittsburgh, PA; Salt Lake City, UT; Syracuse, NY; Troy, MI; Phoenix, AZ; Albuquerque, NM; Billings, MT; Nashville, TN; Columbia, SC; Charlotte, NC; Lexington, KY; Memphis, TN; Oklahoma City, OK; Raleigh, NC

Additional Skills & Qualifications

Qualifications

Required Work Experience:

• 4+ years of experience with multiple APM tools and extensive experience with Dynatrace

• 3+ years of SRE experience

• Experience in software development, infrastructure, or operations roles

• Certifications in relevant technologies (e.g., AWS, DevOps, Kubernetes, Dynatrace, Azure)

• Working experience building CI/CD pipelines and using version control systems

• Working experience with scripting languages (e.g., Python, Bash, Go)

• Excellent problem-solving and communication skills.

• Ability to work collaboratively in a fast-paced, agile environment.

Preferred Work Experience:

• Working experience with NeoLoad, JMeter, or an equivalent performance testing tool.

• Experience executing software load and performance testing in an enterprise environment.

• Experience testing applications hosted in the cloud.

• Experience with infrastructure as code tools such as Terraform or CloudFormation.

• Deep understanding of Linux systems administration and networking principles.

• Experience with containerization and orchestration technologies such as Docker and Kubernetes.

• Experience or familiarity with IIS, HTML, Java, JBoss.

• Experience in Chaos Engineering

• Programming experience using .NET, C, C++, Java, or other popular programming languages. Perl/Python/JavaScript scripting experience may be considered equivalent.

• Terraform and Ansible experience.

• Exposure to Splunk tools.

• Exposure to microservices.

• Dynatrace Certifications

• AWS/Azure/GCP Certifications

• Chaos Engineering Certifications

• Agile Certifications

Knowledge:

• Site Reliability Engineering Principles

• DevSecOps Principles

• Agile (SAFe)

• Healthcare industry

• ITIL

• ServiceNow

• Jira/Confluence

Skills:

• Dynatrace/Prometheus/Grafana

• NeoLoad/JMeter

• Splunk

• AWS/Azure/GCP

• SAFe Agile

• Strong communication skills (written/verbal)

• Time management

• Analytical problem solver

• Self-starter

• Results-oriented, with proven ability to organize priorities

Experience Level

Expert Level

Job Type & Location

This is a Permanent position based out of Secaucus, NJ.

Pay and Benefits

The pay range for this position is $80,000.00 - $124,000.00/yr.

• 5% bonus, full benefits, 401(k), and unlimited PTO

Workplace Type

This is a hybrid position in Secaucus, NJ.

Application Deadline

This position is anticipated to close on Jun 24, 2026.

About TEKsystems

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

About TEKsystems and TEKsystems Global Services

We’re a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We’re a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We’re strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We’re building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.

San Francisco Fair Chance Ordinance: Pursuant to the San Francisco Fair Chance Ordinance, for all positions located in the city and county of San Francisco, we will consider for employment qualified applicants with arrest and conviction records.

Massachusetts Lie Detector: It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.

Use of Artificial Intelligence (AI): We may use Artificial Intelligence (AI) to support parts of our hiring process, including sourcing, screening, and evaluating candidates. AI helps assess applications and qualifications, but final decisions are made by our hiring team. By applying, you acknowledge and agree that your application may be reviewed using AI tools.
