Site Reliability Engineer - CTJ - Secret

 $102K - $219K per year
  
2-5 years experience

AI Summary

Own the reliability and operational health of Microsoft Substrate components in highly regulated environments. Design automation to reduce toil and lead post-incident reviews to implement durable fixes.
Overview

Microsoft Substrate is the foundational cloud platform that powers many of Microsoft’s most critical services, including Exchange Online and M365 Copilot. It provides shared infrastructure, identity, messaging, storage, and service-to-service capabilities used across Microsoft 365 and related cloud offerings. Substrate services operate at global scale and are designed to deliver high availability, reliability, and security for some of the world’s most demanding workloads.

As a Site Reliability Engineer II, you will take ownership of reliability and operational outcomes for specific components or services. You will independently diagnose and resolve production issues, design and implement automation to reduce toil, and contribute to service improvements that enhance availability, scalability, and efficiency. 

This role requires deeper technical judgment, stronger software engineering fundamentals, and close collaboration with partner teams to ensure that reliability, diagnosability, security, and compliance are built into services from design through operation, particularly for services operating in highly regulated environments.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 



Responsibilities
  • Own reliability and operational health for one or more Substrate components or services in highly regulated environments. 

  • Serve as an actively engaged on-call engineer (OCE), participating in an on-call rotation and independently responding to incidents for owned services. 

  • Respond to, diagnose, and resolve production incidents with minimal supervision. 

  • Design and implement automation to reduce operational toil and improve service stability. 

  • Develop and maintain monitoring, alerting, and telemetry to support SLOs and operational metrics. 

  • Lead post-incident reviews for owned incidents, focusing on root cause analysis and durable fixes. 

  • Collaborate with software engineering teams to embed reliability and operability into service design. 

  • Write and maintain production-quality code and automation that improves reliability, scalability, and operational efficiency. 



Qualifications

Required Qualifications: 

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
    • OR equivalent experience.
  • 4+ years technical experience in software engineering, network engineering, or systems administration.

Other Requirements: 

Security Clearance Requirements
Candidates must be able to meet the Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:

This role requires access to Microsoft Government cloud environments, including GCC Moderate (GCCM), GCC High (GCCH), and Department of Defense (DoD) environments. As a result, the successful candidate must be able to obtain and maintain the appropriate background investigations and customer screenings required for access to these environments. 

For access to GCCH and DoD environments, this role requires the ability to obtain and maintain a favorably adjudicated Tier 3 (T3) background investigation.

For access to GCCM environments, this role requires the ability to meet Criminal Justice Information Services (CJIS) eligibility requirements.

For manager-level roles, a Tier 5 (T5) background investigation is preferred, as certain approval authorities and operational responsibilities require this level of screening. 

Candidates may be considered without currently holding these background investigations, provided they are eligible for and able to successfully obtain them. Candidates may begin work while required background investigations are in progress; however, failure to obtain or maintain the appropriate clearance and/or customer screening requirements may result in employment action up to and including termination. 

Microsoft Cloud Background Check: The successful candidate must pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

  • Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
    • OR equivalent experience.
  • 2+ years technical experience working with large-scale cloud or distributed systems.


Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $102,100.00 - $202,200.00 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area, where the base pay range for this role is USD $133,800.00 - $219,200.00 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay


This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.




Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
