The following states, territories, and districts are excluded from this job ad: AK, CA, CO, CT, DC, HI, LA, MA, MN, MO, NE, NV, NH, NJ, NM, NY, ND, OR, PR, RI, VT, WA, WY
Future Need - Actively Interviewing
Location: Remote in any United States jurisdiction not excluded from this job advertisement.
Protect the operational continuity of a platform that Veterans depend on every day. As the Monitoring & Incident Management Manager, you will lead 24/7 monitoring operations, incident response governance, and observability strategy for a mission-critical cloud environment supporting the Department of Veterans Affairs (VA).
Position Description: The Monitoring & Incident Management Manager leads all platform and application monitoring, incident detection, response coordination, and operational situational awareness, and ensures that production issues are detected proactively, escalated, and resolved.
Minimum/General Experience: 5 years of experience in cloud platform operations, monitoring engineering, and incident management and response operations
Minimum Education: Bachelor's degree in information technology, computer science, or a related field
Essential Skills/Qualifications:
- Excellent experience managing enterprise monitoring and incident response operations for complex, mission-critical systems
- Excellent experience with modern monitoring, observability, and incident management practices
- Excellent experience with enterprise monitoring platforms (e.g., Dynatrace, Splunk)
- Excellent experience in dashboard design, alert configuration, and observability best practices
- Excellent knowledge of the four Golden Signals (latency, traffic/volume, error rate, and saturation) and of measuring incident-free availability across complex distributed systems
- Excellent ability to manage incident response operations, including participation in Priority Troubleshooting Calls (PTCs), coordination of Office of Information & Technology (OI&T) Major Incident Management (MIM) events, and executive communication during critical events
- Excellent experience establishing and maintaining actionable alert thresholds, on-call rotation schedules, and escalation procedures for 24/7 coverage
- Above average knowledge of AWS GovCloud monitoring capabilities, CloudWatch, and integration with third-party observability tools in a FedRAMP environment
- Above average ability to produce incident reports including executive summaries, root cause analysis, timeline of events, corrective actions, and lessons learned
- Working knowledge of ServiceNow, Jira-based service request workflows, and Federal incident reporting requirements
- Experience supporting Federal Government programs and enterprise-scale applications operating in cloud or hybrid environments
- Excellent verbal and written communication skills
General Physical Requirements needed to perform the essential functions of this job may vary based on the location of the assignment.
- Assignment Location - Remote
- Sedentary Work - Exerting up to 10 pounds of force occasionally and/or a negligible amount of force frequently or constantly to lift, carry, push, pull or otherwise move objects.
- Typing, communicating, repetitive motions.
- Close visual acuity to prepare and analyze data, view computer monitors and read. May need to view presentation screens and other visual aids in a virtual setting.
- Inside environmental conditions with protection from outside elements.
Security: Active Federal Civilian Public Trust clearance
- U.S. citizen, or permanent resident who has lived in the United States for at least 3 years
The Federal Civilian Public Trust investigation includes, but is not limited to:
- Coverage of a 10-year period and, in some instances, lifetime events
- OPM Security Investigations Index (SII)
- DOD Defense Central Investigations Index (DCII)
- National Agency Check (NAC) records
- FBI name check
- FBI fingerprint check
- Credit report check
- Written inquiries to previous employers and references listed on the application for employment
- Potential interviews with the subject, spouse, neighbors, supervisor, coworkers
- Law enforcement check
- Court records check
- Education check (attendance and degrees)
Tasks/Activities include, but are not limited to:
- Maintains regular communication with the Contracting Officer's Representative (COR) and Government technical leadership regarding operational health, incident status, and service restoration activities
- Governs all platform and application monitoring, ensuring that automated alerts detect production issues before users report them through tickets
- Maintains 24/7 active alert monitoring coverage
- Delivers the Capabilities and Services Monitoring Plan defining alert conditions, thresholds, escalation paths, and on-call coverage for all capabilities
- Oversees delivery and maintenance of the Capabilities and Services Dashboard displaying real-time latency, error rate, saturation, volume, and incident-free availability for all services
- Ensures immediate response to critical service requests
- Coordinates and leads all PTC and OI&T Major Incident Management (MIM) events regardless of culpability
- Delivers bi-weekly and ad hoc Incident Report Briefings
- Presents all incident reports and responds to Government questions with qualified subject matter experts (SMEs)
- Maintains a complete, auditable alert log including alerted system, alert description, timestamps, corrective actions, and responsible system
- Coordinates with Site Reliability Engineers (SREs), DevSecOps, and Architecture teams to align monitoring requirements across all tenant environments
Compensation & Benefits: The annual projected pay range for this position is $70,612 - $102,584 with consideration being given to various factors including but not limited to qualifications, experience, job responsibilities, and geographic location.
Oxley Enterprises, Inc. offers a full array of benefits including:
- Medical, dental, vision and prescription drug coverage for you and your family.
- Life insurance, short-term disability, and long-term disability coverage paid for by the Company.
- Supplemental coverages including Accident, Critical Illness, and Hospital.
- Additional life insurance coverage for you and your dependents.
- 401k plan with various options to select based on your retirement goals.
Oxley Enterprises®, Inc. is a certified service-disabled veteran-owned small business (SDVOSB), veteran-owned small business (VOSB), and woman-owned small business (WOSB) with 26 years of experience building and delivering quality IT systems and programs. Oxley has been ranked on the Inc. 5000 list seven times (2016, 2017, 2018, 2021, 2023, 2024, 2025). Oxley is a Department of Labor HIRE Vets Medallion Award winner for 2019 through 2025. Oxley is Virginia Values Veterans certified.
All qualified applicants will receive consideration for employment without regard to any status protected by applicable federal, state, or local law.
If you require a reasonable accommodation to apply for a position at Oxley Enterprises, Inc., please send an email to our Human Resources Department at: careers@oxleyenterprises.com with the following information:
- Subject line: Accommodation Request
- A description of your accommodation request
- Your contact information: full name, email address, and best number to reach you (optional)
We participate in the E-Verify program. http://www.dhs.gov/E-Verify