This is a remote position.
Production Support & Service Manager
We are seeking a Production Support & Service Manager to lead ongoing support, incident management, and service delivery for AI-driven applications, data platforms, and client-facing solutions. This role will ensure the stability, performance, and reliability of systems running on AWS and Azure, including reporting in Tableau and Power BI and integrations with DealCloud CRM.
This individual will play a critical role in establishing and scaling production support operations for a growing AI-focused technology team, ensuring seamless service delivery and rapid issue resolution.
Key Responsibilities
- Lead and manage production support operations for applications, data platforms, and AI solutions
- Oversee incident management, including triage, root cause analysis, and resolution
- Establish and manage SLAs, SLIs, and KPIs for system performance and support responsiveness
- Monitor system health across cloud platforms (AWS, Azure) and data/reporting tools
- Manage support for data pipelines, ETL/ELT processes, and reporting platforms (Tableau, Power BI)
- Oversee support and issue resolution for CRM integrations (DealCloud)
- Implement and maintain monitoring, alerting, and observability frameworks
- Coordinate with engineering, data, and QA teams to ensure smooth handoffs from development to production
- Lead problem management initiatives to identify trends and prevent recurring issues
- Drive continuous improvement of support processes, tools, and documentation
- Manage and mentor a team of support engineers or analysts
- Ensure compliance with security, governance, and operational standards
Requirements
Required Qualifications
- 7+ years of experience in Production Support, Application Support, or IT Service Management
- Experience supporting systems in cloud environments (AWS and/or Azure)
- Strong understanding of incident, problem, and change management processes
- Experience supporting data platforms, ETL pipelines, and reporting tools
- Familiarity with monitoring and observability tools (e.g., Datadog, Splunk, New Relic, or similar)
- Strong experience with SQL and data troubleshooting
- Experience managing or leading support teams or service operations
- Strong communication skills with the ability to interact with both technical and business stakeholders
Key Traits for Success
- Strong leadership skills and the ability to build and scale support operations
- Calm and decisive under pressure with strong incident management skills
- Proactive mindset focused on preventing issues rather than reacting to them
- Strong analytical and troubleshooting abilities
- Ability to balance technical depth with service delivery excellence