AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for a Production Support Engineer to monitor and support production systems across a multi-account AWS environment, serving as the front line of a tiered support model for a fintech platform. You will triage incidents, execute runbooks, manage SLA performance, and coordinate with engineering, help desk, and security partners. The role includes an on-call rotation and structured post-incident reviews, with a focus on continuous operational improvement.
WHAT YOU WILL DO
- Monitor production systems and respond to alerts across infrastructure, application, and data layers;
- Perform first-level triage on incidents and support requests; escalate to developers with thorough context and diagnostics;
- Execute patching, operational tasks, and documented runbooks;
- Participate in the on-call rotation and support scheduled deployments as needed;
- Conduct post-incident reviews and feed lessons back into runbooks and playbooks;
- Identify recurring issues and systemic risks before they escalate;
- Improve documentation and monitoring coverage between active support activities;
- Contribute to operational reporting and SLA dashboards;
- Manage and track SLA performance across all supported services; surface risks proactively;
- Coordinate with the Help Desk / Deskside Support partner on production tasks affecting employees;
- Escalate security incidents and vulnerabilities to the vCISO partner per documented procedures.
MUST HAVES
- 3+ years in production support, SRE, NOC, or operations engineering;
- Hands-on AWS experience with EC2/ECS, networking (VPC, security groups, ACLs), and IAM;
- Operational proficiency with PostgreSQL and/or Amazon RDS;
- Experience with incident triage across infrastructure and application layers;
- Track record of managing SLAs in a ticketed support environment (e.g., Jira);
- Strong written communication for escalation and post-incident reporting;
- Upper-intermediate English level.
NICE TO HAVES
- Experience with structured incident response frameworks such as ITIL or NIST;
- Familiarity with Datadog, CloudWatch, or comparable observability platforms;
- Exposure to AWS data services including Glue, S3, Athena, and EventBridge;
- Basic IaC familiarity with CloudFormation, SAM, or Terraform;
- Background in financial services or regulated environments;
- AWS certification such as SysOps Administrator or Solutions Architect;
- Experience with scripting/automation to reduce manual toil.
PERKS AND BENEFITS
- Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: Modern solutions with Fortune 500 and top product companies.
- Flextime: Flexible schedule with remote and office options.
MEET OUR RECRUITMENT PROCESS
Application → Coding Challenge → Video Interview → Technical Interview or Hiring Manager Interview
Each step helps us understand your skills and overall fit.
If it’s a match, you’ll receive an offer.