This role involves leading cloud infrastructure, designing scalable AWS architectures, and implementing Site Reliability Engineering (SRE) best practices for operational excellence. Key duties include managing security, optimizing costs, leading incident response, and mentoring the TechOps team.
Sr TechOps and SRE Lead (AWS Cloud) - Remote
Department: Technology / Engineering
Role Overview
We are seeking a highly experienced Sr TechOps and SRE Lead with deep AWS cloud expertise to lead our cloud infrastructure, DevOps practices, Site Reliability Engineering best practices, and overall operational excellence initiatives. This role is both strategic and hands-on: you will design scalable architectures, improve automation, ensure system reliability, and lead the TechOps team.
Key Responsibilities
Cloud Architecture & Infrastructure
Architect and manage secure, scalable, and highly available infrastructure on AWS.
Design multi-account AWS environments using AWS Organizations.
Implement VPC architecture, IAM policies, networking, and security best practices.
Oversee EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, and related AWS services.
Manage AWS costs and optimize resource utilization.
Reliability & Production Operations
Implement Site Reliability Engineering (SRE) best practices.
Define SLIs, SLOs, and error budgets.
Manage monitoring and alerting (CloudWatch, Datadog, Prometheus, Grafana).
Lead incident response, root cause analysis (RCA), and postmortems.
Ensure 24/7 uptime and operational resilience.
Security & Compliance
Implement IAM best practices and least-privilege access controls.
Manage secrets and encryption keys (AWS KMS, Secrets Manager).
Oversee vulnerability management and patching.
Support compliance initiatives (SOC 2, ISO 27001, GDPR as applicable).
Lead disaster recovery planning and backup strategies.
Leadership & Strategy
Lead and mentor a team of DevOps/TechOps engineers.
Establish operational KPIs and performance benchmarks.
Manage on-call rotations and escalation processes.
Collaborate with Engineering, Product, Security, and Data teams.
Contribute to long-term infrastructure strategy and cloud roadmap.
Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
10+ years in DevOps, Cloud Engineering, or Infrastructure roles.
5+ years leading technical teams.
Strong hands-on experience with AWS services (EC2, EKS, RDS, S3, IAM, VPC, Lambda).
Deep knowledge of networking, Linux systems, and distributed systems.
Experience with Infrastructure-as-Code (Terraform or CloudFormation).
Strong scripting skills (Python, Bash, or similar).
Experience with containerization (Docker) and Kubernetes (EKS preferred).
Key Competencies
Strong architectural thinking
Hands-on technical leadership
Crisis and incident management
Strategic planning and execution
Excellent cross-functional communication
Success Metrics
99.9%+ production uptime
Reduced deployment lead time
Reduced incident frequency and MTTR
Improved cost efficiency
High-performing and scalable TechOps function