Requisition Number: 75819
The company built on breakthroughs.
Join us.
Corning is one of the world’s leading innovators in glass, ceramic, and materials science. From the depths of the ocean to the farthest reaches of space, our technologies push the boundaries of what’s possible.
How do we do this? With our people. They break through limitations and expectations – not once in a career, but every day. They help move our company, and the world, forward.
At Corning, there are endless possibilities for making an impact. You can help connect the unconnected, drive the future of automobiles, transform at-home entertainment, and ensure the delivery of lifesaving medicines. And so much more.
Come break through with us.
Corning’s businesses are ever-evolving to best serve our customers, industries, and consumers. Today, we accelerate and transform life sciences, mobile consumer electronics, optical communications, display, automotive, and solar markets. We are changing the world with:
- Trusted products that accelerate drug discovery, development, and delivery to save lives
- Damage-resistant cover glass to enhance the devices that keep us connected
- Optical fiber, wireless technologies, and connectivity solutions to carry information and ideas at the speed of light
- Precision glass for advanced displays to deliver richer experiences
- Auto glass and ceramics to drive cleaner, safer, and smarter transportation
- Solar polysilicon, wafers, and innovative photovoltaic modules, enabling low-cost solar energy solutions
Site Reliability Engineer
We are looking for a Site Reliability Engineer with a strong software development background to join our Scrum team maintaining and improving an Enterprise Generative AI platform. This role focuses on platform stability, performance optimization, uptime, and availability. You will own the reliability posture of a platform serving tens of thousands of enterprise users — from infrastructure provisioning through production observability. You'll work alongside application developers, not above or below them, ensuring what they build can run safely at scale.
Responsibilities
- Maintain team CI pipelines and govern code contribution quality rules including branch protection, automated testing gates, and artifact management
- Craft efficient, secure Terraform for supporting infrastructure in AWS
- Diagnose and troubleshoot production incidents — you are the last line of defense before the customer is impacted
- Work with software developers to use cloud-based SaaS and PaaS offerings safely in our product
- Evaluate production readiness of developer features — you have a voice and a vote before code hits production
- Perform security reviews and hardening, covering secrets management, network policies, and IAM boundaries
- Own and evolve the platform's observability stack — monitoring, alerting, dashboards, and on-call response
- Manage and maintain container orchestration infrastructure (ECS/Kubernetes), including deployments, scaling policies, and health checks
- Maintain and document runbooks for common failure modes so the team isn't dependent on tribal knowledge
- Partner with developers during incident response — drive root cause analysis and ensure follow-through on remediation
- Participate actively in Scrum ceremonies and contribute to sprint planning, estimation, and technical grooming
Required Skills & Experience
- 5+ years of SRE, DevOps, or infrastructure engineering experience deploying and managing cloud infrastructure using IaC tools
- Experience with Terraform including authoring custom modules
- Production AWS experience — you've built and operated infrastructure, not just deployed apps to someone else's
- Experience instrumenting enterprise-grade applications for uptime, performance, and alerting (CloudWatch, Datadog, Prometheus/Grafana, or similar)
- Comfort working inside application codebases: you can read Node.js, Python, or Go well enough to diagnose issues and handle straightforward patches when developers are busy, even if you're not the one writing features
- Experience with CI/CD pipeline design and maintenance (GitHub Actions, GitLab CI, Jenkins, or similar)
- Experience leveraging AI coding assistants (Claude Code, Codex, Cursor, etc.) — we use these daily and expect you to as well
- Strong written and verbal communication — you'll be the person explaining to a developer why their feature isn't production-ready yet, and you need to do that with clarity and respect
- Experience operating deployments on ECS or Kubernetes at enterprise scale
- Experience with AWS infrastructure (EC2, ECS, ALB, DocumentDB, CloudWatch)
- Prior experience supporting enterprise platforms with 10,000+ active users
- Familiarity with AI/LLM platform architecture — you don't need to train models, but you should understand what the application is doing so you can support it effectively
- Experience operating PostgreSQL at scale — backups, replication, performance tuning
- Experience with secrets management (Vault, AWS Secrets Manager, Parameter Store)
- Familiarity with container security scanning and vulnerability management
- Experience writing or maintaining incident runbooks and post-mortem processes
- Prior on-call experience with defined SLAs/SLOs
What We Offer
- Autonomy and trust — you'll own the reliability posture, not just execute tickets
- Opportunity to work on a high-impact, enterprise-scale Generative AI platform serving tens of thousands of users
- Direct collaboration with a US-based Product Manager and Scrum team in a fast-paced, high-autonomy environment
- Exposure to cutting-edge AI technologies including multi-model orchestration, agentic AI, MCP, and RAG
- Competitive compensation aligned with market rates for senior engineering talent
- Fully remote position with flexible working hours that overlap with the US Eastern time zone