Corning

Software Engineer - Platform & Reliability

Posted 22 days ago

India

5-10 years of experience

AI Summary

Maintain and improve an Enterprise Generative AI platform focusing on stability, performance, and availability. Manage infrastructure provisioning, observability stacks, and CI/CD pipelines while partnering with developers for incident response.

Requisition Number: 75819

The company built on breakthroughs.  
Join us. 

Corning is one of the world’s leading innovators in glass, ceramic, and materials science. From the depths of the ocean to the farthest reaches of space, our technologies push the boundaries of what’s possible.   

How do we do this? With our people. They break through limitations and expectations – not once in a career, but every day. They help move our company, and the world, forward.  

At Corning, there are endless possibilities for making an impact. You can help connect the unconnected, drive the future of automobiles, transform at-home entertainment, and ensure the delivery of lifesaving medicines. And so much more.  

Come break through with us.

Corning’s businesses are ever-evolving to best serve our customers, industries, and consumers. Today, we accelerate and transform life sciences, mobile consumer electronics, optical communications, display, automotive, and solar markets. We are changing the world with:

Trusted products that accelerate drug discovery, development, and delivery to save lives
Damage-resistant cover glass to enhance the devices that keep us connected
Optical fiber, wireless technologies, and connectivity solutions to carry information and ideas at the speed of light
Precision glass for advanced displays to deliver richer experiences
Auto glass and ceramics to drive cleaner, safer, and smarter transportation
Solar polysilicon, wafers, and innovative photovoltaic modules, enabling low-cost solar energy solutions

Site Reliability Engineer

About the Role

We are looking for a Site Reliability Engineer with a strong software development background to join our Scrum team maintaining and improving an Enterprise Generative AI platform. This role focuses on platform stability, performance optimization, uptime, and availability. You will own the reliability posture of a platform serving tens of thousands of enterprise users — from infrastructure provisioning through production observability. You'll work alongside application developers, not above or below them, ensuring what they build can run safely at scale.

What You’ll Do

Maintain team CI pipelines and govern code contribution quality rules including branch protection, automated testing gates, and artifact management
Craft efficient, secure Terraform for supporting infrastructure in AWS
Diagnose and troubleshoot production incidents — you are the last line of defense before the customer is impacted
Work with software developers on leveraging cloud-based SaaS or PaaS offerings safely in our product
Evaluate production readiness of developer features — you have a voice and a vote before code hits production
Provide comprehensive security review and hardening including secrets management, network policies, and IAM boundaries
Own and evolve the platform's observability stack — monitoring, alerting, dashboards, and on-call response
Manage and maintain container orchestration infrastructure (ECS/Kubernetes), including deployments, scaling policies, and health checks
Maintain and document runbooks for common failure modes so the team isn't dependent on tribal knowledge
Partner with developers during incident response — drive root cause analysis and ensure follow-through on remediation
Participate actively in Scrum ceremonies and contribute to sprint planning, estimation, and technical grooming

Required Skills & Experience

5+ years of SRE, DevOps, or infrastructure engineering experience deploying and managing cloud infrastructure using IaC tools
Experience with Terraform including authoring custom modules
Production AWS experience — you've built and operated infrastructure, not just deployed apps to someone else's
Experience instrumenting enterprise-grade applications for uptime, performance, and alerting (CloudWatch, Datadog, Prometheus/Grafana, or similar)
Comfort working inside application codebases: you can read Node.js/Python/Go well enough to diagnose issues and handle straightforward patches when developers are busy, even if you're not the one writing features
Experience with CI/CD pipeline design and maintenance (GitHub Actions, GitLab CI, Jenkins, or similar)
Experience leveraging AI coding assistants (Claude Code, Codex, Cursor, etc.) — we use these daily and expect you to as well
Strong written and verbal communication — you'll be the person explaining to a developer why their feature isn't production-ready yet, and you need to do that with clarity and respect

Preferred Qualifications

Experience operating deployments on ECS or Kubernetes at enterprise scale
Experience with AWS infrastructure (EC2, ECS, ALB, DocumentDB, CloudWatch)
Prior experience supporting enterprise platforms with 10,000+ active users
Familiarity with AI/LLM platform architecture — you don't need to train models, but you should understand what the application is doing so you can support it effectively
Experience operating PostgreSQL at scale — backups, replication, performance tuning
Experience with secrets management (Vault, AWS Secrets Manager, Parameter Store)
Familiarity with container security scanning and vulnerability management
Experience writing or maintaining incident runbooks and post-mortem processes
Prior on-call experience with defined SLAs/SLOs

What We Offer

Autonomy and trust — you'll own the reliability posture, not just execute tickets
Opportunity to work on a high-impact, enterprise-scale Generative AI platform serving tens of thousands of users
Direct collaboration with a US-based Product Manager and Scrum team in a fast-paced, high-autonomy environment
Exposure to cutting-edge AI technologies including multi-model orchestration, agentic AI, MCP, and RAG
Competitive compensation aligned with market rates for senior engineering talent
Fully remote position with flexible working hours to accommodate US Eastern Time zone overlap
