CADEX

Site Reliability Engineer

Posted an hour ago

Romania

⭐ 2-5 years experience


AI Summary

Design and implement a GCP-based platform using GKE and Terraform while transitioning the team to a modern SRE operating model. Maintain infrastructure, manage Kubernetes workloads, and establish observability and reliability standards through SLIs and SLOs.

Job Details

- Job Location: Remote, Romania
- Position Type: Full Time
- Travel Percentage: None
- Job Shift: Graveyard
- Job Category: Information Technology

This job is remote for people located strictly in Romania.

We are looking for a mid-level Site Reliability Engineer focused on GCP to help us transition from a traditional IT Support model to a modern SRE operating model. You will design and implement our GCP-based platform (GKE, Terraform, Prometheus, Grafana, GCP Operations Suite) and act as a hands-on guide for our existing team as we adopt SRE ways of working, with a strong focus on automation and tooling in Python.

Responsibilities:

- Maintain GCP infrastructure using Terraform, including GKE clusters, Compute Engine, Cloud Storage, Cloud SQL or other managed databases, VPC networking, load balancers, and Cloud DNS.
- Manage and operate Kubernetes workloads on GKE: deployments, services, ingresses, autoscaling, configuration, secrets, and cluster upgrades.
- Participate in on-call rotations for GCP services and lead or assist in incident response.
- Design and maintain observability for GKE and GCP workloads using Prometheus for metrics collection and Grafana for dashboards and visualization.
- Provide advanced production support for business-critical applications (web and backend services), investigating incidents, performance issues, and functional degradations together with development teams.
- Use metrics, logs, traces, and error reports to triage and debug application issues across multiple services and components.
- Maintain and improve runbooks, playbooks, and knowledge base articles so recurring production issues can be resolved quickly and consistently.
- Analyze incident and ticket trends to propose reliability improvements, automation, and changes to application configuration or architecture.
- Define and implement SLIs and SLOs based on Prometheus metrics and GCP Operations Suite (Cloud Monitoring/Logging), and configure alerts (in Prometheus Alertmanager, Grafana, or Cloud Monitoring) that focus on real customer impact.

Qualifications:

- 2–5 years of experience in SRE, DevOps, or platform engineering operating production systems, with strong exposure to GCP.
- Solid experience with GKE and containerized applications (deployment strategies, scaling, troubleshooting) in production.
- Strong Infrastructure-as-Code skills with Terraform for provisioning GCP resources (projects, networks, IAM, GKE, databases, etc.).
- Experience with Prometheus and Grafana, including:
  - setting up metrics collection (exporters, scraping configs) for applications and infrastructure;
  - building and maintaining Grafana dashboards for services, platforms, and SLOs;
  - configuring alerts (Alertmanager/Grafana/Cloud Monitoring) with appropriate thresholds and routing.
- Good knowledge of Linux and Docker, including debugging performance, networking, and security issues.
- Familiarity with GCP Operations Suite (Cloud Monitoring/Logging) and how to combine it with Prometheus/Grafana for a complete observability story.
- Understanding of GCP security basics: IAM, service accounts, least privilege, network security, and Secret Manager.
- Experience supporting production applications (web or backend services), including debugging issues across logs, metrics, traces, and application-level errors.
- Mentoring and coaching mindset: enjoys guiding colleagues through new tools and practices.

Schedule: 16:00-00:50 Romania time

Cadex Solutions Corporation is a holding company formed by Trivest Partners LP to build the premier provider of commercial order-to-cash management solutions. With a history spanning nearly 100 years, Cadex is uniquely positioned with in-depth experience that builds relationships alongside results. Our team of industry experts brings innovation and data insight, improves your processes with hands-on help, and provides custom solutions based on specific needs. Cadex has approximately 800 employees serving over 1,000 clients across all industries from locations including the United States, Colombia, Brazil, Romania, Italy, India, Singapore, and South Africa.

Since 2019, Cadex has been putting together a strong portfolio of ARM companies, including:

- A.G. Adjustments, formed in 1974 and headquartered in Melville, NY
- D&S Global Solutions, formed in 1997 and fully remote
- ABC-Amega, formed in 1929 and headquartered in Buffalo, NY
- TranSubro, formed in 2012 and headquartered in Oceanside, NY
- DAL, formed in 1974 and headquartered in Clifton Heights, PA
- RCC, formed in 1970 and headquartered in Maple Grove, MN
- IRG, formed in 1997 and headquartered in Marlborough, MA

Since our inception in 1997, D&S has been driving innovation in accounts receivable solutions, constantly shaping and expanding beyond anything previously conceived to meet clients’ needs. Our one-of-a-kind D&S Off-Site Network team delivers the highest level of expertise in an array of languages with unmatched flexibility, clarity, and courtesy. Our experience spans years, countries, and companies of all scopes. Our solutions are completely customizable, extend beyond any and all expectations, and stem from experience telling us that credit risk comes from any, if not all, aspects of business. As a result, through our proprietary software, leading-edge technology, and considerable know-how, we work with you to do everything humanly possible to mitigate your credit risk efficiently and effectively, producing an ever-growing set of services we are proud to provide.



