Job Details
Job Location: Remote, Romania
Position Type: Full Time
Travel Percentage: None
Job Shift: Graveyard
Job Category: Information Technology
This job is remote for people located strictly in Romania.
We are looking for a mid‑level Site Reliability Engineer focused on GCP to help us transition from a traditional IT Support model to a modern SRE operating model. You will design and implement our GCP‑based platform (GKE, Terraform, Prometheus, Grafana, GCP Operations Suite) and act as a hands‑on guide for our existing team as we adopt SRE ways of working, with a strong focus on automation and tooling in Python.
Responsibilities:
Maintain GCP infrastructure using Terraform, including GKE clusters, Compute Engine, Cloud Storage, Cloud SQL or other managed databases, VPC networking, load balancers, and Cloud DNS.
Manage and operate Kubernetes workloads on GKE: deployments, services, ingresses, autoscaling, configuration, secrets and cluster upgrades.
Participate in on‑call rotations for GCP services and lead or assist in incident response.
Design and maintain observability for GKE and GCP workloads using Prometheus for metrics collection and Grafana for dashboards and visualization.
Provide advanced production support for business‑critical applications (web and backend services), investigating incidents, performance issues and functional degradations together with development teams.
Use metrics, logs, traces and error reports to triage and debug application issues across multiple services and components.
Maintain and improve runbooks, playbooks and knowledge base articles so recurring production issues can be resolved quickly and consistently.
Analyze incident and ticket trends to propose reliability improvements, automation and changes to application configuration or architecture.
Define and implement SLIs and SLOs based on Prometheus metrics and GCP Operations Suite (Cloud Monitoring/Logging) and configure alerts (in Prometheus Alertmanager, Grafana, or Cloud Monitoring) that focus on real customer impact.
Qualifications:
2–5 years of experience in SRE, DevOps or platform engineering operating production systems, with strong exposure to GCP.
Solid experience with GKE and containerized applications (deployment strategies, scaling, troubleshooting) in production.
Strong Infrastructure‑as‑Code skills with Terraform for provisioning GCP resources (projects, networks, IAM, GKE, databases, etc.).
Experience with Prometheus and Grafana, including:
- setting up metrics collection (exporters, scraping configs) for applications and infrastructure;
- building and maintaining Grafana dashboards for services, platforms, and SLOs;
- configuring alerts (Alertmanager, Grafana, or Cloud Monitoring) with appropriate thresholds and routing.
Good knowledge of Linux and Docker, including debugging performance, networking and security issues.
Familiarity with GCP Operations Suite (Cloud Monitoring/Logging) and how to combine it with Prometheus/Grafana for a complete observability story.
Understanding of GCP security basics: IAM, service accounts, least‑privilege, network security and Secret Manager.
Experience supporting production applications (web or backend services), including debugging issues across logs, metrics, traces and application‑level errors.
Mentoring and coaching mindset: enjoys guiding colleagues through new tools and practices.
Schedule: 16:00-00:50 Romania time
Cadex Solutions Corporation is a holding company formed by Trivest Partners LP to build the premier provider of commercial order-to-cash management solutions. With a history spanning nearly 100 years, Cadex is uniquely positioned with in-depth experience that builds relationships alongside results. Our team of industry experts brings innovation and data insight, improves your processes with hands-on help, and provides custom solutions based on specific needs. Cadex has approximately 800 employees serving over 1,000 clients across all industries from locations including the United States, Colombia, Brazil, Romania, Italy, India, Singapore, and South Africa.
Since 2019, Cadex has been building a strong portfolio of accounts receivable management (ARM) companies, including:
A.G. Adjustments, formed in 1974 and headquartered in Melville, NY
D&S Global Solutions, formed in 1997 and fully remote
ABC-Amega, formed in 1929 and headquartered in Buffalo, NY
TranSubro, formed in 2012 and headquartered in Oceanside, NY
DAL, formed in 1974 and headquartered in Clifton Heights, PA
RCC, formed in 1970 and headquartered in Maple Grove, MN
IRG, formed in 1997 and headquartered in Marlborough, MA
Since our inception in 1997, D&S has been driving innovation in accounts receivable solutions, constantly shaping and expanding beyond anything previously conceived to meet clients’ needs.
Our one-of-a-kind D&S Off-Site Network team delivers the highest level of expertise in an array of languages with unmatched flexibility, clarity, and courtesy. Our experience spans years, countries, and companies of all sizes.
Our solutions are completely customizable, extend beyond any and all expectations, and stem from our experience that credit risk can come from any, if not all, aspects of a business.
As a result, through our proprietary software, leading-edge technology, and considerable know-how, we work with you to do everything humanly possible to mitigate your credit risk efficiently and effectively, producing an ever-growing set of services we are proud to provide.