T3 Operations & Support Specialist — Compute & OS (PID9066)

Posted 3 days ago

5-10 years experience

AI Summary

Provide Tier-3 operational ownership for Compute and OS services, focusing on complex incident handling, root cause analysis, and permanent fixes. Ensure system readiness through monitoring, patching, and the automation of operational procedures to improve stability.

This is a remote position.

  • Contract / Freelance
  • Full-time
  • Remote, with readiness to travel required (Germany)
  • Start: ASAP

About the role

We are working with a long-standing anchor client to source a T3 Operations & Support Specialist (Compute & OS) for a large-scale cloud-native platform programme supporting a major energy transmission operator in Germany. The platform is a service-oriented hybrid cloud environment providing application teams with self-service capabilities to develop, run and operate software products across private and public cloud infrastructure.

In this role you will provide Tier-3 operational ownership for Compute & Operating System services within Local Production (DE), handling complex incidents, deep troubleshooting and root cause analysis, and driving permanent fixes and preventive measures.

What you'll be doing

  • Providing T3 operational ownership for Compute & OS services: handling complex incidents, troubleshooting and RCA, and driving permanent fixes and preventive measures
  • Ensuring compute/OS readiness for releases and changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks
  • Executing and improving standard operational procedures through automation to reduce toil and improve MTTR and stability
  • Coordinating with Kubernetes, Data, Network and Storage SMEs to resolve cross-domain production issues
  • Validating deployment artefacts from an operations perspective and enforcing quality assurance measures
  • Monitoring system health, performance metrics and service availability across multi-tenant environments
  • Identifying, analysing and resolving incidents to minimise service disruption, and triggering RCA and corrective actions
  • Implementing monitoring and logging strategies to support audit and compliance requirements
  • Performing routine security scans and remediating identified vulnerabilities


Requirements

What you'll need

  • 5 to 10+ years in IT operations, service delivery or platform operations with demonstrated leadership in mission-critical environments
  • Proven experience implementing and leading Incident, Problem, Change and Release governance in production
  • Hands-on experience with VMware 8 virtualisation
  • Operating Systems: Red Hat Enterprise Linux and Ubuntu
  • OS tooling: Satellite, IPA, Certificate Server
  • ITSM/collaboration tooling: Jira Service Management, Jira, Confluence
  • Fundamental understanding of core operations processes (Incident, Change, Problem management, ITSM) and SRE concepts
  • Experience gathering operational insights from monitoring/observability including SLI/SLA/SLO management and tracking
  • Hands-on experience documenting procedures and enforcing clear runbooks and playbooks
  • Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki)
  • Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to govern specialists
  • Fluent English and German (C1 minimum in both)

Desirable

  • Experience operating in regulated or high-availability industries (banking, telco, public sector, healthcare)
  • Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management
  • Familiarity with enterprise DevOps toolchains (GitLab, JFrog Artifactory, Backstage, Harness)
  • GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm)


Benefits

As a freelancer / contractor with us, you will enjoy flexible working hours and the freedom to choose your own projects. Our platform gives you access to exciting projects in various industries and supports you in advancing your career. You'll benefit from competitive pay and a dedicated team to help you with any questions you may have. Work independently and utilise our strong network to achieve your professional goals.
