SysOps Engineer – Monitoring & Cloud Operations

 Posted 11 hours ago
     
2-5 years experience
AI Summary

The role focuses on monitoring infrastructure health and managing system alerts using tools like New Relic and Prometheus. It also involves executing disaster recovery plans, performing root cause analysis, and ensuring high availability for payment systems.

About Tap

Tap Payments is revolutionizing online payments across the MENA region by connecting businesses with simple, unified payment experiences. We need exceptional talent to help us on this journey.


The Technology Team

Our technology team builds the platforms, systems, and payment infrastructure our merchants use to process millions of transactions daily.

This team is building technology solutions to simplify MENA payments regionally and globally for businesses of all sizes.


As a Tapster you will:

  • Monitor infrastructure using tools like New Relic, Prometheus, and Grafana

  • Configure and maintain alerts, dashboards, and service health checks

  • Perform incident management, troubleshooting, and root cause analysis (RCA)

  • Ensure uptime and SLA compliance for all systems

  • Monitor CPU, memory, disk, and system processes

  • Manage OS-level operations (Linux/Windows) including patching and tuning

  • Manage system backups and perform regular restoration validation

  • Execute and validate disaster recovery (DR) plans across environments

  • Perform failover and failback testing for critical services (on-prem / cloud / multi-region)

  • Coordinate DR drills and simulate outage scenarios

  • Ensure replication health and data consistency (in coordination with DataOps)

  • Maintain and update DR runbooks and incident playbooks

  • Perform capacity planning and performance optimization

  • Maintain logs, metrics, and operational documentation


What you will bring to the party:

  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience.

  • Proven experience in Systems Operations, Cloud Operations, Infrastructure Support, Site Reliability Engineering (SRE), or a related role.

  • Strong hands-on experience administering Linux and Windows operating systems.

  • Experience with enterprise monitoring and observability platforms such as New Relic, Prometheus, Grafana, Datadog, or similar tools.

  • Solid understanding of incident management, problem management, and root cause analysis methodologies.

  • Experience supporting cloud platforms such as AWS, Azure, or Google Cloud Platform.

  • Strong knowledge of backup, disaster recovery, business continuity, and failover processes.

  • Experience managing compute infrastructure, including virtual machines, cloud instances, and physical servers.

  • Familiarity with system services and web servers such as Nginx, IIS, and systemd.

  • Understanding of capacity planning, performance tuning, and infrastructure optimization practices.

  • Strong troubleshooting and analytical skills with the ability to resolve complex operational issues.

  • Excellent communication, documentation, and cross-functional collaboration skills.

  • Experience working in high-availability, mission-critical production environments is highly preferred.



Are you ready to shape the future of payments in MENA?
