NOC Engineer / NOC Lead

Posted 2 hours ago

2-5 years experience

Infrastructure operations · shared across customers

Reports to: Manager, NOC (or Director, Service Operations)

Location: Remote (US) with assigned shift; rotating coverage

Department: Infrastructure & DC Operations / Network Engineering

Position summary

The NOC Engineer operates STN's 24/7 monitoring and first-response capability for GPU One (GPUaaS) infrastructure. The role triages alerts, executes documented runbooks, and coordinates with on-call specialists during incidents to protect customer SLAs.

Key responsibilities

  • Monitor infrastructure alerts, customer SLA dashboards, and system health around the clock (24/7)

  • Triage incidents and engage on-call SREs, Network, Hardware, or Field Engineering as needed

  • Execute documented runbooks for common platform, network, and hardware issues

  • Manage the incident lifecycle, including initial customer notification and status updates

  • Coordinate planned maintenance windows and change windows with internal teams and customers

  • Update status pages and customer-facing communications during incidents

  • Maintain shift handoff documentation and active-incident logs

  • Handle the support ticket queue, including Tier 1 ticket resolution

  • Contribute to continuous improvement of monitoring coverage, alert quality, and runbooks

  • Work rotating shifts, including nights, weekends, and holidays

Required qualifications

  • 3+ years in a NOC, SOC, or IT operations function

  • Hands-on experience with monitoring tools (Datadog, Prometheus, Grafana, PagerDuty, or equivalent)

  • Strong Linux and basic networking fundamentals

  • Excellent written and verbal communication, particularly under pressure

  • Willingness and ability to work rotating shifts, including overnight coverage

Preferred qualifications

  • GPU, HPC, or large-scale cloud infrastructure background

  • ITIL Foundations certification

  • Demonstrated on-call and major-incident response experience

  • Scripting skills (Python, Bash) for runbook automation
