Senior Cloud Network Operations Engineer

 Published 21 days ago
 United States

While candidates in the listed location(s) are encouraged for this role, candidates in other locations will be considered. This role is hybrid.

We're growing fast and attracting the best talent in the world. Bricksters — as we call ourselves — are a special mix of smart, curious, quick thinkers. If you ask a Brickster what they love about working here, you'll likely hear about our culture.

We are seeking an experienced Network Operations Center engineer to join our team. The successful candidate will be responsible for monitoring critical Databricks infrastructure and developing monitoring tools and alerting dashboards. They will also work closely with stakeholders to investigate and resolve incidents, perform root cause analysis, and propose solutions to increase the reliability and stability of the Databricks platform.

The impact you will have:

  • Monitor critical infrastructure, triage alerts to proactively identify incidents, and work with stakeholders to resolve incidents.
  • Investigate incidents and propose solutions to improve platform reliability and stability.
  • Perform root cause analysis for recurring incidents and provide proactive solutions.
  • Develop tooling or automate processes to improve platform monitoring and alerting.
  • Contribute to software development efforts to improve overall service reliability and stability.
  • Communicate with internal stakeholders, including executive staff, to provide incident analysis.
  • Participate in war rooms and temporary communication channels during outages.
  • Demonstrate cross-functional leadership and establish ownership of incidents and outages.
  • Multitask on several incidents and/or projects at once.

What we look for:

  • 3 years of experience as a NOC, SRE, or DevOps engineer
  • Knowledge of cloud technologies such as Azure, AWS, and GCP
  • Hands-on experience with monitoring, logging, and alerting tools
  • Hands-on experience with containers and orchestration technologies
  • Automation and scripting skills
  • Linux systems administration skills
  • Knowledge of managing incidents
  • Excellent communication skills
  • Technical degree or equivalent experience
  • Willingness to learn the Databricks products
