Network Reliability Engineer

 Posted 7 hours ago
  
 Poland
  
 200 zł - 250 zł per hour
  
2-5 years experience

AI Summary

Build and maintain large-scale AI infrastructure focusing on monitoring, diagnosis, and remediation of production incidents. Collaborate with engineering teams to ensure service continuity and implement observability solutions for infrastructure health.

 

#HPC #AI #GPU #CLUSTERS

 

YOUR DAILY ROUTINE

- Build large-scale AI infrastructure, with a focus on monitoring, diagnosing, and remediating production incidents

- Troubleshoot high-impact production issues in collaboration with other engineering teams

- Participate in an on-call rotation to handle incidents and ensure service continuity

- Implement and maintain observability solutions to monitor AI infrastructure and application health

- Contribute to AI infrastructure lifecycle management across different environments and countries

- Promote and apply best practices for stability, resiliency, scalability, and security

- Maintain clear technical documentation for tools and procedures

- Contribute to system and tool evolution based on production feedback

- Collaborate closely with development teams to ensure infrastructure readiness

- Participate in team rituals and knowledge-sharing initiatives

 

ABOUT YOU

 

🎯 SOFT SKILLS:

- Proactive and solution-oriented mindset

- Passion for automation and continuous improvement

- Strong collaboration and communication skills

- Ability to work independently and in a team

- Willingness to mentor and share knowledge

 

💻 HARD SKILLS:

- Experience with Go or Python 

- Strong scripting skills (Bash, Python)

- Hands-on experience with Linux systems (Ubuntu/Debian)

- Hands-on experience with GPU and HPC infrastructure (preferred)

- Knowledge of networking (TCP/IP, DNS, BGP, load-balancing, IPv6, etc.)

- Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)

- Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)

- Experience managing relational databases (MariaDB)

- Understanding of CI/CD pipelines (GitLab)

- Comfortable with English (written and spoken)

 

200 zł - 250 zł an hour
