Systems Test & Analysis Engineer (Storage/Distributed Systems)

Posted 2 months ago
5-10 years experience

AI Summary

You will investigate system failures, verify test validity, and analyze complex logs to understand system behavior. This role bridges development and support to establish feedback loops that improve system reliability.

Your mission

About the Role:
We are seeking a highly curious and tenacious investigator to join our team. While we have "builders" focused on creating tests, we lack a dedicated person to examine failures, verify that the tests are valid, and understand in depth why tests fail. This role is crucial for establishing strong feedback loops between development and support, ultimately enabling us to ship without fear.
If you thrive on digging through data, embrace complex debugging, and enjoy finding the critical clue in 20,000 lines of logs, this is the role for you. We hope for a candidate who has all the relevant skills for this position, but we are also pragmatic: a growth mindset and a willingness to learn and ramp up quickly are ultimately more valuable.


Your profile

The Mindset We Value:

  • Investigator: Driven by intense curiosity ("Why did it fail?").

  • Empirical: Don't guess – find a way to get the data.

  • Observability-Minded: Committed to using data and metrics to understand the system's behavior.

  • Patient: Has the fortitude required to tackle complex, deep-seated issues.

  • Pragmatic: Delivers results that move the needle.


Required Skills & Experience:

  • Coding & Scripting: Strong proficiency in Python and Bash, and potentially Go, for automation (including experience with tools like teuthology).

  • Debugging & Tracing: Expertise with system internals (strace, lsof, /proc), tracing technologies (eBPF), and general debugging (gdb).

  • System & Resource Analysis: Ability to analyze system resource usage and process state, including load average, iowait, and zombie processes.

  • Storage & Networking: Deep understanding of, and debugging experience with, storage concepts (S3, HTTP errors, POSIX, ACLs, flock, etc.) and networking tools (tcpdump, Wireshark).

  • Domain Expertise: Familiarity with tools for breaking storage (fio, fsx, elbencho, xfstests) and an understanding of distributed systems principles (experience with tools like Jepsen or Antithesis is a plus).

  • Code Reading: Ability to read C/C++ code.


Why us?

What We Provide You:

  • Autonomy and Ownership: We encourage you to work independently and take ownership of your tasks, while providing a supportive team environment for collaboration and guidance.
  • Mentorship and Growth: We recognize that mastering complex topics requires time, focused effort, and expert guidance. Your colleagues will actively support you to ensure your success and continuous development.
  • Owner-managed company: short decision paths and high adaptability
  • Opportunity to grow in a dynamic, international technology environment
  • Flexible working hours
  • Collaborative environment with close interaction across teams

