Firmware Automation Engineer

 Posted 2 hours ago
     
5-10 years experience

AI Summary

Design and maintain automated firmware update pipelines for a global fleet of servers, GPUs, and network fabrics. Build safety nets including health checks and automated rollbacks to ensure cloud stability during hardware updates.

Imagine a future where everyone has instant, low-cost access to intelligence. We’re building a fully featured European AI cloud with everything you need to train, experiment with, and deploy AI models. In addition, our GPUs run on 100% renewable energy.

We’re ambitious, curious, and gutsy doers. We keep hierarchy low across the company and morale high in our teams. We’ve already achieved a lot, yet we’re only getting started. Now it’s your chance to join the ride. We offer more than just a job: a career-defining opportunity to be part of building something big!

Join Verda while it’s still being built, not once it’s finished.

About the role

Instead of traditional systems engineering, your focus will be entirely on the silicon and low-level software layers. You will design, build, and maintain the automation frameworks that keep our entire fleet up to date, covering everything from motherboards and Baseboard Management Controllers (BMCs) to high-performance InfiniBand fabrics and GPU clusters. In our "move fast" culture, your job is to build the safety nets (health checks, canary deploys, and automated rollbacks) that allow us to update hardware rapidly without taking down the cloud.

Your Responsibilities

  • Fleet-Wide Firmware Orchestration: Design and execute automated, zero-touch firmware update pipelines across our global fleet of servers, switches, and accelerators.

  • Heterogeneous Hardware Management: Own the firmware lifecycle for multiple hardware profiles, including system boards (BIOS/UEFI), BMCs, InfiniBand HCAs, network switches, and high-performance GPUs.

  • Redfish & API Automation: Leverage Redfish and IPMI APIs to programmatically discover, configure, inventory, and update bare-metal assets at scale.

  • Blast-Radius Protection: Architect robust telemetry pipelines to assess hardware health post-flash, and build automated rollback mechanisms to recover instantly from faulty vendor payloads.

  • Tooling & Infrastructure as Code: Integrate firmware management pipelines cleanly into our existing configuration management ecosystems (SaltStack/Ansible) and provisioning workflows.

  • Vendor & Engineering Liaison: Deep-dive into hardware errata and bug reports, collaborating closely with hardware vendors and internal platform teams to patch critical CVEs and performance regressions.

Your key competencies

  • Automation at Scale: Proven experience managing and updating firmware across thousands of nodes simultaneously. You don't do "one-off" flashes; you write code to update clusters.

  • Deep Redfish Expertise: Mastery of the DMTF Redfish specification for server management, out-of-band communication, and telemetry collection.

  • Hardware Polyglot: Strong understanding of the architectural differences in managing firmware across system boards, network stacks (InfiniBand/Ethernet), and GPU architectures.

  • Failure Domain Expertise: A paranoid mindset regarding hardware health. You know how to verify whether a device is truly healthy post-boot and how to safely execute an unattended rollback if it isn't.

  • Scripting & Infrastructure Code: Proficiency in Python, Go, or advanced shell scripting alongside configuration management tools (Ansible/Salt) to interact with hardware APIs.

Nice-to-Haves

  • Experience working within the OpenBMC ecosystem or building custom BMC firmware images.

  • Deep operational familiarity with NVIDIA/Mellanox firmware management tools (mstflint, mlxup).

  • Experience managing firmware lifecycles specifically within large-scale AI/ML training clusters or high-performance computing (HPC) environments.

Why Verda

  • Cash + equity compensation along with various fringe benefits (e.g., healthcare, lunch, and wellbeing).

  • Profitable operations with rapid, sustained growth.

  • 31 nationalities, 6 of them represented on the management team.

  • An opportunity to make a clear impact and work alongside world-class engineers, researchers, and partners across the global AI ecosystem.

Practicalities

  • Work mode: Remote (EU)

  • Employment type: Full-time, permanent

  • Start date: As soon as possible
