Imagine a future where everyone has instant, low-cost access to intelligence. We’re building a fully featured European AI cloud, with everything one needs to train, experiment with, and deploy AI models. In addition, our GPUs run on 100% renewable energy.
We’re ambitious, curious, and gutsy doers. We keep hierarchy low across the company and morale high in our teams. We’ve already achieved a lot, yet we’re only getting started. Now it’s your chance to join the ride. We offer more than just a job: a career-defining opportunity to be part of building something big!
Join Verda while it’s still being built, not once it’s finished.
Instead of traditional systems engineering, your focus will be entirely on the silicon and low-level software layers. You will design, build, and maintain the automation frameworks that keep our entire fleet up to date—covering everything from motherboards and Baseboard Management Controllers (BMCs) to high-performance InfiniBand fabrics and GPU clusters. In our "move fast" culture, your job is to build the safety nets (health checks, canary deploys, and automated rollbacks) that allow us to update hardware rapidly without taking down the cloud.
Fleet-Wide Firmware Orchestration: Design and execute automated, zero-touch firmware update pipelines across our global fleet of servers, switches, and accelerators.
Heterogeneous Hardware Management: Own the firmware lifecycle for multiple hardware profiles, including Systemboards (BIOS/UEFI), BMCs, InfiniBand HCAs, Network Switches, and high-performance GPUs.
Redfish & API Automation: Leverage Redfish and IPMI APIs to programmatically discover, configure, inventory, and update bare-metal assets at scale.
Build Blast-Radius Protections: Architect robust telemetry pipelines to assess hardware health post-flash, and build automated rollback mechanisms to instantly recover from faulty vendor payloads.
Tooling & Infrastructure as Code: Integrate firmware management pipelines cleanly into our existing configuration management ecosystems (SaltStack/Ansible) and provisioning workflows.
Vendor & Engineering Liaison: Deep-dive into hardware errata and bug reports, collaborating closely with hardware vendors and internal platform teams to patch critical CVEs and performance regressions.
Automation at Scale: Proven experience managing and updating firmware across thousands of nodes simultaneously. You don't do "one-off" flashes; you write code to update clusters.
Deep Redfish Expertise: Mastery of the DMTF Redfish specification for server management, out-of-band communication, and telemetry collection.
Hardware Polyglot: Strong understanding of the architectural differences in managing firmware across systemboards, network stacks (InfiniBand/Ethernet), and GPU architectures.
Failure Domain Expertise: A paranoid mindset regarding hardware health. You know how to verify if a device is truly healthy post-boot and how to safely execute an unattended rollback if it isn't.
Scripting & Infrastructure Code: Proficiency in Python, Go, or advanced shell scripting alongside configuration management tools (Ansible/Salt) to interact with hardware APIs.
Experience working within the OpenBMC ecosystem or building custom BMC firmware images.
Deep operational familiarity with NVIDIA/Mellanox firmware management tools (mstflint, mlxup).
Experience managing firmware lifecycles specifically within large-scale AI/ML training clusters or high-performance computing (HPC) environments.
Cash + equity compensation along with various fringe benefits (e.g., healthcare, lunch, and wellbeing).
Profitable operations with rapid, sustained growth.
31 nationalities across the company, 6 of them represented on the management team.
An opportunity to make a clear impact and work alongside world-class engineers, researchers, and partners across the global AI ecosystem.
Work mode: Remote (EU)
Employment type: Full-time, permanent
Start date: As soon as possible