Design, build, and automate large-scale production OpenStack environments for enterprise private clouds and MSP platforms. Manage high-availability control planes, compute virtualization, and complex SDN and storage integrations.
This is a remote position.
About the Role
We are seeking a deeply technical Senior OpenStack Engineer to design, build, automate, scale, and operate large-scale production OpenStack environments powering enterprise private clouds, MSP platforms, and high-performance digital twin lab infrastructures.
This is not a UI-driven admin role. We are looking for engineers who understand OpenStack at the service, database, messaging, hypervisor, and packet-flow layers: people who can troubleshoot RabbitMQ queues, debug Neutron agents, tune Ceph latency, and automate full cloud deployments from bare metal upward.
You will work on multi-region architectures, high-availability designs, NVMe storage fabrics, SDN integrations, and hybrid cloud platforms supporting global customers.
Primary Responsibilities
1. OpenStack Architecture & Platform Engineering
- Design production-grade OpenStack environments across controller, compute, and storage nodes.
- Architect HA control planes using HAProxy, Keepalived, Galera, and RabbitMQ clustering.
- Build scalable cell-based Nova architectures.
- Implement multi-region replication strategies.
- Perform platform capacity modeling and growth forecasting.
2. Compute Virtualization (Nova)
- Nova scheduler tuning and filters.
- CPU pinning and isolation.
- NUMA topology alignment.
- HugePages configuration.
- Live migrations and evacuations.
- GPU passthrough and SR-IOV provisioning.
The hypervisor stack includes KVM, QEMU, libvirt, and VirtIO.
3. Networking & SDN (Neutron)
- ML2 plugin architecture.
- OVS, OVN, and Linux Bridge deployments.
- VXLAN and Geneve overlays, and VLAN segmentation.
- DVR and L3 routing.
- Floating IP NAT design.
- SR-IOV and DPDK acceleration.
- Integration with BGP EVPN, MPLS, VRFs, and SD-WAN.
4. Storage Engineering
Ceph (Primary Requirement)
- RBD block storage.
- CephFS file storage and RGW object storage.
- CRUSH map tuning.
- Placement group optimization.
- BlueStore performance tuning.
- NVMe and SSD tiering.
Additional exposure to LINSTOR, DRBD, iSCSI, and NVMe-oF is preferred.
5. Image & Lifecycle Services
- Glance image pipelines.
- QCOW2 optimization.
- Cloud-init automation.
- Golden image lifecycle management.
6. Identity & Access (Keystone)
- RBAC modeling.
- LDAP/AD integration.
- SAML/SSO federation.
- Token lifecycle management.
7. Orchestration & Automation
- Heat orchestration templates.
- Terraform automation.
- Ansible playbooks.