The role involves creating technical content, documentation, and sample apps to help developers use the product. It also requires engaging with the developer community and relaying feedback to internal product and engineering teams.
Nebius
131 Remote Job Openings at Nebius
Build and test LLM-based solutions and applications using Token Factory's inference services. Assist with prompt engineering, benchmarking, and inference optimization while contributing to internal automation tooling.
Own the end-to-end product strategy, roadmap, and delivery for a specific slice of the AI Compute Platform. Drive cross-functional execution across engineering and SRE teams to deliver hyperscaler-quality APIs and infrastructure outcomes.
Build and scale core services for a unified observability ecosystem covering logs, metrics, traces, and alerting. Focus on high-volume telemetry ingestion, distributed storage, and AI-assisted troubleshooting to support a global AI cloud platform.
Lead global partner data, planning, and insights to scale partner-led growth through reports, dashboards, and AI-enabled workflows. Support GTM execution across various partner types including hyperscalers, ISVs, and GSIs while collaborating with cross-functional leadership.
Serve as a technical advisor helping customers design, deploy, and scale AI solutions and large-scale GPU workloads. Act as a bridge between customers and product teams to resolve complex AI/ML issues and drive product growth.
Own the full sales cycle from prospecting to closing for commercial and mid-market accounts. Partner with technical and product teams to drive evaluations and refine the commercial sales motion.
Design, build, and maintain automation workflows and integrations across IT systems and SaaS platforms. Automate operational tasks such as user management and data synchronization while ensuring secure implementation and documentation.
Drive activation, conversion, and product-led growth by optimizing onboarding flows and free-to-paid conversion rates. Design and execute A/B tests and lifecycle campaigns while owning the experimentation roadmap and growth reporting.
Establish and lead the customer success function for EMEA, guiding enterprise customers through onboarding, implementation, and long-term adoption of the AI search platform. Act as a strategic partner to executive sponsors and a bridge between customers and internal product teams to influence the roadmap.
Define and build systems to track the marketing journey for Token Factory, from signups to revenue. Collaborate across Marketing, Product, and Sales to create automation and intelligence layers for GTM execution.
Define the positioning, messaging, and go-to-market strategy for AI inference infrastructure and the Token Factory platform. Collaborate with Product and Engineering teams to create compelling narratives and sales enablement tools that drive adoption among enterprise and AI-native customers.
Design and implement embedded firmware for server management, telemetry, and control systems for GPU and HPC platforms. Maintain custom OpenBMC firmware and collaborate with hardware engineers to validate and optimize low-level drivers.
Design, build, and maintain scalable data pipelines and infrastructure to power analytics and business intelligence. Optimize data storage, processing, and query performance for large-scale datasets while ensuring data governance and security.
The role involves building and maintaining ASPM tools to identify and remediate application security vulnerabilities. You will collaborate with development teams to integrate security best practices into the SDLC and conduct penetration testing.
Manage end-to-end transport of IT hardware and data center equipment across Europe, Taiwan, and Israel. Coordinate inbound/outbound freight, handle customs documentation, and manage vendor relationships to ensure timely delivery.
Identify and secure optimal locations for data center developments and colocation expansions through market analysis and technical due diligence. Manage the full colocation lifecycle, including vendor RFPs, contract negotiations, and performance monitoring against SLAs.
Execute day-to-day due diligence assessments and risk scoring for third parties to improve the TPRM framework. Coordinate screenings for sanctions and PEPs while maintaining the Third-Party Risk Register and escalating red flags.
Develop the control plane and lifecycle automation for Managed PostgreSQL while tuning database internals for AI workloads. Build migration tooling and drive the integration of vector search capabilities within the Nebius AI Cloud stack.
Maintain and grow DevTools systems, including GitLab and TeamCity, to support large-scale AI cloud infrastructure. Focus on building fault-tolerant architectures and improving system performance based on user feedback.
Manage end-to-end transport of IT hardware, including servers and racks, across domestic US and international lanes. Coordinate with vendors and data center teams to ensure timely delivery, compliance with import/export regulations, and efficient RMA processing.
Own the commercial execution and day-to-day management of strategic relationships within the Media & Entertainment vertical. Drive partner ecosystem growth, develop commercial frameworks, and translate AI infrastructure capabilities into partner-ready narratives.
Lead the technical product strategy and GTM motion for Retail & Commerce AI, serving as the principal technical partner for strategic lighthouse accounts. Translate customer needs into scalable platform requirements and architect solution patterns for the broader market.
Lead the long-term architecture strategy for Go-To-Market systems, focusing on CRM and adjacent business platforms. Drive AI-enabled transformation and enterprise integration across Sales, Marketing, and Customer Success operations.
Develop and manage strategic partnerships with US venture capital firms and accelerators to grow the AI startup pipeline. Drive adoption of the startup program by sourcing high-potential AI startups and executing value-added GTM initiatives.
Lead engagement and growth with strategic Managed Service Provider (MSP) and Value-Added Reseller (VAR) partners. Develop partner enablement programs and collaborate with leadership to align channel activities with broader sales strategies.
Design and implement cloud infrastructure and MLOps solutions for clients, acting as a trusted technical advisor. Conduct PoCs and workshops, and optimize pipeline performance to ensure efficient utilization of Nebius AI resources.
Design and implement cloud infrastructure and MLOps solutions for clients, acting as a trusted technical advisor. Conduct PoCs and workshops, and optimize pipeline performance to ensure efficient utilization of GPU cloud resources.
Design and implement cloud infrastructure and MLOps solutions for clients, acting as a trusted technical advisor. Responsibilities include conducting PoCs, optimizing pipeline performance, and collaborating with product and marketing teams.
Build and scale Nebius' ISV and AI partner ecosystem from the ground up to drive net-new customer acquisition and revenue. Develop repeatable co-sell motions and lead joint go-to-market execution with strategic partners.
The Director of Product, Ecosystem is responsible for mapping and engaging key AI companies to build a robust partner ecosystem across various platform layers. This includes prototyping technical integrations and translating external market signals into product capabilities and strategic investment decisions.
Manage IT procurement requests and invoice processing within ERP systems to ensure timely payments and accurate documentation. Collaborate with Finance, Legal, and external vendors to streamline operational workflows and support vendor onboarding.
Design and develop internal web applications to automate data center and GPU cloud infrastructure operations. Collaborate with infrastructure and platform teams to translate operational requirements into scalable frontend solutions.
Drive go-to-market execution and manage strategic client relationships within the Digital Health, Medical Devices, and Medical Imaging segments across EMEA. Collaborate with Account Executives to qualify opportunities and position AI/cloud solutions to meet healthcare regulatory and business needs.
Build and scale Nebius' ISV and AI partner ecosystem from the ground up to drive net-new customer acquisition and revenue. Develop repeatable co-sell motions and lead joint go-to-market execution to increase partner-sourced pipeline.
Lead the diagnosis and resolution of advanced technical issues across Linux, networking, Kubernetes, and cloud environments for Nebius clients. Provide technical leadership, mentor junior staff, and collaborate with engineering teams to improve system scalability and operational efficiency.
Lead market-informed pricing and packaging strategies to define customer willingness to pay and value-based models. Implement these strategies across billing systems and partner with Sales and Business Strategy to ensure executable pricing models.
Lead the evolution of internal operational products and workflows for the Customer Experience organization to improve scalability and reduce manual overhead. Own the end-to-end product lifecycle from feature intake and solution design to delivery and impact measurement.
Investigate and resolve complex technical issues involving Linux, Kubernetes, and GPU-based AI workloads in customer environments. Act as a senior escalation point for production incidents and collaborate with engineering to develop long-term fixes and automation tools.
Architect and implement AI infrastructure solutions for strategic Media & Entertainment partners, translating complex business needs into engineered technical blueprints. Influence the global M&E product roadmap by collaborating with engineering leadership and validating integrations with critical ISVs.
Plan, design, and oversee the implementation of electrical engineering systems for AI data centers. Collaborate with internal technical teams and external vendors to ensure quality benchmarks are met on time and within budget.
Plan and implement cost-effective mechanical designs and distribution systems for AI data centers. Collaborate with internal partners and external vendors to validate installation and operational performance of mechanical systems.
Lead and support the benchmarking of GPU platforms to evaluate performance for machine learning and AI workloads. This includes profiling GPU performance at the system and kernel level and optimizing ML workloads to resolve bottlenecks.
Serve as the primary technical advisor for strategic GPU Cloud customers to design and scale AI solutions. Collaborate with sales and product teams to optimize GPU performance and align product features with customer requirements.
Serve as a technical advisor helping clients design, deploy, and scale AI solutions and manage large-scale GPU workloads. Collaborate with sales and product teams to resolve complex AI/ML issues and align product features with customer requirements.
Design and prototype integrations between partner products and the Nebius platform to define scalable reference architectures. Translate external integration findings into actionable product requirements to shape the platform roadmap.
Serve as a technical advisor helping strategic customers design, deploy, and scale large-scale GPU workloads for AI solutions. Collaborate with sales and product teams to drive growth and relay customer feedback for product enhancement.
Lead the design, implementation, and optimization of global enterprise wired and wireless networks to ensure scalability and security. Manage remote access systems and drive automation-first operations for network monitoring and connectivity.
You will co-own the Serverless AI product roadmap, defining technical requirements and making strategic build-versus-buy decisions for infrastructure capabilities. Additionally, you will drive customer adoption through technical content, market analysis, and direct engagement with ML engineers.
The Event Manager will plan and execute regional marketing events, including conferences, workshops, and partner activations. They will manage logistics, vendor coordination, and cross-functional collaboration to ensure events align with brand goals and deliver a high-quality attendee experience.
The Event Manager will support the end-to-end execution of regional marketing events, including logistics, vendor coordination, and on-site delivery. They will also collaborate with cross-functional teams to manage event data, track metrics, and ensure alignment with business goals.
The OFCI Procurement Manager is responsible for the technical and commercial procurement of mission-critical data center equipment. This includes managing vendor selection, contract negotiations, and overseeing the delivery and installation readiness of long-lead power and cooling systems.
Design and implement LLM-based solutions using the Nebius Token Factory inference platform to drive business value. Collaborate with product and engineering teams to shape the platform roadmap and guide customers from POC to production.
The Technical Account Manager will lead the transition of customer AI workloads from proof-of-concept to stable, scalable production environments. They will act as a trusted technical partner to optimize performance, cost-efficiency, and reliability while managing incidents and providing feedback to internal product teams.
You will build technical demos, reference architectures, and open-source examples to showcase Nebius AI Cloud capabilities. Additionally, you will create high-signal technical content and represent the company within the AI developer community.
You will tune and troubleshoot GPU clusters and InfiniBand networks to ensure optimal performance in high-performance computing environments. Additionally, you will integrate new hardware into the infrastructure and enhance automation systems for proactive monitoring and issue resolution.
You will analyze and optimize the performance of large-scale GPU clusters by identifying bottlenecks across hardware and software layers. Additionally, you will support performance-related escalations and contribute to hardware qualification and cluster validation.
The Technical Project Manager will own the delivery of defined workstreams within large-scale data center and GPU infrastructure programs. Responsibilities include tracking progress, managing dependencies, coordinating with cross-functional teams, and ensuring quality delivery while escalating risks.
The Technical Account Manager will lead the transition of customer AI workloads from proof-of-concept to production, ensuring stability, scalability, and cost-efficiency. They will act as a primary technical partner to resolve bottlenecks, manage incidents, and provide actionable feedback to internal product teams.
The Lead Systems HPC Engineer will analyze and optimize the performance of large-scale GPU clusters by identifying bottlenecks across hardware and software layers. They will also collaborate with infrastructure and vendor teams to integrate new hardware and support complex performance-related escalations.
You will own the product roadmap and delivery for Token Factory feature streams like Evaluation, Agents, or AIOps. Additionally, you will partner with engineering and research to design high-performance tools and drive product-market fit through customer engagement.
You will drive the adoption of the Token Factory platform by creating demos, workshops, and technical guides for AI developers. Additionally, you will act as a bridge between the developer community and the internal product and engineering teams to gather feedback and improve the platform.
Drive planning and execution across multiple engineering and product teams to launch new external-facing cloud services. Coordinate service releases, manage stakeholder expectations, and align requirements between Sales, Partnerships, Legal, and Engineering.
You will initiate and drive cross-team projects while aligning conflicting priorities and stakeholder needs across engineering domains. Additionally, you will facilitate technical decision-making and establish scalable processes to support core cloud infrastructure.
You will manage large-scale projects for the Nebius Cloud Platform, coordinating across technical and business teams to scale infrastructure and support new hardware. This involves initiating cross-team projects, setting goals, facilitating decision-making, and aligning stakeholder needs up to the C-level.
The Enterprise Applications Engineer will manage, configure, and maintain the company's collaboration SaaS ecosystem, including Atlassian tools, Slack, and Zoom. They will also drive automation, ensure platform reliability, and partner with security teams to enforce governance and access controls.
You will be responsible for the administration, configuration, and continuous improvement of the Atlassian Cloud ecosystem, including Jira and Confluence. Additionally, you will collaborate with cross-functional teams to ensure business applications remain secure, efficient, and aligned with organizational standards.
You will be responsible for scaling distributed backend systems and optimizing AI infrastructure to ensure high performance and reliability. The role involves collaborating with cross-functional teams to build and maintain enterprise-grade AI integration platforms.
You will drive the adoption of AI and cloud solutions across the DACH marketplace by managing strategic client relationships and leading the end-to-end sales process. This involves developing territory plans, negotiating contracts with C-suite executives, and orchestrating deal teams to deliver value-based proposals.
You will act as a trusted advisor to design scalable AI solutions and resolve technical challenges for customers. Additionally, you will manage large-scale AI deployments and collaborate with engineering teams to relay customer feedback.
Define and execute the go-to-market strategy for the digital business segment while building and leading a high-performing team of Enterprise Account Executives. Drive multi-year revenue growth and strategic infrastructure partnerships by evangelizing GPU-based AI platforms to large digital businesses.
Define and execute the strategic sales motion for Frontier AI Labs and top global enterprise accounts while leading a team of Strategic Account Executives. Personally engage in high-level infrastructure negotiations and build multi-threaded account strategies to drive revenue and long-term partnerships.
The RVP will build and lead a verticalized enterprise sales team to drive multi-year AI infrastructure deals and operationalize a partner-led go-to-market model. They are responsible for defining GTM strategies, managing revenue targets, and fostering executive-level relationships across key industry verticals.
The Ethics & Compliance Specialist will support the development and execution of global compliance programs, including conflict of interest, anti-bribery, and third-party risk management. They will also collaborate with cross-functional teams to improve processes, conduct training, and prepare compliance reporting for management.
The lead will drive applied research across retrieval, ranking, and agent-centric search systems, focusing on designing and improving multi-stage retrieval pipelines and developing grounding approaches for LLMs using real-time web data. Responsibilities also include defining evaluation methodologies, leading experimentation on modern retrieval techniques, and working closely with engineering to deploy research into production at scale.
You will design, train, and deploy machine learning models for retrieval, reranking, and search relevance in production. This role involves working on systems operating at large scale and collaborating closely with engineering teams.
The Senior Network Engineer will be responsible for designing, building, and operating large-scale, high-performance data center networks supporting GPU-dense AI workloads, taking end-to-end ownership of service provider–grade and CLOS-based network infrastructure. Responsibilities include designing scalable architectures, owning deployment and lifecycle management of routing/switching infrastructure, and optimizing traffic engineering strategies.
The role involves building and deepening relationships with top SIs, MSPs, and VARs in the DACH region, ensuring alignment with strategic goals. Responsibilities also include designing and executing partner enablement programs and collaborating with leadership on strategic priorities and growth opportunities.
The Senior Technical Program Manager will drive operational performance across data center IT and infrastructure teams by defining success metrics, implementing tracking frameworks, and leading operational reviews to identify bottlenecks. Responsibilities include defining and improving KPIs, ensuring structured reporting for accountability, and standardizing processes across sites for scalability.
The engineer will design, deploy, and operate large-scale DWDM-based optical transport networks across long-haul and metro environments, handling wavelength provisioning and optical power tuning. Responsibilities also include troubleshooting cross-layer issues spanning optical transport and IP/MPLS networks and developing Python automation for operational workflows.
The Sales Operations Specialist supports the end-to-end deal operations process, focusing on accurately translating sales agreements into contracts, billing configurations, and internal systems while ensuring pricing compliance. This role involves handling routine and non-standard operational cases and coordinating closely with Sales, Support, and Finance teams.
Responsibilities include assisting in designing and running experiments to train LLMs, analyzing results, and writing/maintaining code for data processing and model evaluation. The role also requires collaboration on new reinforcement learning methods for agents.
The role involves spending roughly half the time in the field assisting new customers with POCs and technical onboarding, and the other half building prototypes, exploring emerging AI techniques, and translating field insights into product direction. Responsibilities include building demos across the portfolio, supporting customer validation, and feeding grounded feedback into the product roadmap.
The Senior ML Solutions Architect will design and implement LLM-based solutions utilizing Nebius Token Factory's inference services to achieve business value and support customer objectives. Responsibilities also include building production-ready applications with multimodal and domain-specific LLM APIs and collaborating with engineering teams to refine the platform based on client needs.
The manager will own the development and execution of the startup community engagement strategy across key markets, focusing on cultivating relationships with developers, accelerators, and founders. Responsibilities include designing scalable engagement programs, leading high-impact events, driving startup pipeline conversion, and developing enablement resources.
This role involves leading deep technical discovery with engineering teams to understand complex AI workload requirements, translating customer ambitions into production-feasible, scalable, and economically viable architectures. The engineer will partner closely with Sales to influence deal strategy, prevent misaligned commitments, and ensure high PoC-to-production conversion rates.
The Regional GTM Recruiting Manager will own the go-to-market recruiting strategy and execution across EMEA and APJ, leading and developing the regional recruiting team while partnering with commercial leadership to align hiring with revenue targets. Responsibilities include structuring workforce planning, supporting expansion into new regions, building scalable processes, and ensuring high-quality candidate experiences.
This role involves leading deep technical discovery with engineering teams to understand complex AI workload requirements, translating customer ambitions into scalable, production-feasible architectures, and identifying technical risks early. The Senior Sales Engineer will partner closely with Sales to influence deal strategy, ensure technical realism, and drive PoC-to-production conversion by defining measurable success criteria.
This role is responsible for owning and growing executive-level relationships with top SIs, MSPs, and VARs across APJ, developing joint business plans to drive partner-sourced and influenced pipeline and revenue. The manager will also build scalable partner enablement and align partner motions with the broader Go-To-Market strategy.
The Field CISO will serve as the executive security voice for the US market, acting as a trusted advisor to customer CISOs to accelerate strategic enterprise deals. This role involves influencing the security product roadmap, aligning security posture with US risk frameworks, and leading executive-level security discussions.
The Senior Delivery Deployment Engineer will own the end-to-end delivery, deployment, and production readiness of next-generation GPU platforms inside data centers, leading on-site rack bring-up and validating NVIDIA-based AI systems. Responsibilities include overseeing installation, troubleshooting complex hardware/OS/network issues, executing validation procedures, and coordinating on-site hardware repairs.
The role involves adapting YDB to fully utilize modern hardware such as QLC NVMe drives and DPUs, and maximizing performance on standard devices such as HDDs and TLC NVMe drives. Responsibilities also include reengineering YDB components using more efficient algorithms to address complex system challenges.
The Principal Solutions Architect will own end-to-end technical presales and solution delivery for strategic customers, acting as a trusted technical advisor to senior stakeholders on complex ML/AI infrastructure and MLOps architectures. Responsibilities also include driving advanced Proofs of Concept (PoCs), influencing the product roadmap, and mentoring other Solutions Architects.
The Network Software Engineer will build and maintain services and tooling that automate the network lifecycle and make network changes safe and transparent. They will also develop observability systems and collaborate closely with network engineers and SREs to create product-quality tooling.
The role involves serving as the technical interface between Nebius and strategic technology partners, designing and developing integrated solutions, building reference implementations, and enabling partner engineering teams. Responsibilities include owning the technical relationship with partners, architecting high-impact integration solutions, and translating feedback into product roadmap requirements.
The role involves building a distributed, fault-tolerant storage system for the hyperscaler cloud, focusing on high-performance block and filesystem storage capable of sub-millisecond latency and high throughput despite hardware failures. Key work areas include virtualization technologies like virtio-blk and QEMU, high-throughput networking over TCP and RoCEv2, and implementing storage system features like replication and self-healing.
The role involves creating diverse visual content for social media, blog posts, and advertising creatives, including illustrations, diagrams, and motion graphics. Responsibilities also include maintaining visual guidelines and collaborating closely with cross-functional marketing and product teams.
Enhance fine-tuning methodologies for cutting-edge LLMs and research advanced inference optimization techniques. Re-implement state-of-the-art open-source LLM architectures in JAX.
The Venture Ecosystem Lead will develop and manage partnerships with venture capital firms and accelerators in the Healthcare and Life Science sectors. They will also drive the startup pipeline by sourcing and onboarding AI startups into Nebius' global program.
As a Systems Administrator, you will maintain, troubleshoot, and integrate Linux-based systems that support production platforms. You will work closely with infrastructure, networking, and data center teams to ensure reliable services.
The Principal, Business Development – Physical AI is responsible for developing and driving the Go-To-Market strategy for the Physical AI ecosystem. This includes creating a comprehensive strategy, securing relationships with innovative companies, and enabling the sales organization.
The role involves developing a data plane for Nebius Cloud network services and improving service performance and reliability. Additionally, the engineer will automate complex cross-service interaction scenarios and perform load testing for services.
As a Senior Backend Software Engineer, you will design, build, and operate backend services that are reliable, scalable, and performant. You will own services end-to-end, from architecture to production support.
As a Software Engineer, you will design and build systems for provisioning, configuring, testing, and managing physical hardware at scale. You will collaborate with hardware, networking, and data center operations teams to ensure robust and scalable platforms.
The Senior Software Engineer will design, build, and own backend systems for metrics and monitoring of large-scale infrastructure. Responsibilities include evolving metrics pipelines and investigating production incidents.
The Data Center Technician Manager leads technical teams responsible for cabling, hardware installation, and data center operations. This role involves mentoring technicians and ensuring high-quality execution of installations and maintenance.
The Data Center Site Manager is responsible for ensuring the reliability, safety, capacity, and performance of a flagship U.S. site. This includes leading a multi-disciplinary operations team and managing various aspects of data center operations.
Ensure the reliability, availability, and performance of compute nodes running VMs. Troubleshoot complex production issues and lead incident response and root-cause analysis.
Design and develop a large-scale LLM training platform while maintaining optimal performance, scalability, and reliability of the ML infrastructure. Improve job scheduling strategies to minimize resource fragmentation.
The role involves managing strategic client relationships, uncovering new business opportunities, and leading the end-to-end sales process. You will engage with clients to demonstrate the value of AI and cloud solutions and negotiate contracts with executives.
Develop and optimize low-level kernels and runtime components for AI inference. Collaborate with ML and backend teams to optimize end-to-end execution.
Own key tracks in Cluster Experience, focusing on reliability, performance, and user experience for distributed ML workloads. Define product direction and drive cross-functional execution across various teams.
As a Cloud Solutions Architect, you will design and implement solutions for clients, providing technical expertise and guidance. You will also conduct workshops and help optimize pipeline performance for efficient cloud resource utilization.
The GPU Cluster Architect will drive the design of next-generation AI infrastructure, making architectural decisions across compute, networking, and storage. Responsibilities include architecting scalable GPU cluster topologies and analyzing AI/ML workloads to inform design tradeoffs.
The Infrastructure Security Engineer will design and implement security measures for cloud and on-premises infrastructure, identify and remediate vulnerabilities, and collaborate with teams to integrate security best practices. They will also stay updated on security threats and serve as a subject matter expert.
Design and implement LLM-based solutions using Nebius Token Factory’s inference services to drive business value. Collaborate with product and engineering teams to surface customer feedback and shape the platform roadmap.
Conduct experiments to efficiently train large language models and explore methods of guided generation and search. Mine relevant data at web scale and conduct experiments with various reinforcement learning configurations.
Ensure fault-tolerance, scale, and uninterrupted operations for the service. Use cutting-edge cloud technology to solve a variety of infrastructure problems.
The Senior Hypervisor Engineer will optimize I/O for emulated devices, integrate the hypervisor with platform services, and improve guest system support. The role also involves collaborating with the open-source community and providing low-level security for the hypervisor.
The Senior Hypervisor Engineer will optimize I/O for emulated devices, integrate the hypervisor with platform services, and improve guest system support. They will also advance open source virtualization and provide low-level security for the hypervisor.
The Senior HPC Cluster Engineer will tune the performance of GPU clusters and InfiniBand networks, analyze and troubleshoot issues, and integrate new hardware into the existing infrastructure. The role also involves enhancing automation systems for proactive monitoring and managing GPU devices and InfiniBand fabrics.
The role involves developing a data plane for Nebius Cloud network services and improving service performance and reliability. Additionally, the engineer will automate complex cross-service interaction scenarios and perform load testing for services.
You will manage complex, multi-stakeholder deals from prospecting to close across AI-native and enterprise accounts. Additionally, you will build and expand relationships with C-suite, engineering, and procurement leaders within target accounts.
The Site Selection & Colocation Manager is responsible for identifying, evaluating, and securing optimal locations for new data center developments and colocation expansions. This includes conducting market assessments, coordinating technical due diligence, and managing vendor relationships.
Technical Project Manager (Devtools and Observability Platform)
Nebius · Full Time · 6 months ago
The Senior Technical Project Manager will set up feedback loops from engineering teams, drive AI-related experiments, and enhance planning and processes. This hands-on role focuses on structuring and pushing initiatives forward rather than extensive reporting.
The Compute Node team builds services for managing Virtual Machines on GPU servers and develops the Virtual Machine Scheduler for clusters with thousands of servers. This involves integrating with disk management and virtual networks across multiple data centers.
Senior Site Reliability Engineer — AI Studio (Inference Platform)
Nebius · Full Time · 6 months ago
You will own the reliability, performance, and observability of the entire inference stack. This includes designing telemetry pipelines, tuning Kubernetes autoscalers, and creating automation for incident management.
The Site Selection & Colocation Manager is responsible for identifying, evaluating, and securing optimal locations for new data center developments and colocation expansions. This role involves collaboration with various teams to ensure that selected sites meet operational, technical, and business requirements.
Technical Project Manager (Devtools and Observability Platform)
Nebius · Full Time · 7 months ago
The Senior Technical Project Manager will set up feedback loops from engineering teams, drive AI-related experiments, and help mature planning and processes. This hands-on role focuses on structuring and pushing projects forward rather than extensive reporting.
Manage complex, multi-stakeholder deals from prospecting to close across AI-native and enterprise accounts. Build and expand relationships with C-suite, engineering, and procurement leaders within target accounts.
The Talent Sourcer will be responsible for sourcing candidates for various roles within Data Centers, including technical positions. They will manage talent pools and collaborate with the recruitment team to ensure a positive candidate experience.