Build and manage end-to-end compute, storage, and network infrastructure specifically for EDA and SoC design environments. This includes optimizing EDA license utilization, managing hybrid cloud strategies, and enabling AI/ML infrastructure for engineering workflows.
Role Overview
We are looking for a hands-on and highly strategic IT & Infrastructure Admin to build and manage the end-to-end compute, storage, network, and EDA infrastructure required for designing complex SoCs across digital and analog domains.
This role goes beyond traditional IT. It requires deep ownership of EDA environments, compute strategy (cloud vs. on-prem), cost optimization, and AI infrastructure enablement to ensure high performance, scalability, and reliability for engineering teams.
Key Responsibilities
EDA & Engineering Infrastructure
- Own setup, deployment, and management of EDA tools and environments for:
  - Digital design and verification
  - Analog and custom design flows
- Manage tool installations, upgrades, and compatibility across flows
- Drive EDA license management, including:
  - Forecasting demand across teams and projects
  - Optimizing utilization and cost
  - Vendor coordination and negotiations
- Ensure high availability and performance of compute farms and storage systems
Compute & Platform Strategy
- Define and execute strategy for cloud vs. on-prem infrastructure:
  - Evaluate AWS (or other cloud platforms) vs. owned/rented servers
  - Build cost models and ROI analysis for different scaling scenarios
- Design scalable infrastructure for:
  - Large regressions (DV workloads)
  - RTL synthesis and physical design
  - Analog simulations (compute-intensive workloads)
- Optimize job scheduling, workload distribution, and resource utilization
Network & Systems Management
- Design and manage high-performance network infrastructure:
  - Low-latency, high-throughput connectivity for EDA workloads
  - Secure remote access for distributed teams
- Manage core systems, including:
  - Servers, storage (NAS/SAN), and backup systems
  - OS environments (primarily Linux-based)
  - Data security, access control, and disaster recovery
AI Infrastructure & Enablement
- Support deployment and scaling of AI/ML infrastructure for engineering workflows
- Work with AI and engineering teams to:
  - Enable AI agent workflows
  - Optimize compute usage (GPU/CPU allocation)
- Define and enforce AI usage guardrails, including:
  - Data security and IP protection
  - Safe usage policies for internal and external AI tools
- Manage token usage, cost tracking, and access control for AI platforms
Planning, Forecasting & Cost Optimization
- Develop and maintain forecasts for:
  - Compute infrastructure (cloud + on-prem)
  - Storage and network capacity
- Continuously balance cost, performance, and scalability trade-offs
- Provide leadership with data-driven recommendations on infrastructure investments
Required Qualifications
- Bachelor’s degree in Computer Science, Electrical Engineering, or related field
- 10+ years of experience in IT infrastructure / systems engineering, preferably in semiconductor or EDA environments
- Hands-on experience with:
  - EDA tool environments (Synopsys, Cadence, Siemens/Mentor)
  - Linux system administration
  - Compute cluster management and job schedulers (LSF, Slurm, etc.)
- Experience managing large-scale compute and storage systems
- Strong understanding of networking fundamentals (high-performance networks preferred)
- Experience with cloud platforms (AWS preferred)
Preferred Qualifications
- Experience supporting SoC design teams (RTL, DV, Analog)
- Familiarity with analog simulation environments and their compute demands
- Experience with hybrid cloud architectures
- Exposure to GPU infrastructure and AI/ML workloads
- Scripting skills (Python, Bash, etc.) for automation
- Experience with security and compliance in IP-sensitive environments
Key Attributes
- Strong ownership and end-to-end accountability mindset
- Ability to balance technical depth with strategic decision-making
- Bias toward automation, scalability, and efficiency
- Strong problem-solving skills and a focus on operational excellence
- Comfortable working in a fast-paced startup environment
Success Metrics
- Reliable, scalable infrastructure supporting high engineering productivity
- Optimized EDA license utilization and cost efficiency
- Effective cloud vs on-prem strategy with measurable ROI
- Minimal downtime and high system availability
- Secure and efficient AI infrastructure adoption
- Infrastructure that scales seamlessly with company growth