Job Title: ML Engineer – Experimentation Platform
Experience: 3 – 4 Years
Location: Remote
Notice Period: Immediate Joiners Only
About the Role
We are looking for a highly skilled ML Engineer to join our Test & Learn Platform team. In this role, you will build and scale experimentation and causal inference services that enable business teams to make data-driven decisions globally.
You will work across statistical modeling, API development, cloud-native infrastructure, and large-scale data processing to deliver reliable and production-ready ML solutions.
Key Responsibilities
- Develop and maintain statistical and machine learning modules for:
  - Difference-in-Differences (DID)
  - Synthetic Control
  - A/B Testing
  - Multi-Treatment Effects
- Build and extend RESTful APIs using FastAPI and integrate them with web applications through SDK wrappers
- Design and optimize large-scale data pipelines using PySpark, Delta Lake, and Azure Data Lake
- Diagnose and resolve Out-of-Memory (OOM) issues in PySpark workloads by optimizing:
  - Memory allocation
  - Partitioning
  - Broadcast joins
  - Caching strategies
  - Spark configurations
- Deploy and manage Databricks workloads including notebooks, job clusters, and Delta Lake tables
- Containerize and deploy services using Docker, Kubernetes, and CI/CD pipelines
- Ensure code quality, testing, and security using PyTest, SonarCloud, and Snyk
- Collaborate closely with Data Scientists and Product teams to convert research concepts into scalable production systems
Mandatory Skills
- Strong experience in Python (3.9+)
- Hands-on expertise in:
  - PySpark & Spark Internals
  - Databricks
  - FastAPI / API Development
  - Azure Cloud Platform
  - Kubernetes & Docker
  - PyTest
- Strong understanding of:
  - DID
  - Synthetic Control
  - A/B Testing
  - Hypothesis Testing
  - Panel Data Methods
- Expertise in statistical and ML libraries:
  - statsmodels
  - scikit-learn
  - SciPy
  - Pandas
  - NumPy
Technical Requirements
PySpark & Spark Internals
- Strong understanding of Spark memory model
- Executor tuning and shuffle optimization
- Diagnosing and resolving OOM errors
- Experience with:
  - Broadcast thresholds
  - Partition skew handling
  - Spill-to-disk optimization
- GC tuning
Databricks
- Hands-on experience with:
  - Job orchestration
  - Cluster configuration
  - Notebook workflows
  - Delta Lake optimization
  - Z-ordering, compaction, and caching
Cloud & DevOps
- Azure Storage, Azure ML, and Azure Data Lake
- Docker-based containerization
- Kubernetes orchestration for ML workloads
- CI/CD pipeline integration
Testing & Quality
- Unit and integration testing using PyTest
- Familiarity with SonarCloud, Snyk, and GitHub Actions
Good-to-Have Skills
- Experience with Celery and Redis for async task orchestration
- Familiarity with Polars, PyArrow, or SQLAlchemy
- Background in econometrics or experimental design
- Experience with Spark UI profiling and performance benchmarking
- Knowledge of advanced CI/CD tooling and automation practices
Preferred Candidate Profile
- Strong analytical and problem-solving abilities
- Ability to work independently in a remote setup
- Excellent collaboration and communication skills
- Passion for building scalable ML and experimentation platforms
Tech Stack
Languages & Libraries: Python, Pandas, NumPy, SciPy, statsmodels, scikit-learn
Big Data: PySpark, Spark Internals, Delta Lake
Cloud & Platforms: Azure, Databricks, Azure Data Lake
APIs & Backend: FastAPI
DevOps: Docker, Kubernetes, GitHub Actions
Testing & Security: PyTest, SonarCloud, Snyk