Design and operate the infrastructure and tooling for production AI systems, including ML pipelines and deployment automation. Ensure models reach production reliably through MLOps practices, monitoring, and cross-functional collaboration.
About Us:
Foundation AI is the only AI-native document intake automation platform serving the claims and litigation industries. Founded in 2019 by a team of lawyers and data scientists, Foundation AI processes millions of documents each month for hundreds of US law firms, including many of the largest and most respected plaintiff and injury law firms in the country. Find out more at www.foundationai.com.
Job Overview:
At Foundation AI, we are looking for a Senior Software Engineer to join our AI Pipeline team. In this role, you will design, build, and operate the infrastructure and tooling that powers our production AI systems, spanning model versioning, data and prompt pipelines, experimentation frameworks, and deployment automation. You will work at the intersection of ML engineering and platform engineering, ensuring our models reach production reliably, safely, and at scale. We are looking for an excellent problem solver and proficient coder with strong adaptability, communication skills, and a drive to learn.
Key Responsibilities:
- ML Pipeline Development: Design, build, and maintain end-to-end ML pipelines covering data ingestion, preprocessing, model training, evaluation, and serving. Ensure pipelines are reproducible, observable, and production-grade.
- MLOps & Model Lifecycle Management: Own model versioning, data versioning, and prompt versioning across environments. Implement rollout automation, canary deployments, and rollback mechanisms for safe model releases.
- Experimentation & A/B Testing: Build and operate side-by-side deployment infrastructure and A/B testing frameworks to evaluate model variants in production with rigorous statistical guardrails.
- Monitoring & Observability: Implement drift detection, data quality monitoring, and alerting across the pipeline stack. Define SLOs for model and pipeline health and drive incident response.
- CI/CD for ML: Extend CI/CD practices to the ML lifecycle by automating training triggers, evaluation gates, and deployment workflows integrated with the broader engineering delivery pipeline.
- System Architecture: Design and implement robust, high-performance, and secure ML infrastructure. Evaluate and adopt tooling (Bedrock, MLflow, Airflow, and others) to accelerate the team’s capabilities.
- Technical Leadership: Provide mentorship and guidance to junior engineers, foster a culture of knowledge-sharing, and influence ML engineering best practices at the team and organizational level.
- Code Reviews & Quality: Ensure code quality through peer reviews, unit testing, and adherence to coding standards across pipeline and platform code.
- Cross-Functional Collaboration: Work closely with ML scientists, product managers, and infrastructure teams to translate model development needs into reliable production systems.
- Security & Compliance: Ensure pipelines and model artifacts follow security best practices and the industry compliance standards relevant to legal document processing.
- Documentation: Maintain clear technical documentation for pipelines, model registry conventions, and operational runbooks.
Responsibilities may be tailored based on the candidate’s experience and proficiency.
Skills and Tools:
- Experience: 5+ years in software engineering, with at least 2–3 years in ML engineering, MLOps, or AI platform roles.
- MLOps & ML Lifecycle: Hands-on experience with model versioning, data versioning, prompt versioning, experiment tracking, and deployment automation in production environments.
- Pipeline Tooling: Proficiency with workflow orchestration (Apache Airflow or equivalent), experiment tracking (MLflow or equivalent), and cloud-based model hosting (AWS Bedrock or equivalent).
- A/B Testing & Rollout Automation: Experience designing and operating side-by-side deployments, shadow mode evaluation, canary releases, and automated rollback strategies for ML models.
- Monitoring & Observability: Familiarity with model drift detection, data quality monitoring, and pipeline alerting; experience defining and tracking ML-specific SLOs.
- Cloud Infrastructure: Experience with AWS services (S3, ECS/EKS, Lambda, Step Functions, or equivalents); comfort operating in a cloud-native environment.
- Programming & Development: Proficient in Python; writes scalable, maintainable, and secure code. Experience with SQL and familiarity with data engineering patterns are a plus.
- CI/CD for ML: Experience extending CI/CD principles to ML workflows, including automated training pipelines, evaluation gates, and model promotion flows.
- Architecture & Design: Designs modular, high-performance systems; able to drive technical decisions and articulate trade-offs clearly.
- Testing & Quality: Implements automated testing for pipeline components; values reproducibility and reliability in ML systems.
- Problem-Solving & Critical Thinking: Tackles ambiguous, complex challenges; evaluates trade-offs across performance, reliability, and development velocity.
- Communication & Leadership: Guides teams effectively, communicates technical strategy clearly, and influences architectural decisions across functions.
Education:
A B.Tech degree in Computer Science, or equivalent experience relevant to the functional area.
Our Commitment:
Foundation AI is an equal opportunity employer committed to diversity and inclusion in the workplace. We prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other protected characteristic. Our hiring decisions are based solely on qualifications, merit, and business needs at the time.
For any feedback or inquiries, please contact us at careers@foundationai.com.
Learn more about us at www.foundationai.com