About the Role:
The AI Data Infrastructure team sits at the heart of Stack’s machine learning and data engineering worlds. As part of the larger AI organization, we build and maintain the platform that transforms petabytes of data every month into the actionable insights that fuel our deep learning needs. If you're a mission-driven engineer passionate about building state-of-the-art data systems that power AI, we want you on our team! We're a highly cross-functional group of engineers dedicated to providing our machine learning engineers with the best tools possible. We architect and deploy robust data warehouses, data pipelines, and cloud services to manage Stack's mission-critical data, and we power our internal data mining platform, which leverages large language models, vector databases, and federated query engines.
Responsibilities:
- Architectural Design: Design and oversee the architecture of the data platform, ensuring scalability, reliability, and performance.
- Data Management: Ensure efficient data storage and retrieval, and maintain data quality, consistency, and governance across the platform. Develop and implement strategies for data lifecycle management, including archiving and purging.
- Collaboration: Collaborate with cross-functional teams to understand data requirements and design appropriate solutions.
- Technology Stack: Stay updated with the latest technologies and trends in data engineering, making recommendations for new tools and best practices.
- Performance Optimization: Identify and resolve performance bottlenecks in data processing and storage.
- Promote Engineering Excellence: Foster a culture of engineering excellence within the team, and work closely with management and customer teams to balance speed of delivery with the quality of engineering artifacts.
Qualifications:
- Education: Degree in Computer Science or a related field.
- 5+ years of professional software development experience.
- Expert-level development skills in Python and SQL.
- Strong experience with big data technologies (data warehouses, data lakes, orchestration, etc.).
- Experience building data platforms used by other developers.
- Experience implementing near-real-time data pipelines for applications, BI analytics, and ML workloads.
- Expert-level working knowledge of data lake technologies, data storage formats (Parquet, Iceberg), query engines (Trino), and associated concepts for building optimized solutions at scale.
- Experience designing data streaming and event-based data solutions (Kafka, Kinesis, SQS/SNS, or the like).
- Experience with data pipeline tools (Flink, Spark, or the like), orchestration tools such as Airflow or Flyte, and data transformation tools such as SQLMesh or DBT.
Nice to have:
- Prior experience with Bazel and CI/CD tooling.
- Prior AV or robotics industry experience.
- Prior experience being a tech lead in the data engineering or infrastructure space.
#LI-TT1