Senior Data Architect (Hands-on)

 Posted 16 hours ago
     
10+ years experience

AI Summary

Own and enforce the canonical data model and architecture to ensure AI/ML readiness and cross-product consistency. Lead the modernization of data pipelines and storage patterns while mentoring engineers on schema discipline and governance.

GENERAL DESCRIPTION

The Senior Data Architect owns our canonical data architecture — the schema, contracts, tenancy, and governance that every product and every AI/ML workload builds on. You are the single owner of the canonical data model: one normalized definition of the core business objects shared across our products, and the standard the rest of engineering builds against. This is a foundational, hands-on role — you design, prototype, and ship reference implementations and in-repo guardrails, not just diagrams.

Our approach to AI is to build durable, domain-specific data assets rather than commodity model infrastructure: we don't pretrain foundation models and we don't ship thin wrappers around someone else's. The differentiated value lives in how our data is modeled, governed, and made trustworthy for AI — and that is the layer you own.

KEY RESPONSIBILITIES

AI/ML readiness

  • Architect the data layer so AI/ML workloads — vector search, embeddings pipelines, RAG-grounded retrieval, model training — run on a clean, governed substrate.
  • Make production data AI-ready: well-modeled, contract-enforced, lineage-tracked, and drift-detectable.
  • Design the data-side integration patterns these workloads depend on, such as feature-store and vector-store patterns across document, relational, and embedding data.

Data architecture

  • Own the canonical data model — the normalized definition of the core business objects shared across our products — and decide what is canonical versus tenant-specific.
  • Establish data architecture standards, data contracts, and schema discipline the rest of engineering builds against, enforced in-repo.
  • Exercise strong polyglot-persistence judgment: what belongs in document vs. relational vs. vector stores, and how to migrate between them without big-bang rewrites.
  • Define the multi-tenant data architecture: tenancy isolation, data residency posture, and per-tenant cost attribution across storage and compute.

Modernization

  • Lead staged modernization toward the right mix of stores and patterns for transactional, analytical, and AI/ML use cases — improving scalability, governance, and usability while minimizing disruption.
  • Own the architectural direction of the data pipeline and lake / lakehouse layer: ingestion, transformation, orchestration, and storage tiers.
  • Lead the move from homegrown pipelines to proven, industry-standard platforms, balancing build-vs-buy and total cost of ownership.
  • Modernize legacy data-access patterns via incremental, strangler-fig migrations that keep production stable.

Technical leadership

  • Drive hands-on prototypes, reference implementations, and in-repo guardrails.
  • Define the data, storage, and retrieval patterns the rest of engineering builds against.
  • Establish data quality, testing, lineage, and observability standards for pipelines and AI/ML serving.
  • Mentor engineers on schema discipline, modern data practices, and AI/ML-readiness patterns.
  • Make canonical decisions that are time-boxed, written, and defensible; practice disagree-and-commit rather than letting schema debates become a standing committee.
  • Use AI-assisted development tools (Claude Code, Copilot, Cursor) as a force multiplier for schema design, query tuning, and migration scripting.

Cross-team partnership

  • Partner with database engineering on production data health while owning long-term architectural direction.
  • Partner with ML and application engineering on their data needs — structuring and governing data so it is retrieval-ready and safe to build on.
  • Partner with platform / infrastructure on reliability, disaster recovery, residency, and the multi-tenant operational posture.

QUALIFICATIONS

  • 8+ years in data architecture, data engineering, database administration, or analytics engineering, with 3+ years in senior / lead roles.
  • Demonstrated ownership of a canonical or enterprise data model / cross-product schema — the model and contracts other teams built against.
  • Hands-on MongoDB at production scale (Atlas M40+ ideal): document modeling, aggregation framework, indexing, change streams, sharding, replica sets — and the judgment to recognize the Mongo-as-RDBMS anti-pattern.
  • Strong polyglot-persistence judgment: deciding what belongs in documents vs. relational vs. a vector store, and migrating between them incrementally.
  • Hands-on relational depth: schema design, indexing strategy, and query tuning, plus familiarity with vector search (Atlas Vector Search, pgvector, or equivalent).
  • Production experience making data AI/ML-ready: data architecture supporting RAG, semantic search, embeddings / vector pipelines, or agentic workloads.
  • Multi-tenant architecture experience: data residency and per-tenant cost attribution.
  • Pipeline / ELT / lake / lakehouse design at scale, with incremental migration strategies that minimize disruption.
  • Cloud-native data services (Azure, AWS, or GCP).
  • Strong grasp of data quality, testing, lineage, and monitoring — including observability for pipelines and AI/ML serving.
  • Comfortable modeling a complex, specialized domain. MEP / AEC / construction experience is a plus; appetite to learn the domain is required.

NICE TO HAVE

  • Knowledge-graph, ontology, or semantic-layer experience.
  • CDC and cross-engine sync (MongoDB Change Streams, Debezium, or equivalent).
  • Lakehouse platforms (Databricks, Snowflake, or open table formats — Iceberg, Delta, Hudi) and feature stores (Feast or equivalent).
  • Data governance for AI/agent access to production data: query-cost controls, read-path safety, lineage, and audit for higher-risk use cases.
  • SOC 2 and data-classification experience.
  • Azure data ecosystem (Data Factory, Synapse, Functions, Event Grid).
  • MongoDB certification (Associate DBA / Developer or higher) or substantive MongoDB University coursework.

WHAT SUCCESS LOOKS LIKE — FIRST YEAR

  • The canonical data model is owned and enforced: teams build against stable, documented contracts instead of bespoke forks.
  • Workloads sit in the right stores, legacy anti-patterns are receding, and reliability targets are holding.
  • Tenancy is formalized and per-tenant cost attribution is instrumented, so cost and capacity are observable as we scale.
  • The data substrate is AI-ready — model, contracts, and lineage in place — so AI/ML work builds on a solid foundation rather than waiting on data.
  • You've done it in partnership: the data tier is healthier, and engineers build against your contracts.

BENEFITS

  • Comprehensive and competitive health benefits plan
  • Matching 401k contributions
  • 20 days annual PTO
  • Primarily remote work with occasional annual team onsites


This is a fully remote position open to candidates based in the United States.
