QA Lead

Posted 4 hours ago

India

5-10 years' experience

AI Summary

Own end-to-end data quality by architecting LLM-based quality checks and agentic remediation pipelines. Translate complex B2B dataset insights into product recommendations and maintain a scalable skills library for AI agents.

Firmable is the market-leading B2B sales intelligence platform in Asia Pacific, and we're scaling that success globally at pace. Backed by leading investors and serving 2,000+ customers and growing, we exist to give sales teams an unfair advantage: the deepest company and people data of any platform, enriched with real-time signals and served at the right moment by intelligent agents.

We're not building another search box. We're building the engine that tells a salesperson exactly who to call, why, and what to say — before they even ask.

The Role

This isn't a traditional QA role where you write test plans and chase tickets. As Data QA Lead at Firmable, you'll own data quality across our platform end-to-end — designing the systems that decide whether the data flowing in is trustworthy enough to ship.

You'll spend your time deep in Snowflake, Python, and LLMs — architecting LLM-based quality checks, building the eval harnesses that hold them accountable, designing the skills library that other agents and teammates depend on, and shipping agentic remediation pipelines where rules, LLMs, and humans each do what they're best at.

This role has deep technical ownership. You'll make the hard rule-vs-LLM calls, build the production systems behind them, and translate findings into recommendations that change what we build.

What You'll Own

Data Analysis & Product Intelligence

  • Investigate complex datasets using SQL, Snowflake, Python, and LLMs — directing agents to do the heavy lifting while you frame the question and judge the answer

  • Unlock insights in large-scale B2B datasets that shape product direction and commercial strategy

  • Build scalable, reusable datasets — and the skills that make them queryable by any agent or teammate

  • Translate findings into recommendations that move metrics, not slides that summarise what happened

Data Quality & Source Management

  • Own data quality end-to-end across rule-based and LLM-based checks, side by side — design the checks, run the evals, monitor drift, fix root causes

  • Make the rule-vs-LLM judgement call on every check: deterministic logic where rules win, LLMs where semantic, contextual, or entity-resolution nuance is needed — and justify the split

  • Assess and onboard new data sources: coverage, freshness, accuracy, and where LLM judges add lift over deterministic profiling

  • Track down the hardest data bugs and fix them at the root, partnering with engineering and product

  • Scrape or source supplementary data when it sharpens insights or enriches the product

AI-Powered Analysis & Automation

  • Architect LLM-based quality checks with explicit rubrics, structured outputs, and labelled eval sets — precision/recall measured, not vibed

  • Build and maintain the skills library (SKILL.md specs) that powers recurring workflows — quality checks, remediation proposals, dataset onboarding — versioned, documented, and invocable by any agent or teammate

  • Ship agentic remediation pipelines: triage with rules, escalate to LLMs, propose fixes, log every call (prompt version, model, cost, latency, decision), surface human-review queues

  • Own the eval and observability scaffolding — prompt versioning, traces, drift detection when a vendor silently updates claude-sonnet-latest, cost ceilings with token math behind them

  • Set the model-choice playbook — Haiku for cheap classification, Sonnet for nuanced judgement, frontier models for hard edge cases — and revise it as model economics shift
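To make the "triage with rules, escalate to LLMs, log every call" pattern concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Firmable's implementation: the record fields (`name`, `domain`), the rule logic, the `hypothetical-model` name, the per-token price, and the `llm_judge` callable are all hypothetical stand-ins.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CallLog:
    """One row per LLM call: the fields the posting asks to log."""
    prompt_version: str
    model: str
    cost_usd: float
    latency_s: float
    decision: str


def rule_check(record: dict) -> Optional[str]:
    """Deterministic triage. Returns a decision, or None when rules cannot decide."""
    domain = (record.get("domain") or "").strip().lower()
    if not domain or "." not in domain or " " in domain:
        return "fail"  # missing or malformed domain: no LLM needed
    words = (record.get("name") or "").lower().split()
    token = "".join(ch for ch in words[0] if ch.isalnum()) if words else ""
    if token and token in domain:
        return "pass"  # first word of the company name appears in the domain
    return None  # ambiguous: escalate


def triage(
    record: dict,
    llm_judge: Callable[[dict], Tuple[str, int]],  # hypothetical: returns (decision, tokens used)
    log: List[CallLog],
    review_queue: List[dict],
    *,
    model: str = "hypothetical-model",   # assumption, not a real model id
    prompt_version: str = "v1",
    usd_per_1k_tokens: float = 0.001,    # assumption, for the token math only
) -> str:
    decision = rule_check(record)
    if decision is not None:
        return decision  # rules won, nothing to log or pay for
    start = time.perf_counter()
    decision, tokens = llm_judge(record)
    latency = time.perf_counter() - start
    cost = tokens / 1000 * usd_per_1k_tokens
    log.append(CallLog(prompt_version, model, cost, latency, decision))
    if decision == "unsure":
        review_queue.append(record)  # surface for human review
    return decision
```

In this sketch the LLM is only called for records the rules cannot settle, so the log doubles as the input for cost and latency monitoring, and anything the judge marks as unsure lands in a human-review queue.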

Cross-Functional Support

  • Be the technical point of escalation for data quality across product, engineering, and go-to-market — the person trusted when a number is questioned

  • Partner with product and engineering on architecture decisions where data quality is in the loop

  • Build the monitoring surface stakeholders operate against: DQ trend by check type (rule vs. LLM), top failing reasons surfaced by LLM judges, cost and latency over time, human-review backlog

What We're Looking For

Must Haves

  • 6+ years in data quality, data engineering, or analytics, with a strong focus on data quality systems

  • Expert-level SQL and Snowflake — complex queries, performance tuning, warehouse design, daily comfort across very large datasets

  • Strong Python skills — pandas, numpy, scripting, automation, and production-grade code. You write systems, not notebooks.

  • Shipped real work with agentic IDEs — Claude Code, Cursor, or equivalent. Not "tried it" — built and merged real systems with it.

  • Deep, demonstrable expertise building agents, skills, and tool-calling pipelines — you've architected agent workflows, written SKILL.md specs others depend on, and built tool-calling systems running in production. You can show us the repos.

  • You operate LLMs as production systems — you've designed eval harnesses, run labelled eval sets, versioned prompts, logged traces, debugged judges on precision/recall, and detected drift on vendor model updates

  • Sharp judgement on rules vs. LLMs — you reach for deterministic logic when it's the right tool and don't default to an LLM because it feels modern

  • Deep knowledge of data quality principles — validation, monitoring, observability, lineage, and the instinct to chase issues from symptom to root cause across pipelines

  • Proven skill with BI and data visualisation — Tableau, Looker, Power BI, or equivalent

  • Comfort working cross-functionally with product, engineering, sales, and marketing

  • A product mindset — you care about how data drives customer value and revenue, not just whether the pipeline ran

Highly Valued

  • Experience with data warehousing and relational modelling; NoSQL familiarity a plus

  • Experience with web scraping frameworks and best practices

  • Familiarity with cloud platforms — AWS, GCP, or Azure

  • Experience assessing and onboarding third-party data sources at scale

  • Background in B2B data, entity resolution, or structured/semi-structured datasets

  • Understanding of data privacy and compliance considerations

How We Build

AI-Native, Not AI-Assisted

Firmable is built on an AI-native engineering philosophy — and we mean it literally. AI is not a productivity tool bolted onto traditional analyst work. AI is the workflow. Every analyst at Firmable operates with fully agentic development, evals, traces, and AI-powered review pipelines as their default mode of working.

This means:

  • Agentic development: checks, datasets, and pipelines are designed, scaffolded, and iterated with AI agents doing the heavy lifting — you direct, review, and elevate

  • Skills over scripts: recurring workflows are packaged as versioned SKILL.md specs that any teammate or agent can load and run

  • Evals as a default, not an afterthought: every LLM check ships with a labelled eval set, measured precision/recall, and a prompt version you can roll back

  • Traces and observability from day one: every LLM call is logged with prompt version, model, cost, latency, and decision — retrofitting this later is not the plan

  • Continuous AI feedback loops: model drift, prompt regression, and cost ceilings are monitored the same way pipeline health is
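As a rough illustration of "measured precision/recall" on a labelled eval set, the sketch below scores a judge's predictions against human labels. The label values (`bad`/`good`), the sample data, and the gate thresholds are assumptions chosen for the example, not values from Firmable.

```python
from typing import Sequence, Tuple


def precision_recall(
    labels: Sequence[str], predictions: Sequence[str], positive: str = "bad"
) -> Tuple[float, float]:
    """Precision and recall of the judge for the `positive` class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def passes_gate(precision: float, recall: float,
                min_precision: float = 0.9, min_recall: float = 0.8) -> bool:
    """Hypothetical release gate: block a prompt version that falls below the thresholds."""
    return precision >= min_precision and recall >= min_recall


# Hypothetical labelled eval set: human labels vs. judge output.
labels = ["bad", "bad", "good", "good", "bad"]
predictions = ["bad", "good", "good", "bad", "bad"]
precision, recall = precision_recall(labels, predictions)
print(f"precision={precision:.2f} recall={recall:.2f} ship={passes_gate(precision, recall)}")
```

Running the same scoring against each prompt version gives a like-for-like number to compare before a rollout or a rollback.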

If you're not already working this way, this role will require a rapid and genuine mindset shift. We're not looking for people who are open to AI-native work — we're looking for people who already live it.

The Operating Environment

Firmable runs lean and ships fast — intentionally small teams, no layers, minimal process, and a weekly release cadence moving toward daily. Teams own their stack end to end: you design it, you build it, you ship it, you run it.

This is a startup-to-scaleup environment and it comes with real expectations. There are no fixed hours. The pace is high, the team is always building, and when something matters it gets done. In return, you get genuine ownership, a seat at the table on every major architecture decision, and the opportunity to build something that doesn't exist anywhere else in the market.

Why This Role

  • Own data quality at the heart of one of the fastest-growing B2B intelligence platforms in APAC — every check you build, every skill you ship, every dataset you onboard reaches every Firmable customer

  • Greenfield AI-native scaffolding — the eval harnesses, skills library, and agentic quality pipelines are largely unbuilt; you'll shape them

  • Work at the frontier — LLM-as-judge at production scale, agentic remediation, drift detection on vendor models, and rule-vs-LLM orchestration are genuinely hard, genuinely novel problems

  • Small team, massive leverage — your work reaches every Firmable customer, every day

  • Competitive base + meaningful equity — we balance strong compensation with a share in the upside we're building toward
