AI Testing

Posted an hour ago
Worldwide
5-10 years experience

AI Summary

The role involves embedding AI capabilities into existing QA automation tooling and frameworks used by a large engineering team. Additionally, the specialist will build and operate evaluation frameworks to certify AI systems, agents, and RAG applications before production.

BCE Global Tech's Global Quality Engineering (GQE) function is building one of Canada's most ambitious AI quality programs — certifying every AI and agentic system deployed across Bell Canada before it reaches production. As a QA AI Specialist, you sit at the intersection of artificial intelligence, software engineering, and quality assurance: a hybrid role that does not yet have a textbook, because the discipline is being written in real time.

 

You will do two things simultaneously. First, you will bring AI into GQE's existing testing practice — embedding AI-powered capabilities into the test automation tooling, pipelines, and frameworks that 250 QA engineers already use every day. Second, you will build and operate the evaluation frameworks that test the AI systems being created by other Bell engineering teams — agents, orchestration pipelines, RAG applications, Salesforce AgentForce workflows, and ServiceNow Now Assist integrations.



Requirements

Key Responsibilities:

1. AI-Enhanced QA Tooling

Modernize GQE’s QA stack by embedding AI to improve speed, coverage, and intelligence:

  • Integrate AI-driven test generation into Selenium, Playwright, and Postman frameworks
  • Use predictive models to prioritize tests based on code changes and defect history
  • Enable self-healing automation for UI/API changes
  • Automate defect triage and root-cause analysis using failure clustering
  • Support natural-language test authoring (English/French) for non-technical QA staff
  • Continuously pilot emerging AI testing tools via a technology radar

2. AI Evaluation & Quality Pipelines

Build scalable evaluation systems tailored for AI behavior, not rule-based logic:

  • Implement LLM-as-Judge pipelines on Vertex AI (Gemini) across key quality dimensions
  • Generate large, diverse, and adversarial test corpora from seed intents
  • Evaluate RAG systems using metrics like faithfulness, relevance, and recall (RAGAS)
  • Validate multi-step agent workflows, tool usage, and escalation behavior
  • Embed AI evaluations into CI/CD as mandatory release gates

3. AI Safety & Adversarial Testing

Operate a dedicated AI red-teaming capability to uncover AI-specific risks:

  • Execute prompt injection and poisoned-context attacks on RAG systems
  • Run automated jailbreak and constraint-bypass probes (e.g., Garak)
  • Systematically test hallucination, numerical accuracy, and domain knowledge
  • Assess toxicity, bias, and fairness across English and French interactions
  • Stress-test agentic systems for runaway actions and scope violations

4. Continuous Quality Evolution

Ensure the quality framework evolves as models and systems change:

  • Monitor production AI outputs for quality drift and trigger re-certification
  • Feed real production failures back into the test corpus
  • Track model/version changes and generate quality delta reports
  • Maintain a living benchmark of Bell-specific AI quality standards
  • Continuously adopt new evaluation research and industry best practices
  • Partner early with AI/ML teams to embed quality by design

5. AI Quality Certification Operations

Lead technical execution of the AI Quality Certification (AIQC) program:

  • Own Tier 2 & 3 certification testing from corpus design to red-teaming
  • Calibrate LLM-as-Judge rubrics using human-labeled golden datasets
  • Produce clear AI Quality Certificates with scores, risks, and conditions
  • Advise teams on AI testability, prompts, and evaluation instrumentation
  • Contribute to AIQC playbooks, documentation, and knowledge sharing

Required

  • 5+ years of software quality engineering experience, with at least 2 years working directly with AI/ML systems, LLMs, or AI-powered applications
  • Hands-on experience building or evaluating LLM-based applications, including prompt engineering, RAG pipelines, or agentic workflows
  • Proficiency in Python: test framework development, API integration, data processing, and evaluation scripting
  • Experience with modern test automation frameworks (Playwright, Selenium, Pytest, RestAssured, Postman/Newman) and CI/CD platforms (GitHub Actions, Google Cloud Build, Jenkins)
  • Working knowledge of at least one major AI/ML platform (Google Vertex AI, Azure OpenAI, or AWS Bedrock), with hands-on API usage
  • Strong conceptual understanding of how LLMs work: tokenization, temperature and sampling, context windows, grounding, hallucination mechanics, and fine-tuning
  • Demonstrated ability to design test strategies for non-deterministic systems, moving beyond assertion-based testing to probabilistic, rubric-based evaluation

Benefits

What We Offer:

  • Competitive salaries and comprehensive health benefits
  • Flexible work hours and remote work options
  • Professional development and training opportunities
  • A supportive and inclusive work environment
  • Access to cutting-edge technology and tools

