Data Engineer

 Published 10 days ago

This opportunity would be with Tools for Humanity.

About the AI & Biometrics Team:

The AI & Biometrics team is building a biometric iris recognition system that can work reliably with more than a billion users and enables them to claim their free share of WLD. We use cutting-edge machine learning deployed on custom hardware to enable high-quality image acquisition, identification, and fraud prevention, all while requiring minimal user interaction. Our technology, coupled with privacy-preserving data collection, allows us to increase system performance and reduce model bias.

We are building an iris recognition and fraud detection engine that works at the scale of one billion people, so it needs to outperform all current iris recognition technologies. We leverage our powerful custom-made iris recognition device, the Orb, combined with the latest research in AI and deep learning.

About the Opportunity:

We are seeking a Data Engineer who will play a pivotal role in the backbone of our machine learning work: the datasets. In this critical position, you'll be at the forefront of our MLOps framework, focusing on data ingestion, annotation, and dataset orchestration. You will have the opportunity to build and maintain the infrastructure that fuels our machine learning algorithms, ensuring that the data is accurate, accessible, and ready to use for model training.

In this role you will: 

  • Design, implement and maintain automations for data ingestion and human annotation request pipelines to ensure data availability to multiple ML workstreams 
  • Collaborate with our AI researchers and engineers to determine dataset requirements and build tooling that helps bring high-quality training and evaluation datasets to life
  • Utilise tools like Docker, Kubernetes and Terraform to deploy, scale and manage the infrastructure supporting our dataset operations 
  • Work closely with other team members, including AI researchers, software engineers and product managers, to ensure alignment and smooth dataset delivery based on current needs 

About You: 

  • 3 years of industry experience in Data Engineering, Software Engineering, Computer Vision or a related field
  • Strong foundation in Python, including frameworks for data and image manipulation such as Pandas and OpenCV
  • Strong experience with Docker, Kubernetes, and Terraform 
  • Proficiency in working with MongoDB and AWS services 
  • Familiarity with continuous integration, preferably with GitHub CI 
  • Solid understanding of data pipelines, ETL processes, statistical analysis and data quality best practices
  • Previous work in Computer Vision is a nice-to-have
