Role Overview
We are seeking a skilled Data Engineer with strong hands-on experience in ETL development, Databricks, PySpark, Python, cloud services, and data warehousing. The primary responsibility of this role is to design, build, and maintain scalable data pipelines and backend data services that support analytics, reporting, application integrations, and enterprise data platforms.
The ideal candidate also has backend development experience and practical exposure to modern AI-enabled data solutions, including RAG, LangChain, vector databases, embeddings, and LLM-based applications. Generative AI experience is an added advantage that complements, rather than replaces, the core data engineering responsibilities.
Key Responsibilities
Design, develop, and maintain scalable ETL/ELT pipelines for batch and near-real-time data processing.
Build data engineering solutions using Python, SQL, PySpark, Apache Spark, Databricks, Airflow, Matillion, DBT, and related technologies.
Develop and optimize Databricks notebooks, jobs, workflows, Spark transformations, and Delta Lake-based processing pipelines.
Ingest, transform, validate, and load data from APIs, cloud storage, databases, logs, FTP/SFTP servers, files, and enterprise applications.
Work with cloud services across AWS and/or Azure, including storage, compute, serverless processing, monitoring, logging, and managed data services.
Build and maintain backend services and APIs using Python frameworks such as Flask, FastAPI, or similar technologies to expose data, trigger pipelines, and support downstream applications.
Design and support data models, curated datasets, warehouse tables, and lakehouse layers for analytics, reporting, operational dashboards, and AI-driven use cases.
Work with modern data warehouses and databases such as Snowflake, Amazon Redshift, PostgreSQL, SQL Server, MySQL, Oracle, OpenSearch, or similar platforms.
Implement data quality checks, logging, monitoring, alerting, exception handling, and pipeline failure recovery mechanisms.
Collaborate with data analysts, data scientists, backend developers, cloud engineers, and business stakeholders to deliver reliable and production-ready data solutions.
Support GenAI-enabled data use cases where required, including RAG pipelines, document ingestion, embedding generation, vector search, and LangChain-based workflows.
Assist in integrating enterprise data with LLM applications while ensuring proper metadata filtering, access controls, tenant isolation, and grounded response generation.
Participate in CI/CD, version control, deployment, and production support activities using Git, GitHub, Jenkins, Docker, CodePipeline, ECS, or similar tools.
Required Qualifications
6 years of experience in data engineering, ETL development, backend data services, or cloud data platforms.
Strong hands-on experience with Python, SQL, PySpark, Apache Spark, and ETL/ELT pipeline development.
Practical experience with Databricks Data Engineering, including notebooks, jobs, workflows, Spark jobs, and scalable transformation pipelines.
Experience working with cloud platforms such as AWS or Azure.
Experience with cloud services such as AWS S3, Glue, Lambda, EMR, Redshift, RDS, ECS, EC2, SQS, Azure Blob Storage, ADLS, Cosmos DB, Azure AI Search, Key Vault, Application Insights, or Log Analytics.
Strong understanding of data warehousing concepts, dimensional modeling, data marts, curated layers, and warehouse performance optimization.
Experience working with data warehouses and databases such as Snowflake, Redshift, PostgreSQL, SQL Server, MySQL, Oracle, OpenSearch, or similar systems.
Experience building backend services or APIs using Python, Flask, FastAPI, or similar backend frameworks.
Good understanding of data ingestion patterns, file formats, data validation, schema handling, metadata management, and pipeline orchestration.
Experience with workflow orchestration and transformation tools such as Airflow, Databricks Workflows, Matillion, or DBT.
Ability to troubleshoot production data issues, optimize Spark jobs, tune SQL queries, and improve pipeline performance.
Strong documentation, communication, and cross-functional collaboration skills.
GenAI / AI Add-On Skills
Working knowledge of RAG architecture, including document ingestion, chunking, embeddings, retrieval, and response generation.
Experience with or exposure to LangChain, prompt orchestration, LLM integration, and AI-powered search workflows.
Familiarity with vector databases or vector search platforms such as Azure AI Search, OpenSearch, ChromaDB, FAISS, Pinecone, Weaviate, Milvus, or similar tools.
Understanding of how structured, semi-structured, and unstructured data can be prepared and indexed for GenAI applications.
Exposure to LLM platforms such as OpenAI, Azure OpenAI, Claude, AWS Bedrock, or Hugging Face.
Ability to support AI applications from a data engineering perspective by preparing, securing, indexing, and retrieving enterprise data.
Preferred Qualifications
Experience building production-grade data pipelines that process large volumes of records.
Experience with Delta Lake, Medallion Architecture, data lakehouse design, and Databricks performance optimization.
Experience with backend API development for data access, pipeline triggering, metadata management, or analytics integration.
Exposure to enterprise search, document intelligence, log analytics, observability, or security data platforms.
Experience with containerized deployments using Docker, AWS ECS, Kubernetes, or similar cloud-native services.
Familiarity with CI/CD pipelines, automated deployments, release documentation, and production support practices.
Technical Skills
| Category | Skills / Technologies |
| Primary Skills | Python, SQL, ETL/ELT, PySpark, Apache Spark, Databricks, Cloud Services, Data Warehousing |
| Data Engineering | Airflow, Matillion, DBT, Delta Lake, Data Pipelines, Data Quality, Batch Processing, Orchestration |
| Cloud | AWS, Azure, Snowflake |
| Data Platforms | Snowflake, Redshift, PostgreSQL, SQL Server, MySQL, Oracle, OpenSearch |
| Backend | Flask, FastAPI, REST APIs, Python Services, API Integration |
| GenAI Add-On | RAG, LangChain, Vector Databases, Embeddings, LLM Integration, Prompt Engineering |
| DevOps | Git, GitHub, Jenkins, Docker, CodePipeline, UCD, ECS |
Education
Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, Artificial Intelligence, Machine Learning, or a related technical field.