Applied Computing

Senior Data Engineer, Forward Deployed

Posted 21 days ago

India

⭐ 5-10 years experience

AI Summary

Architect and maintain scalable Lakehouse pipelines for high-frequency industrial time-series and unstructured data. Develop dual pipelines to support both large-scale historical model training and low-latency real-time AI inference.

Applied Computing was founded in 2024 to build Orbital, a physics-informed foundation model for energy operations. We’re live across oil and gas, refineries, and petrochemicals, working towards our mission: sustainable abundance for a growing planet.

The hydrocarbon industry keeps the world running. But its complexity has left operators tied to legacy systems, making critical decisions on less than 10% of available data. We built Orbital to change that. It’s a foundation model built specifically for energy that lets companies use AI at scale, harnessing all of their operational data and optimising in real time for any metric. Decisions get faster, operations get safer, and carbon intensity falls.

We’ve raised over $32 million, including one of the largest seed rounds for an AI company in the UK. We’re just getting started.

The Role

As our Data Engineer, you’ll architect and maintain pipelines that bring high-frequency time-series, lab, and historian data into a scalable Lakehouse architecture. You’ll work across AWS (EKS, S3, EBS, KMS, CloudWatch) and Databricks/PySpark, ensuring data is contextualised, synchronised, and optimised for both deep learning models and real-time LLM workloads.

This isn’t a traditional ETL role: you’ll be solving problems at the intersection of control systems, industrial data engineering, and AI enablement.

Technical Requirements

Deep expertise in PostgreSQL (partitioning, indexing, query optimisation, storage design).
Strong proficiency in Python for data processing, scripting, and pipeline orchestration.
Hands-on experience with AWS (EKS, S3, EBS, IAM, KMS, CloudWatch, etc.) for secure and scalable data pipelines.
Proven ability to work with Databricks and PySpark for large-scale distributed data processing.
Familiarity with time-series industrial data (control systems, DCS/SCADA logs, process historians).
Experience in unstructured data sync and management within hybrid cloud/on-prem environments.
Bonus: Experience working as a data engineer in oil and gas or energy environments.
Bonus: Knowledge of streaming frameworks (Kafka, Flink, Spark Streaming) or MLOps stacks for data versioning and lineage.

Core Responsibilities

1. Ingest & Contextualise Data

Ingest from OPC UA servers, process historians, IoT sensors, LIMS systems, alarms/events, and P&IDs.
Map signals to their physical processes (tags, units, hierarchies) for interpretability in AI pipelines.
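
As a rough illustration of what contextualising a signal might involve, the sketch below attaches unit and hierarchy metadata to a raw tag reading. The tag names, the `TagContext` structure, the `TAG_REGISTRY` lookup, and the asset hierarchy are all hypothetical; nothing here is taken from Orbital or from any specific historian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagContext:
    """Hypothetical metadata describing where a signal lives in the plant."""
    unit: str            # engineering unit of the measurement
    hierarchy: tuple     # assumed layout: (site, process unit, equipment)
    description: str


# Hypothetical registry; in practice this would come from an asset model.
TAG_REGISTRY = {
    "FIC-101.PV": TagContext("m3/h", ("RefineryA", "CDU-1", "FeedPump"), "Feed flow"),
    "TI-205.PV": TagContext("degC", ("RefineryA", "CDU-1", "Column"), "Column top temperature"),
}


def contextualise(tag, value, timestamp):
    """Return the reading enriched with unit and asset path, if the tag is known."""
    reading = {"tag": tag, "value": value, "timestamp": timestamp}
    ctx = TAG_REGISTRY.get(tag)
    if ctx is None:
        # Unknown tags are passed through but flagged for follow-up.
        reading["contextualised"] = False
        return reading
    reading.update(
        unit=ctx.unit,
        asset_path="/".join(ctx.hierarchy),
        description=ctx.description,
        contextualised=True,
    )
    return reading
```

Flagging unknown tags instead of dropping them keeps unmapped signals visible, which is one possible design choice among several.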

2. Data Movement & Accessibility

Build pipelines that handle real-time streaming and batch ingestion into the Lakehouse.
Manage synchronisation between historian archives, unstructured files, and AWS storage (S3/EBS).
Orchestrate Databricks Lakeflow/Connectors for integrating data into Lakebase/Lakehouse.
Handle secure, high-throughput transfers between historian archives and sandbox/live environments.

3. Change Tracking & Integrity

Detect and manage schema changes, signal drift, and inconsistencies across time.
Implement lineage and audit trails across Spark/Databricks and AWS pipelines.
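
One simple way to picture schema-change detection is to compare two snapshots of a table's column types. This is a minimal sketch assuming schemas are available as plain column-to-type dictionaries; the `diff_schema` helper and the example column names are hypothetical and not part of Spark, Databricks, or AWS.

```python
def diff_schema(old, new):
    """Compare two {column: type} mappings and report what changed."""
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    # Columns present in both snapshots whose declared type differs.
    retyped = {
        col: (old[col], new[col])
        for col in old.keys() & new.keys()
        if old[col] != new[col]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots of the same table on two different days.
yesterday = {"tag": "string", "value": "double", "ts": "timestamp"}
today = {"tag": "string", "value": "float", "ts": "timestamp", "quality": "int"}

changes = diff_schema(yesterday, today)
```

A result with any non-empty entry could be used to pause or reroute downstream jobs before they fail.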

4. Data Preparation for AI

Build and maintain dual pipelines:
- Training → large-scale historical data prep for time-series + LLM training.
- Inference → low-latency, real-time pipelines for anomaly detection, optimisation, and LLM search.
Support heterogeneous AI workloads (time-series forecasting and retrieval-augmented LLMs).

5. Database Performance & Optimisation

Tune PostgreSQL and Spark for high-throughput time-series workloads (partitioning, indexing, query optimisation).
Optimise pipelines for both fast analytical queries and high-efficiency model training.
Deploy and manage data pipelines in AWS EKS (Kubernetes) with persistent EBS-backed storage.

What Success Looks Like

Live data streams are contextualised, queryable, and AI-ready.
Schema changes and signal drift are detected and handled without breaking downstream workflows.
Training and inference pipelines run smoothly in parallel, optimised for scale and latency.
