Data Pipeline Engineer (Kafka | Hadoop | Spark) - REMOTE (Eastern/Central Time)

Posted 9 days ago · United States · Salary undisclosed

Job Description

LexisNexis is dedicated to advancing the rule of law around the world, which is vital for building peace and prosperity in all societies. To accomplish this noble goal, LexisNexis is transforming legal research globally. Our search index contains more than 81 billion richly annotated legal documents, creating an unprecedented legal knowledge graph. Our technology changes how lawyers practice law by providing fast and relevant access to their most difficult questions.

The data engineer will be dedicated to solving the scalability issues of data ingestion pipelines for the search backend of LexisNexis, dramatically improving both the velocity and consistency of ETLs from the data lake to Solr. We are looking for someone who can bring their own perspective to a variety of internal and external challenges. We expect this person to be versatile, display leadership qualities, and be enthusiastic about tackling new problems as we continue to push technology forward.

Skillset & experience required:

  • Minimum 2 years of experience developing and maintaining ETL pipelines in Spark, Hadoop, or Kafka
  • Minimum 2 years of experience in Java or Scala
  • Minimum 2 years of experience scaling search server clusters to accommodate increasing traffic while meeting specific performance requirements
  • Experience parsing data from XML documents
  • Experience in data modeling, design and manipulation, optimization, and best practices
  • Ability to complete complex bug fixes
  • 5+ years of software engineering experience
  • BS in Engineering/Computer Science, or equivalent experience
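As a rough illustration of the XML-parsing side of this work, the sketch below pulls selected fields out of a batch of XML records before they would be handed to an indexing pipeline. The `<doc>`, `<title>`, and `<citation>` element names are hypothetical placeholders, not the actual LexisNexis schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Minimal sketch: extract one field per record from an XML batch.
// Element names (<doc>, <title>, <citation>) are hypothetical.
public class XmlFieldExtractor {

    public static List<String> extractTitles(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Harden the parser against XXE payloads in untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList records = doc.getElementsByTagName("doc");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            Element record = (Element) records.item(i);
            NodeList t = record.getElementsByTagName("title");
            if (t.getLength() > 0) {
                titles.add(t.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String batch = "<batch>"
            + "<doc><title>Smith v. Jones</title><citation>123 F.3d 456</citation></doc>"
            + "<doc><title>In re Acme Corp.</title></doc>"
            + "</batch>";
        System.out.println(extractTitles(batch));
    }
}
```

In a production pipeline, per-record extraction like this would typically run inside a Spark job over partitioned input rather than a single in-memory string.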

Preferred skillset & experience:

  • Advanced degree in Engineering/Computer Science.
  • Expertise in containerization techniques such as Docker and Cloud orchestration platforms such as Kubernetes.
  • Expertise in enterprise development languages such as Java or Scala.
  • Expertise in test-driven development and maintenance, including applying best practices for overall project benefit (Java and Cucumber scripting)
  • Software development process expert in applicable methodologies (e.g., Agile, Test Driven Development).
  • 5+ years of experience with AWS products
  • Must include intimate familiarity with Spark
  • Familiarity with EC2, Redshift, RDS, and S3
  • Should have hands-on experience with Athena, DynamoDB, API Gateway, Lambda, and EMR
  • AWS Certification is a plus
  • Advanced Linux shell expertise. Must be able to analyze loads and tune job scheduling.
  • Familiarity with Natural Language Processing (NLP)
  • Expert Java or Scala programmer
  • Expert with SQL languages (ANSI, Postgres, MySQL, etc.)
  • Expertise in server backup and recovery.