BCE GLOBAL TECHNOLOGY CENTRE PRIVATE LIMITED

ELK System Reliability Engineer

Posted an hour ago

Worldwide

5-10 years of experience

AI Summary

Responsible for the architecture, deployment, and maintenance of highly available ELK clusters and Fleet managed agents. This includes automating operational tasks, optimizing index performance, and implementing robust security and disaster recovery plans.

Job Description:

Architecting, deploying, managing, and maintaining highly available and fault-tolerant ELK clusters across diverse environments, encompassing Elasticsearch, Logstash, Kibana, and Beats agents.
Implementing a large-scale, Fleet-managed deployment of Elastic Agents.
Developing and implementing comprehensive monitoring, alerting, and dashboarding strategies using Kibana visualizations and integrated alerting mechanisms to proactively identify and address system anomalies and performance degradations.
Automating routine operational tasks, deployment pipelines, and cluster upgrades through sophisticated scripting (e.g., Python, Bash) and infrastructure-as-code principles utilizing tools like Ansible.
Performing in-depth performance tuning and optimization of Elasticsearch indices, query performance, and underlying hardware/cloud resources to ensure maximum throughput and minimal latency.
Managing the ingestion pipelines, configuring Logstash filters and outputs, and ensuring efficient data flow from various sources into the Elasticsearch data stores.
Implementing and enforcing robust security measures across the ELK stack, including access control, encryption (TLS/SSL), and regular vulnerability assessments.
Troubleshooting complex issues across the entire stack, from data sources and ingestion agents through to the Elasticsearch cluster and Kibana interface, employing systematic diagnostic methodologies.
Collaborating closely with development and operations teams to understand application requirements, optimize data schemas, and facilitate effective log analysis and troubleshooting.
Designing and executing disaster recovery and business continuity plans specifically tailored for the ELK platform, ensuring data integrity and service availability.
Maintaining detailed documentation for system architecture, operational procedures, troubleshooting guides, and configuration standards.

Requirements

Demonstrable, extensive hands-on experience managing large-scale Elasticsearch clusters, including a deep understanding of index management, shard allocation, replication strategies, and cluster health monitoring.
Proven expertise in administering and troubleshooting Linux operating systems (e.g., RHEL, Debian), including performance analysis.
Solid foundational knowledge of web applications, their underlying architectures, and how they interact with logging and monitoring systems.
A bachelor’s degree in Computer Science, Information Technology, Engineering, or a closely related technical field, or equivalent practical experience.
Possession of relevant industry certifications such as Elastic Certified Engineer, AWS Certified SysOps Administrator, Red Hat Certified Engineer (RHCE), or equivalent validation of core competencies.
A minimum of five to seven years of progressive experience in Site Reliability Engineering, Systems Administration, or DevOps roles with a strong focus on large-scale distributed systems.
Proficiency with essential infrastructure management tools, including configuration management systems (Ansible, Chef, Puppet) and orchestration platforms (OpenShift).
Expertise in scripting languages such as Bash for automation, system administration tasks, and developing operational tooling.
Thorough understanding of networking concepts, including TCP/IP, HTTP/S protocols, DNS, load balancing, and firewall configurations relevant to distributed systems.

Preferred Qualifications

Experience with message queuing technologies like Kafka or RabbitMQ for buffering and decoupling data ingestion processes.
Hands-on experience with container orchestration systems such as OpenShift, including deploying and managing Logstash within containerized environments.
Familiarity with various data collection agents beyond Beats, such as Fluentd or Vector, and their respective configuration nuances.
Knowledge of distributed tracing systems (e.g., Jaeger, Zipkin) and their potential integration or correlation with ELK data.
Familiarity with CI/CD pipelines and integrating ELK stack deployments and updates into automated release processes.
A strong grasp of system security best practices, including intrusion detection, vulnerability management, and security hardening techniques for distributed systems.

Benefits

What We Offer

Competitive salaries and comprehensive health benefits.
Flexible work hours and remote work options.
Professional development and training opportunities.
A supportive and inclusive work environment.
