Position Summary
We are seeking a highly experienced Principal Observability Architect to lead the design, implementation, modernization, and optimization of enterprise-scale observability and analytics platforms. This role will serve as the technical authority for log management, observability engineering, telemetry pipelines, AIOps, security analytics, and data lakehouse architectures leveraging Splunk, Databricks, Cribl, OpenTelemetry, and cloud-native technologies.
The ideal candidate possesses deep expertise in traditional observability platforms (Splunk, Dynatrace, AppDynamics, ServiceNow ITOM) and modern data lakehouse architectures utilizing Databricks, Delta Lake, Unity Catalog, and AI/ML-driven analytics. This individual will drive the strategic transformation from legacy SIEM and observability platforms toward scalable, cloud-native observability data lakes.
Key Responsibilities
Enterprise Architecture & Strategy
- Define enterprise observability architecture standards, patterns, and roadmaps.
- Lead observability transformation initiatives involving Splunk modernization and Databricks adoption.
- Develop reference architectures for telemetry ingestion, storage, analytics, security, and AI-driven operations.
- Align observability strategies with business, security, compliance, and operational objectives.
- Create executive-level architecture presentations, business cases, and technology roadmaps.
Splunk Platform Leadership
- Architect large-scale Splunk Enterprise and Splunk Cloud environments.
- Design and optimize:
  - Indexer clusters
  - Search head clusters
  - Forwarder architectures
  - Deployment servers
  - Data models
  - ITSI implementations
- Define ingestion, retention, indexing, and data lifecycle strategies.
- Lead migration initiatives involving:
  - Splunk to Databricks
  - Heavy Forwarders to Cribl
  - SIEM modernization programs
- Optimize SPL searches, data models, summary indexing, and dashboard performance.
Databricks & Lakehouse Architecture
- Architect enterprise observability data lake solutions using:
  - Databricks Lakehouse
  - Delta Lake
  - Unity Catalog
  - Delta Live Tables
  - Structured Streaming
  - Mosaic AI
  - Genie
- Design Medallion Architectures (Bronze, Silver, and Gold layers).
- Develop governance strategies including:
  - RBAC
  - Data masking
  - Data lineage
  - Audit controls
- Create high-performance log analytics solutions capable of supporting petabyte-scale telemetry environments.
- Enable self-service analytics and AI-powered observability use cases.
Telemetry & Data Engineering
- Design ingestion architectures supporting:
  - OpenTelemetry
  - OCSF
  - Syslog
  - Kafka
  - Azure Event Hubs
  - AWS Kinesis
  - GCP Pub/Sub
  - Cribl
- Define normalization and enrichment frameworks.
- Establish data quality and schema management processes.
- Design real-time and batch processing pipelines.
AIOps & Advanced Analytics
- Lead implementation of:
  - AIOps
  - Predictive analytics
  - Root cause analysis
  - Anomaly detection
  - Event correlation
- Integrate observability datasets with AI/ML platforms.
- Develop observability use cases leveraging:
  - Mosaic AI
  - Agentic AI
  - LLMs
  - Generative AI
- Build operational intelligence and executive KPI dashboards.
Security & Compliance
- Architect observability solutions supporting:
  - SOC operations
  - Threat hunting
  - Security analytics
  - Compliance reporting
- Design frameworks aligned with:
  - HIPAA
  - PCI-DSS
  - SOX
  - NIST
  - ISO 27001
- Implement data governance and security controls across observability platforms.
Leadership & Governance
- Provide technical leadership to engineering teams.
- Mentor architects, engineers, and developers.
- Conduct architecture reviews and design governance.
- Define platform standards, best practices, and operational procedures.
- Engage directly with executive stakeholders and business leaders.
Required Qualifications
Experience
- 10+ years of experience in Enterprise Observability, Monitoring, or Security Analytics.
- 5+ years architecting large-scale Splunk environments.
- 3+ years designing Databricks Lakehouse architectures.
- Experience managing environments that generate:
  - 50+ TB/day of telemetry (preferred)
  - 100+ TB/day of telemetry (strongly preferred)
- Experience leading enterprise transformation programs.
Splunk Expertise
Deep expertise in:
- Splunk Enterprise
- Splunk Cloud
- Splunk ITSI
- Splunk Enterprise Security
- SPL Development
- Data Models
- Indexer Clustering
- Search Head Clustering
- SmartStore
- Heavy Forwarders
- Universal Forwarders
Databricks Expertise
Strong experience with:
- Databricks Lakehouse
- Delta Lake
- Unity Catalog
- Delta Live Tables
- Structured Streaming
- Databricks SQL
- Genie
- Mosaic AI
- Lakehouse Federation
Cloud Platforms
Experience with one or more of:
- Microsoft Azure
- Amazon Web Services
- Google Cloud
Data Technologies
Strong knowledge of:
- Kafka
- OpenTelemetry
- OCSF
- Iceberg
- Spark
- SQL
- Python
- REST APIs
- Event Streaming Architectures
Preferred Qualifications
- Experience with Cribl Stream and Cribl Edge
- Experience with Dynatrace, AppDynamics, Datadog, or New Relic
- Experience with ServiceNow ITOM/Event Management
- Experience designing AI/ML operational analytics solutions
- Experience with Security Data Lakes and SIEM modernization initiatives
- Experience with FinOps and cloud cost optimization
- Experience building observability platforms for healthcare, financial services, retail, or large enterprise organizations
Certifications (Preferred)
Splunk
- Splunk Enterprise Certified Architect
- Splunk Core Certified Consultant
Databricks
- Databricks Certified Data Engineer Professional
- Databricks Certified Solutions Architect
Cloud
- Azure Solutions Architect Expert
- AWS Solutions Architect Professional
- Google Professional Cloud Architect
Success Metrics
Within the first 12 months, the architect will:
- Deliver an enterprise observability architecture roadmap.
- Reduce observability platform costs through modernization initiatives.
- Design and implement a scalable observability data lake architecture.
- Improve telemetry ingestion performance and reliability.
- Enable AI-powered analytics and operational intelligence capabilities.
- Establish enterprise governance standards for observability and security telemetry.
- Support petabyte-scale observability and security analytics workloads.
Ideal Background
Candidates from organizations that operate large-scale observability environments, in sectors such as healthcare, banking, retail, telecommunications, logistics, cloud services, or managed services, are highly desirable. Experience supporting environments that generate 100+ TB of telemetry per day and integrate Splunk, Databricks, Cribl, OpenTelemetry, and cloud-native data platforms is strongly preferred.