Senior SRE and DevOps leader with 10+ years of experience building large-scale, observable systems and leading teams that operate multi-cloud production platforms. Experienced in designing and operating monitoring stacks (Prometheus, Grafana, Datadog, OpenTelemetry) and instrumentation that surface petabytes of logs and billions of time series for rapid incident diagnosis. Proven track record managing infrastructure across hundreds of cloud accounts and regions, driving SLO-based reliability, automated runbooks, and platform-level observability patterns. Strong software background in Go, Python, Java, and C++, with hands-on infrastructure-as-code and Kubernetes experience. I focus on building opinionated, scalable observability tools and mentoring engineers to raise the bar for reliability practices across organizations.
Member Since: April 13, 2026
Last Active: 2 months ago