As the first Technical Writer, you will own the entire documentation lifecycle, including API references, SDK docs, and integration guides. You will collaborate with engineers to translate complex distributed systems and inference infrastructure concepts into clear developer documentation.
uRun
6 Remote Job Openings at uRun
As the founding SRE, you will define the reliability culture, observability stack, and incident response processes from scratch. You will partner with ML infrastructure engineers to ensure the stability and scalability of the interactive AI inference cloud.
Develop custom CUDA kernels and optimize model inference to achieve sub-50ms latency and 10-100x performance gains. Own the end-to-end inference pipeline, focusing on GPU utilization, memory bandwidth, and distributed memory optimizations.
Design and scale a GPU compute platform supporting 1,000+ clusters to enable real-time, stateful AI inference. Own the full infrastructure stack from bare metal to model serving, including resource orchestration and production reliability.
Design and own the scalable, low-latency infrastructure powering the uRun real-time inference runtime. You will manage GPU-heavy workloads and streaming pipelines, and define platform standards for security and observability.
Build and maintain the backend services, APIs, and core application systems that power the uRun real-time inference runtime. Design scalable systems for real-time interaction and session state while shaping the overall architecture for fault tolerance and performance.