Senior AI Infra Observability Engineer (GPU Clusters)

  • Clockwork.io
  • Palo Alto, California
  • 04/02/2026
  • Full time
  • Information Technology, Telecommunications

Job Description

A technology startup in Palo Alto is seeking a Senior Software Engineer to design and build scalable backend systems for AI and GPU cluster observability. The ideal candidate will have over 7 years of industry experience, a strong foundation in data structures and algorithms, and proficiency in languages such as C, C++, Go, Java, or Python. The role involves developing methods to detect complex infrastructure issues and collaborating across teams to ensure system reliability and performance. Competitive compensation and a great benefits package are offered.