Sorry, that job is no longer available. Here are some results that may be similar to the job you were looking for.

2 jobs found

Current search: sr software engineer kubernetes gpu orchestration remote
Senior Software Engineer II, Inference
CoreWeave Sunnyvale, California
Apply for the Senior Software Engineer II, Inference role at CoreWeave. CoreWeave is The Essential Cloud for AI. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability.

What You'll Do
Senior engineers are area owners who lead designs, raise engineering standards, and deliver measurable improvements to latency, throughput, and reliability across multiple services. You'll partner with product, orchestration, and hardware teams to evolve our Kubernetes-native inference platform and meet strict P99 SLAs at scale.

About The Role
  • Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.
  • Define and own SLIs/SLOs; ensure post-incident actions land and reliability improves release over release.
  • Implement advanced optimizations (e.g., micro-batch schedulers, speculative decoding, KV-cache reuse) and quantify their impact.
  • Strengthen incident posture: capacity planning, autoscaling policy, graceful degradation, and rollback/traffic-shift strategies.
  • Mentor IC1/IC2 engineers; review cross-team designs and raise coding and testing standards.
  • Own an area spanning multiple services and teams (e.g., request routing and adaptive scheduling, cost-per-token analytics, GPU resource isolation).

Qualifications
  • 5-8 years of industry experience building distributed systems or cloud services.
  • Strong coding skills in Python or Go (C++ a plus) and deep familiarity with networked systems and performance.
  • Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).
  • Practical knowledge of inference internals: batching, caching, mixed precision (BF16/FP8), and streaming token delivery.
  • Proven track record of improving tail latency (P95/P99) and service reliability through metrics-driven work.

Preferred
  • Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe).
  • Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies.
  • Experience leading multi-team initiatives or partnering with customers on mission-critical launches.

Wondering if you're a good fit? We believe in investing in our people, and we value candidates who can bring their own diversified experiences to our teams, even if you aren't a 100% skill or experience match.

Why CoreWeave?
At CoreWeave, we work hard, have fun, and move fast! We're in an exciting stage of hypergrowth that you will not want to miss out on. We're not afraid of a little chaos, and we're constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values:
  • Be Curious at Your Core
  • Act Like an Owner
  • Empower Employees
  • Deliver Best-in-Class Client Experiences
  • Achieve More Together
We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems.

Base Salary & Benefits
Base salary range: $165,000 to $242,000. In addition to a competitive salary, we offer a discretionary bonus, equity awards, and a comprehensive benefits program.
Benefits include:
  • Medical, dental, and vision insurance (100% paid by CoreWeave)
  • Company-paid life insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition reimbursement
  • Employee Stock Purchase Program (ESPP)
  • Mental wellness benefits through Spring Health
  • Family-forming support provided by Carrot
  • Paid parental leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption

Workplace
While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets.

Legal Information
California Consumer Privacy Act - California applicants only. CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, sexual orientation, gender identity, national origin, veteran status, or genetic information. As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process. If reasonable accommodation is needed, please contact: .

Export Control Compliance
This position requires access to export-controlled information. To conform to U.S. Government export regulations applicable to that information, applicants must be a U.S. person (U.S. citizen, permanent resident, refugee, or asylee) or eligible to access the export-controlled information without required export authorization. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.
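The "micro-batch scheduler" mentioned in the role can be illustrated with a minimal sketch. This is a hypothetical toy, not CoreWeave's implementation: requests queue up and are dispatched together once either a batch-size cap or a waiting-time budget is hit, which trades a small amount of latency for higher GPU throughput.

```python
import time
from collections import deque

class MicroBatcher:
    """Toy micro-batch scheduler: flush when the batch is full or when the
    oldest queued request has waited longer than max_wait_s."""

    def __init__(self, max_batch=8, max_wait_s=0.005):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.queue = deque()  # entries are (arrival_time, request)

    def submit(self, request, now=None):
        # Record arrival time so the flush check can detect stale requests.
        self.queue.append((now if now is not None else time.monotonic(), request))

    def maybe_flush(self, now=None):
        """Return a list of requests if a flush condition is met, else None."""
        if not self.queue:
            return None
        now = now if now is not None else time.monotonic()
        full = len(self.queue) >= self.max_batch
        stale = now - self.queue[0][0] >= self.max_wait_s
        if not (full or stale):
            return None
        # Drain up to max_batch requests, oldest first.
        count = min(self.max_batch, len(self.queue))
        return [self.queue.popleft()[1] for _ in range(count)]
```

In practice a serving loop would call `maybe_flush` on each tick and hand the batch to the model; production schedulers (e.g., vLLM's continuous batching) also re-admit in-flight sequences each step, which this sketch omits.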
04/02/2026
Full time
Senior Software Engineer, Compute Platform (Chicago, IL or Remote)
Moonlite Chicago, Illinois
Moonlite delivers high-performance AI infrastructure for organizations running intensive computational research, large-scale model training, and demanding data processing workloads. We provide infrastructure deployed in our facilities or co-located in yours, delivering flexible on-demand or reserved compute that feels like an extension of your existing data center. Our team of AI infrastructure specialists combines bare-metal performance with cloud-native operational simplicity, enabling research teams and enterprises to deploy demanding AI workloads with enterprise-grade reliability and compliance.

Your Role
You will be instrumental in building out our GPU-accelerated compute platform that powers distributed AI training and inference, large-scale simulations, and computational research workloads. Working closely with product, your platform team members, and infrastructure specialists, you'll design and implement the compute orchestration layer that manages GPU clusters, bare-metal provisioning, and resource scheduling, enabling researchers and engineers to programmatically access high-performance compute resources with cloud-like simplicity.

Job Responsibilities
  • Compute Orchestration Systems: Design and build scalable compute orchestration platforms that manage GPU clusters, bare-metal server provisioning, and resource allocation across co-located infrastructure environments.
  • Resource Management & Scheduling: Implement intelligent workload scheduling, resource allocation, and optimization algorithms that maximize GPU utilization while maintaining performance guarantees for research and training workloads.
  • Research Cluster Provisioning: Design and implement systems for provisioning and managing research computing environments, including Kubernetes and SLURM clusters, enabling automated deployment, resource scheduling, and workload orchestration for distributed AI training and HPC workloads.
  • GPU Platform Engineering: Develop platform capabilities for managing latest-generation NVIDIA GPU configurations (H100, H200, B200, B300), including GPU resource management, multi-tenant isolation, and integration with compute orchestration systems.
  • Bare-Metal Lifecycle Management: Build automation and tooling for complete bare-metal server lifecycle management, from initial provisioning and configuration through ongoing operations, updates, and resource reallocation.
  • Performance-Critical Systems: Optimize compute platform components for high-throughput and low-latency performance, ensuring research workloads achieve near-bare-metal efficiency in virtualized or containerized environments.
  • Platform APIs & Integration: Develop robust APIs and SDKs that enable researchers to programmatically provision and manage compute resources, integrating seamlessly with existing workflows and research infrastructure.
  • Observability & Monitoring: Implement comprehensive monitoring and telemetry systems for compute resources, providing visibility into GPU virtualization, workload performance, and infrastructure health.
  • Multi-Tenancy & Isolation: Build enterprise-grade multi-tenant compute isolation, security boundaries, and resource quotas that enable safe sharing of GPU infrastructure across teams and organizations.

Requirements
  • Experience: 5+ years in software engineering with proven experience building compute platforms, container orchestration systems, or distributed compute infrastructure for production environments.
  • Compute Platform Engineering: Strong background in building compute orchestration, resource scheduling, or workload management systems at scale.
  • Kubernetes & Container Orchestration: Strong familiarity with Kubernetes architecture and container orchestration concepts, with experience deploying workloads in Kubernetes environments; understanding of pods, deployments, services, and basic Kubernetes operations.
  • Programming Skills: Experience with Go, C/C++, Python, or Rust for performance-critical components is highly valued.
  • Linux & Systems Programming: Strong experience with Linux in production environments, including systems programming, performance optimization, and low-level resource management.
  • Virtualization & Containers: Deep knowledge of virtualization technologies (KVM, Xen), container runtimes, and orchestration platforms.
  • GPU Computing Fundamentals: Understanding of GPU architectures, CUDA programming (where needed), and GPU resource management, or a strong ability to learn quickly.
  • Bare-Metal Infrastructure: Experience with bare-metal provisioning, out-of-band management systems, and hardware abstraction layers.
  • Problem Solving & Architecture: Demonstrated ability to solve complex performance and scalability challenges while balancing pragmatic shipping with good long-term architecture.
  • Autonomy & Communication: Comfortable navigating ambiguity, defining requirements collaboratively, and communicating technical decisions through clear documentation.
  • Commitment to Growth: Growth mindset with a continuous focus on learning and professional development.

Preferred Qualifications
  • Background provisioning or managing research computing environments (Kubernetes, SLURM, or HPC clusters).
  • Experience with GPU virtualization technologies (SR-IOV, NVIDIA vGPU) and multi-tenant GPU sharing.
  • Background in container orchestration platforms with custom scheduling or resource management.
  • Knowledge of high-performance networking for GPU communication (InfiniBand, RDMA, NVLink, NVSwitch).
  • Familiarity with AI/ML training frameworks (PyTorch, TensorFlow) and their infrastructure requirements.
  • Understanding of distributed training patterns and multi-node GPU coordination.
  • Experience building infrastructure for research institutions, labs, or technical computing environments.
  • Background in financial services or other regulated-industry infrastructure is a plus.

Why Moonlite?
  • Build Next-Generation Infrastructure: Your work will create the platform foundation that enables financial institutions to harness AI capabilities previously impossible with traditional infrastructure.
  • Hands-On Ownership: As an early engineer, you'll have end-to-end ownership of projects and the autonomy to influence our product and technology direction.
  • Shape Industry Standards: Contribute to defining how enterprise AI infrastructure should work for the most demanding regulated environments.
  • Collaborate with Experts: Work alongside seasoned engineers and industry professionals passionate about high-performance computing, innovation, and problem-solving.
  • Startup Agility with Industry Impact: Enjoy the dynamic, fast-paced environment of a startup while making an immediate impact in an evolving and critical technology space.

Compensation & Benefits
We offer a competitive total compensation package combining base salary, startup equity, and industry-leading benefits. The total compensation range for this role is $165,000 - $225,000, which includes both base salary and equity. Actual compensation will be determined based on experience, skills, and market alignment. We provide generous benefits, including a 6% 401(k) match, fully covered health insurance premiums, and other comprehensive offerings to support your well-being and success as we grow together.

As set forth in Moonlite's Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law. Any information that you do provide for voluntary self-identification will not be considered in the hiring process or thereafter. Information will be kept confidential.
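The workload-scheduling problem this role describes, placing jobs onto nodes to maximize GPU utilization, can be sketched as first-fit bin packing over free GPU counts. All names below are hypothetical illustrations, not Moonlite's design; real schedulers (Kubernetes, SLURM) also account for topology, preemption, and fairness.

```python
def first_fit_schedule(jobs, nodes):
    """Assign each job (name, gpus_needed) to the first node with enough
    free GPUs. `nodes` maps node name -> free GPU count.
    Returns (placements, unscheduled): a job -> node mapping plus the
    names of jobs that could not be placed."""
    free = dict(nodes)  # copy so the caller's view is not mutated
    placements, unscheduled = {}, []
    for name, need in jobs:
        for node, avail in free.items():
            if avail >= need:
                free[node] = avail - need  # reserve the GPUs
                placements[name] = node
                break
        else:
            unscheduled.append(name)  # no node had capacity
    return placements, unscheduled
```

For example, with two 4-GPU nodes, a 4-GPU and a 2-GPU job both place, while an 8-GPU job is left unscheduled and would wait for capacity or be split across nodes by a topology-aware scheduler.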
04/02/2026
Full time


© 2008-2026 IT Job Board