Jobs at Crusoe | IT Job Board

Crusoe San Francisco, California

Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About the Role:

As a Staff Cloud Support Engineer, you are a technical authority within Crusoe Cloud and a force multiplier across Customer Experience, SRE, Networking, Fleet, and Product teams. You operate beyond ticket resolution. You design reliability guardrails, influence architecture decisions, mentor engineers, and directly protect revenue by preventing large-scale incidents. You bring deep expertise in Linux systems, Kubernetes, networking, and AI/ML infrastructure, and apply that knowledge with strong customer focus. You are comfortable operating in ambiguity, leading incident response, and shaping how Crusoe scales high-performance AI infrastructure globally.

What You'll Be Working On:

Technical Leadership & Escalations: Serve as the highest-level escalation point for complex P1/P0 incidents. Lead cross-functional root cause investigations involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers. Partner with SRE and Software teams (Storage, Networking, Compute, K8) to design systemic fixes rather than recurring workarounds.
Reliability Architecture: Design and improve node validation, burn-in processes, performance baselining, and release readiness. Influence Kubernetes architecture, workload orchestration (Slurm, Terraform), and AI/ML cluster stability. Reduce MTTR and incident recurrence through structural improvements.
AI/ML Infrastructure Expertise: Troubleshoot NCCL, IB, and GPU driver/firmware issues and distributed training failures. Support complex AI workloads (training + inference) with performance tuning and observability improvements.
Customer-Facing Authority: Act as technical advisor during high-risk customer incidents. Deliver executive-ready RCAs with clarity and confidence. Drive trust through transparency and technical depth.
Mentorship & Standards: Mentor P3/P4 engineers. Define SOPs and technical standards for support excellence. Partner with Enablement to raise the technical bar across the organization.

What You Bring to the Team:

8+ years of experience in SRE, DevOps, HPC, or Cloud Infrastructure roles.
Advanced Linux systems expertise.
Deep Kubernetes operational experience (CKA-level or higher).
Strong networking knowledge: InfiniBand, RDMA, RoCE, SDN.
Experience supporting AI/ML workloads at scale (GPU clusters).
Proven track record of resolving multi-layer, distributed system failures.
Strong customer communication and executive-facing presence.

Benefits:

Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to HSA account
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off

Compensation Range:

Compensation will be paid in the range of up to $156,000 - $190,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time

Staff Network Engineer, Operations

Crusoe San Francisco, California

Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

Crusoe Cloud is seeking a Staff Network Operations Engineer to help own production reliability across our global network infrastructure, including edge, backbone, data center fabric, and GPU cluster interconnects. This is a hands-on production ownership role focused on incident response, root cause analysis, and operational excellence initiatives that keep our hyperscale AI infrastructure running at scale. Your work will directly affect the availability of AI workloads running across thousands of GPUs worldwide. The ideal candidate is a seasoned network engineer with deep operational experience in large-scale environments who thrives in high-pressure situations and takes pride in keeping systems healthy.

You'll contribute to defining SLIs and SLOs, improving observability tooling, building automation to reduce toil, and mentoring peers - all while serving as a key escalation point during high-severity network events.

What You'll Be Working On:

Production Reliability: Help own uptime across Crusoe's global edge, backbone, data center, and GPU cluster network, directly supporting AI workloads at scale.
Incident Response: Lead and contribute to end-to-end response for high-severity network events, including mitigation, stakeholder communication, and postmortem documentation.
Root Cause Analysis: Drive RCAs for production incidents, identify systemic issues, and author remediation plans tracked through to closure.
Observability Improvements: Contribute to and improve Crusoe's network monitoring stack using streaming telemetry, SNMP, NetFlow, and tools such as Kentik, Grafana, Prometheus, and ThousandEyes.
Operational Standards: Author and maintain runbooks, escalation playbooks, and SOPs used across the operations team.
Operational Automation: Write Python-based tooling to reduce toil, automate common remediation workflows, and accelerate mean time to resolution.
SLI/SLO Contribution: Partner with Architecture and SRE teams to define and track network reliability metrics and service level objectives backed by real-time dashboards.
Mentorship: Provide technical guidance to Senior engineers and contribute to a culture of operational excellence and continuous learning.

What You'll Bring to the Team:

8+ years of production network engineering experience with a focus on operations, incident response, and reliability in large-scale or internet-scale environments.
Hands-on experience with observability and monitoring tools including streaming telemetry, SNMP, NetFlow/sFlow, Grafana, Prometheus, and ThousandEyes.
Experience operating RDMA/RoCE lossless fabrics for GPU or HPC workloads, including familiarity with PFC, ECN, and DCQCN tuning.
Expert hands-on knowledge of BGP, EVPN-VXLAN, IS-IS, OSPF, MPLS, QoS, and TCP/IP in production data center environments.
Proficiency with Arista (EOS) and Juniper (Junos) platforms in leaf-spine CLOS architectures across multi-vendor environments.
Python proficiency for writing auto-remediation scripts, diagnostic tooling, and operational automation.
Comfort operating large device fleets across multi-region environments with on-call responsibility, including experience as an escalation point during critical events.
Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.

Bonus Points:

Experience with NVIDIA/Mellanox networking platforms in GPU cluster environments.
Familiarity with Kentik or Arbor for traffic analysis and DDoS visibility.
Experience defining or contributing to SLIs and SLOs in partnership with SRE or product teams.
Exposure to operating 10K+ device fleets across hyperscale or cloud environments.
Background contributing to post-incident learning programs or operational excellence initiatives org-wide.

Benefits:

Competitive compensation and equity packages
Restricted Stock Units
Paid time off, paid holidays & leave of absence programs
Comprehensive health, dental & vision insurance
Employer contributions to HSA account
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off
Global travel insurance & emergency assistance
Daily meals allowance
Additional perks & programs specific to location

Compensation Range:

Compensation will be paid in the range of up to $195,000 - $235,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time

Staff Enterprise AI Automation Engineer

Crusoe San Francisco, California

Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

We're seeking a Staff Enterprise AI Automation Engineer to play a key role in executing Crusoe's 2026 Enterprise AI Strategy. In this role, you will design and build agentic AI systems that move the organization from simple information retrieval to orchestrated, multi-system automation. You'll operate at the intersection of AI, enterprise systems, and integration platforms, building scalable agent workflows, enabling a citizen developer ecosystem, and establishing the technical foundations for an AI-powered operating model.

What You'll Be Working On:

Designing and implementing agentic AI workflows using a modular, API-first architecture across platforms such as Workato ONE, Anthropic Claude, and Gemini
Building autonomous agents that orchestrate workflows across enterprise systems (e.g., Salesforce, Coupa, Slack, Google Workspace)
Architecting and integrating a unified data layer that enables AI agents to access and act on data across siloed systems
Developing integrations, APIs, and custom connectors that enable scalable AI orchestration across business platforms
Implementing MCP (Model Context Protocol) connectors and model-agnostic orchestration patterns
Designing deployment pipelines and lifecycle management systems for AI agents in production environments
Embedding security, data privacy, and compliance guardrails into all AI implementations
Creating reusable templates, frameworks, and tooling to support a Citizen Developer Program
Mentoring internal teams through code reviews, training, and technical enablement programs
Evaluating emerging AI technologies and prototyping next-generation capabilities to advance agentic maturity

What You'll Bring to the Team:

10+ years of software engineering experience, including 3+ years in AI/ML or AI application development
Strong proficiency in Python and API development (REST, GraphQL, webhooks)
Hands-on experience with enterprise integration platforms (e.g., Workato, MuleSoft, Zapier)
Experience working with LLM APIs (OpenAI, Anthropic, Google Gemini, or similar)
Deep understanding of agentic architectures, RAG patterns, and prompt engineering
Experience designing scalable, distributed systems in cloud environments (AWS, GCP, or Azure)
Strong knowledge of microservices, event-driven architecture, and integration design patterns
Experience with CI/CD, infrastructure as code, and DevOps practices
Understanding of data security, privacy, and compliance considerations (SOC 2, GDPR)

Bonus Points:

Experience deploying agentic AI systems in production environments
Familiarity with iPaaS platforms (Workato preferred) and enterprise automation ecosystems
Experience with Google Workspace or Microsoft 365 automation and extensibility
Knowledge of Model Context Protocol (MCP) or similar interoperability standards
Experience implementing AI governance frameworks in enterprise settings
Background in infrastructure, energy, or high-performance computing environments
Contributions to open-source AI projects or technical thought leadership

Benefits:

Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to HSA account
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off

Compensation Range:

Compensation will be paid in the range of up to $190,000 - $230,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time

Staff Cloud Support Engineer

Crusoe Sunnyvale, California

Job Description Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe. About the Role: As a Staff Cloud Support Engineer, you are a technical authority within Crusoe Cloud and a force multiplier across Customer Experience, SRE, Networking, Fleet, and Product teams. You operate beyond ticket resolution. You design reliability guardrails, influence architecture decisions, mentor engineers, and directly protect revenue by preventing large-scale incidents. You bring deep expertise in Linux systems, Kubernetes, networking, and AI/ML infrastructure, and apply that knowledge with strong customer focus. You are comfortable operating in ambiguity, leading incident response, and shaping how Crusoe scales high-performance AI infrastructure globally.
What You'll Be Working On Technical Leadership & Escalations Serve as highest-level escalation point for complex P1/P0 incidents. Lead cross-functional root cause investigations involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers. Partner with SRE and Software teams (Storage, Networking, Compute, K8s) to design systemic fixes rather than recurring workarounds. Reliability Architecture Design and improve node validation, burn-in processes, performance baselining, and release readiness. Influence Kubernetes architecture, workload orchestration (Slurm, Terraform), and AI/ML cluster stability. Reduce MTTR and incident recurrence through structural improvements. AI/ML Infrastructure Expertise Troubleshoot NCCL, IB, GPU driver/firmware issues, and distributed training failures. Support complex AI workloads (training + inference) with performance tuning and observability improvements. Customer-Facing Authority Act as technical advisor during high-risk customer incidents. Deliver executive-ready RCAs with clarity and confidence. Drive trust through transparency and technical depth. Mentorship & Standards Mentor P3/P4 engineers. Define SOPs and technical standards for support excellence. Partner with Enablement to raise the technical bar across the organization. What You Bring to the Team: 8+ years of experience in SRE, DevOps, HPC, or Cloud Infrastructure roles. Advanced Linux systems expertise. Deep Kubernetes operational experience (CKA-level or higher). Strong networking knowledge: InfiniBand, RDMA, RoCE, SDN. Experience supporting AI/ML workloads at scale (GPU clusters). Proven track record of resolving multi-layer, distributed system failures. Strong customer communication and executive-facing presence.
Benefits: Competitive compensation Restricted Stock Units Paid time off & paid holidays Comprehensive health, dental & vision insurance Employer contributions to HSA account Paid parental leave Paid life insurance, short-term and long-term disability Professional development & tuition reimbursement Mental health & wellness support Commuter benefits (parking & transit) Cell phone stipend 401(k) Retirement plan with company match up to 4% of salary Volunteer time off Compensation Range Compensation will be paid in the range of up to $156,000 - $190,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data. Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time

Staff Cloud Support Engineer

Crusoe San Jose, California

Job Description Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe. About the Role: As a Staff Cloud Support Engineer, you are a technical authority within Crusoe Cloud and a force multiplier across Customer Experience, SRE, Networking, Fleet, and Product teams. You operate beyond ticket resolution. You design reliability guardrails, influence architecture decisions, mentor engineers, and directly protect revenue by preventing large-scale incidents. You bring deep expertise in Linux systems, Kubernetes, networking, and AI/ML infrastructure, and apply that knowledge with strong customer focus. You are comfortable operating in ambiguity, leading incident response, and shaping how Crusoe scales high-performance AI infrastructure globally.
What You'll Be Working On Technical Leadership & Escalations Serve as highest-level escalation point for complex P1/P0 incidents. Lead cross-functional root cause investigations involving compute, networking (IB/RDMA/RoCE), storage, and orchestration layers. Partner with SRE and Software teams (Storage, Networking, Compute, K8s) to design systemic fixes rather than recurring workarounds. Reliability Architecture Design and improve node validation, burn-in processes, performance baselining, and release readiness. Influence Kubernetes architecture, workload orchestration (Slurm, Terraform), and AI/ML cluster stability. Reduce MTTR and incident recurrence through structural improvements. AI/ML Infrastructure Expertise Troubleshoot NCCL, IB, GPU driver/firmware issues, and distributed training failures. Support complex AI workloads (training + inference) with performance tuning and observability improvements. Customer-Facing Authority Act as technical advisor during high-risk customer incidents. Deliver executive-ready RCAs with clarity and confidence. Drive trust through transparency and technical depth. Mentorship & Standards Mentor P3/P4 engineers. Define SOPs and technical standards for support excellence. Partner with Enablement to raise the technical bar across the organization. What You Bring to the Team: 8+ years of experience in SRE, DevOps, HPC, or Cloud Infrastructure roles. Advanced Linux systems expertise. Deep Kubernetes operational experience (CKA-level or higher). Strong networking knowledge: InfiniBand, RDMA, RoCE, SDN. Experience supporting AI/ML workloads at scale (GPU clusters). Proven track record of resolving multi-layer, distributed system failures. Strong customer communication and executive-facing presence.
Benefits: Competitive compensation Restricted Stock Units Paid time off & paid holidays Comprehensive health, dental & vision insurance Employer contributions to HSA account Paid parental leave Paid life insurance, short-term and long-term disability Professional development & tuition reimbursement Mental health & wellness support Commuter benefits (parking & transit) Cell phone stipend 401(k) Retirement plan with company match up to 4% of salary Volunteer time off Compensation Range Compensation will be paid in the range of up to $156,000 - $190,000 + Bonus. Restricted Stock Units are included in all offers. Compensation to be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data. Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time

Staff Storage Engineer

Crusoe San Francisco, California

Job Description Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe. About the Role: At Crusoe, we are on a mission to align the future of computing with the future of the climate. As a Staff Storage Engineer on the Storage Team, you will be the lead architect and operator of the data layer for our vertically integrated AI cloud. This team sits at the critical intersection of massive-scale data ingress/egress and high-performance GPU workloads, ensuring that our sustainable clusters deliver world-class data throughput for the world's most demanding AI and HPC use cases. You will manage the end-to-end lifecycle of our worldwide storage environment, from initial bring-up and configuration to high-level vendor strategy.
In this role, you will have a direct hand in shaping our enterprise infrastructure, collaborating on vendor RFPs and reviewing responses while working directly to influence vendor product roadmaps. Your work ensures that Fortune 500 companies and leading AI researchers have the performant, reliable, and sustainable storage needed to power the AI revolution. What You'll Be Working On: Performance Analysis & Optimization: Evaluate performance of block, file, and object storage systems across diverse workloads. Identify bottlenecks at the hardware, firmware, OS, and application layers. Develop and execute performance test plans, benchmarks, and stress tests. Tune storage stacks (I/O schedulers, caching layers, drivers, protocols) to achieve target KPIs. Validation & Testing: Design and execute Proof of Concept (PoC) exercises to take new arrays through their paces. You will validate new vendor software releases in staging environments before rolling them out to our global production footprint. Full-Stack Administration: Own the initial bring-up, configuration, and ongoing performance tuning of large enterprise arrays. You will manage the lifecycle of the storage OS, ensuring all systems are optimized for AI training and inference I/O patterns. Enterprise Infrastructure Building: Collaborate with the Compute and Networking teams to build a seamless "gold standard" cloud infrastructure. You will design cloud-scale storage systems that can excel in high-concurrency, high-throughput environments. Storage Strategy & Selection: Lead the technical evaluation of new storage technologies. You will be responsible for authoring RFPs, reviewing vendor responses, and leading "down selection" processes to ensure we invest in the best hardware for AI workloads. Vendor Roadmap Influence: Serve as the primary technical point of contact for storage partners (such as VAST Data, Pure Storage). 
You will sit with their engineering teams to provide feedback on bugs and missing features and to prioritize Crusoe's requirements on their development roadmaps. Cross-Functional Collaboration: Work closely with service engineering and architecture teams to influence design decisions. Provide performance guidance during feature development and release cycles. Communicate findings to both technical and non-technical stakeholders. What You'll Bring to the Team: 10+ years of experience in storage systems administration with a heavy focus on petabyte-scale, on-premises data environments. Strong understanding of storage architectures (block, file, object) and I/O paths. Hands-on experience with performance benchmarking and observability tools (FIO, ElBencho, blktrace, nvme-cli, nfs-gaze, eBPF, etc.). Experience with SSDs, NVMe, RAID, caching, or distributed storage systems. Deep familiarity with enterprise flash arrays and distributed file systems. Specific experience with VAST Data, Pure Storage (Everpure) is highly preferred. Proficiency with scripting (Python, Go, or Bash) to automate array management and monitoring. Ability to analyze complex performance data and present clear conclusions. Proven ability to lead the authoring of technical requirements, evaluate RFP responses, and manage complex vendor relationships. Experience with system design for specific I/O use cases (AI training/inference) and a disciplined approach to testing and validating new vendor releases. A genuine interest in Crusoe's mission to reduce the environmental impact of the AI revolution through sustainable infrastructure. Bonus Points Experience with RDMA, iSCSI, NVMe-oF, RoCEv2, or InfiniBand networking as it relates to high-performance storage. Previous experience at a major Cloud Service Provider (CSP) or a high-scale AI infrastructure company. Familiarity with distributed storage systems (Ceph, Lustre, Gluster, etc.).
Benefits: Competitive compensation and equity packages Restricted Stock Units Paid time off, paid holidays & leave of absence programs Comprehensive health, dental & vision insurance Employer contributions to HSA account Paid parental leave Paid life insurance, short-term and long-term disability Professional development & tuition reimbursement Mental health & wellness support Commuter benefits (parking & transit) Cell phone stipend 401(k) Retirement plan with company match up to 4% of salary Volunteer time off Global travel insurance & emergency assistance Daily meals allowance Additional perks & programs specific to location Compensation Range: $180,000 - $225,000 + Bonus. Restricted Stock Units are included in all offers. Compensation is determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data. Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/ orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time

Staff Product Security Engineer

Crusoe San Francisco, California

Job Description Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack - from electrons to tokens - to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that - with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI. We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved - people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services. If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe. About This Role We're seeking a Staff Product Security Engineer with deep AI/ML security expertise to strengthen Crusoe's security posture across applications, infrastructure, and distributed AI systems. This is a highly technical role focused on advanced penetration testing, AI/ML attack surface research, and building secure-by-design guardrails that engineering teams rely on. You'll operate at the intersection of offensive security, AI systems, and production engineering, owning security outcomes end-to-end while influencing system design across the organization.
What You'll Be Working On Performing advanced manual penetration testing across complex applications, infrastructure, Kubernetes environments, and distributed microservice ecosystems Leading offensive security initiatives including red team operations, adversary simulation, and security research Securing AI/ML systems end-to-end, including LLM pipelines, vector databases, RAG architectures, and agentic workflows Identifying and researching novel attack surfaces unique to LLMs and autonomous systems, contributing to internal and external AI security research Influencing secure system design across the SDLC, embedding security into CI/CD pipelines, container images, and deployment workflows Integrating and operationalizing security tooling (SAST, DAST, SCA, container scanning) and driving remediation of complex application-layer vulnerabilities Building internal security guardrails such as hardened base images, reusable libraries, and policy-as-code frameworks Developing production-grade security tooling and leading cross-functional security programs from design through deployment What You'll Bring to the Team 8-10 years of deep hands-on experience in offensive security, including manual penetration testing, red team operations, and adversary simulation Familiarity with modern C2 frameworks (e.g., Cobalt Strike, Sliver, Havoc), exploit development, and security research Strong expertise across the AI/ML stack, including MLOps, inference architectures, vector databases, RAG, and agentic frameworks (e.g., ReAct, Reflexion) Experience building, deploying, and securing LLM pipelines and AI workflows in Kubernetes and/or bare-metal environments Strong software engineering foundations with experience shipping production code in Go, Python, or Rust Hands-on experience securing Kubernetes, containers, VMs, and CI/CD environments Deep understanding of application security vulnerabilities, secure coding practices, and distributed system design Demonstrated ability to lead 
complex, cross-functional security initiatives end-to-end Strong communication skills with the ability to influence both engineering teams and executive stakeholders Bonus Points Public contributions to offensive security or AI security research (talks, blogs, tooling, CVEs, etc.) Experience building internal red team or adversary simulation programs Background in high-performance computing, AI infrastructure, or cloud-native platform security Experience designing policy-as-code frameworks at scale Benefits: Competitive compensation Restricted Stock Units Paid time off & paid holidays Comprehensive health, dental & vision insurance Employer contributions to HSA account Paid parental leave Paid life insurance, short-term and long-term disability Professional development & tuition reimbursement Mental health & wellness support Commuter benefits (parking & transit) Cell phone stipend 401(k) Retirement plan with company match up to 4% of salary Volunteer time off Compensation Range: Compensation will be paid in the range of $250,000 - $285,000 + Bonus. Restricted Stock Units are included in all offers. Compensation is determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data. Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

06/08/2026

Full time


Crusoe

7 job(s) at Crusoe
