Thinking Machines Lab Inc.
San Francisco, California
Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals. We are scientists, engineers, and builders who've created some of the most widely used AI products, including ChatGPT and Character.ai, open weights models like Mistral, and popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We're looking for an engineer to join us and contribute to data infrastructure. You'll join a small, high-impact team responsible for architecting and scaling the core infrastructure behind distributed training pipelines, multimodal data catalogs, and intelligent processing systems that operate over petabytes of data. Infrastructure is critical to us: it's the bedrock that enables every breakthrough. You'll work directly with researchers to accelerate experiments, develop new datasets, improve infrastructure efficiency, and enable key insights across our data assets. If you're excited by distributed systems, large-scale data mining, and open source tools like Spark, Kafka, Beam, Ray, and Delta Lake, and enjoy building from the ground up, we'd love to hear from you.

Note: This is an "evergreen role" that we keep open on an ongoing basis so candidates can express interest. We receive many applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. Still, we encourage you to apply. We continuously review applications and reach out to applicants as new opportunities open. You are welcome to reapply as you gain more experience, but please avoid applying more than once every six months.

What You'll Do
- Design, build, and operate scalable, fault-tolerant infrastructure for LLM research: distributed compute, data orchestration, and storage across modalities.
- Develop high-throughput systems for data ingestion, processing, and transformation, including training data catalogs, deduplication, quality checks, and search.
- Build systems for traceability, reproducibility, and robust quality control at every stage of the data lifecycle.
- Implement and maintain monitoring and alerting to support platform reliability and performance.
- Collaborate with research teams to unlock new features, improve data quality, and accelerate training cycles.

Skills and Qualifications
- Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
- Proficiency in at least one backend language (we use Python or Rust).
- Fluency in distributed compute frameworks such as Apache Spark or Ray.
- Deep familiarity with cloud infrastructure, data lake architectures, and batch and streaming pipelines.
- Comfort operating across the stack and owning projects end to end.
- Ability to thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.
- A bias for action and the initiative to work across stacks and teams wherever you spot an opportunity to ship something.

Preferred qualifications (we encourage you to apply if you meet some but not all of these):
- Hands-on experience with Kafka, dbt, Terraform, and Airflow.
- Experience building a web crawler.
- Extensive experience understanding and scaling deduplication, data mining, and search.
- Strong knowledge of file formats and storage systems (e.g., Parquet, Delta Lake) and how they impact performance and scalability.
- A proactive approach to documentation, testing, and empowering your teammates with good tooling.

Logistics
Location: This role is based in San Francisco, California.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.
Visa sponsorship: We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.
Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
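By way of illustration only, the deduplication and quality-check work this posting describes often looks something like the minimal PySpark sketch below. The dataset paths, column names, and length threshold are hypothetical and are not part of the role description.

```python
# Illustrative only: a minimal PySpark deduplication and quality-check pass.
# Paths, column names, and the threshold are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup-quality-sketch").getOrCreate()

# Read raw documents from a (hypothetical) Parquet dataset.
docs = spark.read.parquet("s3://example-bucket/raw_docs/")

# Exact deduplication on a content hash, keeping one row per hash.
deduped = (
    docs.withColumn("content_hash", F.sha2(F.col("text"), 256))
        .dropDuplicates(["content_hash"])
)

# A simple quality filter: drop very short documents.
cleaned = deduped.filter(F.length("text") >= 200)

# Write the cleaned snapshot back out for downstream training jobs.
cleaned.write.mode("overwrite").parquet("s3://example-bucket/clean_docs/")
```

At petabyte scale the same ideas apply, but exact hashing is typically complemented by fuzzy techniques (for example, MinHash-based near-duplicate detection) and by richer quality signals.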
04/02/2026
Full time
Thinking Machines Lab Inc.
San Francisco, California
A leading AI research organization seeks an Infrastructure Research Engineer in San Francisco to optimize and scale systems powering large AI models. This role emphasizes enhancing inference speed, reliability, and cost-effectiveness. Ideal candidates possess a Bachelor's in CS/Engineering, experience with deep learning frameworks, and collaborative skills in diverse teams. Competitive compensation between $350,000 and $475,000 USD is offered along with generous benefits including unlimited PTO and visa sponsorship.
04/02/2026
Full time
Thinking Machines Lab Inc.
San Francisco, California
Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals. We are scientists, engineers, and builders who've created some of the most widely used AI products, including ChatGPT and Character.ai, open weights models like Mistral, and popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We're looking for an engineer to design, build, and operate the GPU supercomputing environment that powers large-scale training and inference. You will deliver high-performance, reliable, and cost-efficient compute so our users and researchers can move fast at scale.

Note: This is an "evergreen role" that we keep open on an ongoing basis so candidates can express interest. We receive many applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. Still, we encourage you to apply. We continuously review applications and reach out to applicants as new opportunities open. You are welcome to reapply as you gain more experience, but please avoid applying more than once every six months. You may also find that we put up postings for individual roles with specific project or team needs. In those cases, you're welcome to apply to them directly in addition to an evergreen role.

What You'll Do
- Operate and automate large GPU clusters, including provisioning, imaging, and capacity planning.
- Write software that abstracts cluster management and presents a unified interface for training and inference.
- Extend scheduling and orchestration (Kubernetes, Slurm, or similar) for topology-aware placement, preemption, quotas, and fair-share multi-tenancy.
- Monitor and improve operational metrics for speed, reliability, and error recovery.
- Build reliable storage and artifact paths for datasets, checkpoints, and logs, with clear retention and lineage.
- Partner with researchers to unblock scale runs and advise on parallelism and performance trade-offs.

Skills and Qualifications
- Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
- Proficiency in at least one backend language (we use Python or Rust).
- Experience operating large-scale clusters and container orchestration systems (e.g., Kubernetes or Slurm).
- Comfort operating across the stack and owning projects end to end.
- Ability to thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.
- A bias for action and the initiative to work across stacks and teams wherever you spot an opportunity to make sure something ships.

Preferred qualifications (we encourage you to apply if you meet some but not all of these):
- Strong systems background: Linux, networking, and infrastructure as code.
- Familiarity with CUDA/NCCL and performance profiling for distributed training and inference.
- Prior work supporting large-scale model training or inference environments.
- Understanding of deep learning frameworks (e.g., PyTorch, TensorFlow, JAX) and their underlying system architectures.
- A track record of working in fast-paced environments, balancing care with urgency.

Logistics
Location: This role is based in San Francisco, California.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.
Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
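Purely as an illustration of what topology-aware placement can involve, the toy Python sketch below packs a multi-GPU job onto nodes within a single network domain. The node names, capacities, and packing heuristic are hypothetical and do not describe our scheduler.

```python
# Illustrative only: a toy topology-aware placement heuristic.
# Node/rack names, capacities, and the policy are hypothetical.
from collections import defaultdict

# Free GPUs per node, and each node's network domain (e.g., rack or leaf switch).
free_gpus = {"node-a": 8, "node-b": 4, "node-c": 8, "node-d": 8}
domain_of = {"node-a": "rack-1", "node-b": "rack-1", "node-c": "rack-2", "node-d": "rack-2"}

def place(job_gpus: int) -> list[tuple[str, int]]:
    """Assign a job's GPUs from nodes in a single network domain (toy policy)."""
    by_domain = defaultdict(list)
    for node, n in free_gpus.items():
        if n > 0:
            by_domain[domain_of[node]].append((node, n))

    # Prefer the domain with the most free capacity so the job avoids crossing domains.
    for _, nodes in sorted(by_domain.items(), key=lambda kv: -sum(n for _, n in kv[1])):
        if sum(n for _, n in nodes) >= job_gpus:
            assignment, remaining = [], job_gpus
            for node, n in sorted(nodes, key=lambda x: -x[1]):
                take = min(n, remaining)
                assignment.append((node, take))
                remaining -= take
                if remaining == 0:
                    return assignment
    raise RuntimeError("not enough free GPUs in any single domain")

print(place(12))  # e.g., [("node-c", 8), ("node-d", 4)], both on rack-2
```

A production scheduler would layer preemption, quotas, and fair-share accounting on top of a placement policy like this, typically inside Kubernetes or Slurm rather than a standalone script.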
04/02/2026
Full time
Thinking Machines Lab Inc.
San Francisco, California
Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals. We are scientists, engineers, and builders who've created some of the most widely used AI products, including ChatGPT and Character.ai, open-weights models like Mistral, and popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We're looking for an infrastructure engineer to own and evolve the security infrastructure that underpins our foundation models. In this role, you'll work across compute, storage, networking, and data platforms, making sure our systems are secure, reliable, and built to scale. You'll shape controls, architecture, and tooling so that security is part of how the platform works by default. You'll partner closely with research and product teams, enabling them to move quickly while keeping our models, data, and environments protected.

Note: This is an "evergreen role" that we keep open on an ongoing basis so candidates can express interest. We receive many applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. Still, we encourage you to apply. We continuously review applications and reach out to applicants as new opportunities open. You are welcome to reapply as you gain more experience, but please avoid applying more than once every six months. You may also find that we put up postings for individual roles with specific project or team needs. In those cases, you're welcome to apply to them directly in addition to an evergreen role.

What You'll Do
- Architect security patterns for platforms and services, including network segmentation, service-to-service authentication, RBAC, and policy enforcement in Kubernetes and cloud environments.
- Manage identity, access, and secrets for humans and services: workload and cross-cloud identity, least-privilege IAM, and secrets management.
- Build secure platforms for data ingestion, processing, and curation: classification, encryption, access controls, and safe sharing patterns across teams.
- Write threat models and review designs with researchers and engineers to help them ship features and experiments in a safe, scalable way.
- Automate security checks and build guardrails: policy-as-code, secure infrastructure baselines, validation in CI/CD, and tools that make the secure path the easiest one.

Skills and Qualifications
- Bachelor's degree or equivalent experience in engineering or a similar field.
- Strong background with containers and orchestration (e.g., Kubernetes) and how to secure them (namespaces, network policies, pod security, admission controls, etc.).
- Practical experience with infrastructure as code (Terraform or similar), including secure patterns for provisioning networks, IAM, and shared services.
- Solid understanding of cloud networking and security: VPCs, load balancers, service discovery, mTLS, firewalls, and zero-trust-style architectures.
- Proficiency with a systems language such as Rust, plus scripting in Python for building platform components and internal tools.
- Evidence of owning complex, production-critical systems, including debugging issues that span the infrastructure, security, and application layers.

Preferred qualifications (we encourage you to apply even if you meet some but not all of these):
- Experience with ML infrastructure, GPU clusters, or large-scale training environments (schedulers, job queues, shared storage, multi-tenant clusters).
- Background in AI labs, HPC environments, or ML-heavy organizations where both security and performance are first-class concerns.
- Experience profiling and tuning high-throughput systems, and an ability to reason about the cost of additional security layers.
- Talks, blog posts, or publications on infrastructure security, distributed systems, or performance engineering.
- Open-source contributions to security, orchestration, observability, or infrastructure tooling.
- Familiarity with securing specialized hardware (GPUs, TPUs) and its integration into training and inference pipelines.

Logistics
Location: This role is based in San Francisco, California.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $200,000 - $475,000 USD.
Visa sponsorship: We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.
Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
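As a hypothetical example of the policy-as-code guardrails mentioned in this posting (not a description of our actual tooling), a small CI check like the following could fail a build when an IAM policy grants wildcard permissions. The directory layout and the single rule shown are assumptions made for illustration.

```python
# Illustrative only: a toy CI guardrail that rejects overly broad IAM policies.
# The "iam_policies" directory and the single rule are hypothetical.
import json
import sys
from pathlib import Path

def overly_broad(policy: dict) -> list[str]:
    """Return findings for statements that allow '*' actions on '*' resources."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if stmt.get("Effect") == "Allow" and "*" in actions and "*" in resources:
            findings.append(f"wildcard Allow in statement: {stmt.get('Sid', '<no Sid>')}")
    return findings

def main() -> int:
    failures = []
    for path in Path("iam_policies").glob("*.json"):  # hypothetical policy directory
        findings = overly_broad(json.loads(path.read_text()))
        failures.extend(f"{path}: {f}" for f in findings)
    for line in failures:
        print(line)
    return 1 if failures else 0  # a non-zero exit code fails the CI job

if __name__ == "__main__":
    sys.exit(main())
```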
04/02/2026
Full time
Thinking Machines Lab Inc.
San Francisco, California
An AI research organization based in San Francisco is seeking a Research Engineer specializing in infrastructure for reinforcement learning systems. This role focuses on designing and optimizing the infrastructure that powers large-scale RL workloads, requiring strong engineering skills and experience with deep learning frameworks. The position offers a competitive salary range of $350,000 to $475,000, generous benefits, and visa sponsorship.
04/02/2026
Full time
Thinking Machines Lab Inc.
San Francisco, California
A forward-thinking AI company in San Francisco is looking for an Infrastructure Engineer to enhance security infrastructure vital for their AI models. In this role, you'll manage security across various platforms, ensuring systems are secure and reliable. The ideal candidate has experience with Kubernetes, Infrastructure as Code, and strong cloud networking skills. The position offers salaries ranging from $200,000 to $475,000 annually and includes generous benefits like unlimited PTO and visa sponsorship.
04/02/2026
Full time
Thinking Machines Lab Inc.
San Francisco, California
Thinking Machines Lab's mission is to empower humanity through advancing collaborative general intelligence. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals. We are scientists, engineers, and builders who've created some of the most widely used AI products, including ChatGPT and Character.ai, open-weights models like Mistral, and popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

About the Role
We're looking for a full stack engineer to build and ship products from prototype to scale and to maintain tools that accelerate research and product teams. You'll work across frontend and backend components, and contribute to reliability, observability, and security in production.

Note: This is an "evergreen role" that we keep open on an ongoing basis so candidates can express interest. We receive many applications, and there may not always be an immediate role that aligns perfectly with your experience and skills. Still, we encourage you to apply. We continuously review applications and reach out to applicants as new opportunities open. You are welcome to reapply as you gain more experience, but please avoid applying more than once every six months. You may also find that we put up postings for individual roles with specific project or team needs. In those cases, you're welcome to apply to them directly in addition to an evergreen role.

What You'll Do
- Prototype and build new APIs and product backends in Python and Rust.
- Launch new products and UX with React and TypeScript where needed.
- Improve developer experience for local development, deployment, testing, and iteration speed.
- Improve system reliability, observability, and security across production environments; participate in on-call.

Skills and Qualifications
- Bachelor's degree or equivalent experience in computer science, engineering, or a similar field.
- Proficiency in at least one backend language (we use Python or Rust).
- Some familiarity with ReactJS, TypeScript, or mobile platforms.
- Comfort operating across the stack and owning projects end to end.
- Ability to thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.
- A bias for action and the initiative to work across stacks and teams wherever you spot an opportunity to make sure something ships.

Preferred qualifications (we encourage you to apply if you meet some but not all of these):
- Experience designing and maintaining backend APIs at scale.
- Experience building tooling or products for LLMs or other systems that scale to a large number of users.
- Ability to build high-quality, production-level UIs from prototype to polish.
- Familiarity with NodeJS, Python, and/or Rust.
- Experience building AI products or other products that scale to a large number of users.

Logistics
Location: This role is based in San Francisco, California.
Compensation: Depending on background, skills, and experience, the expected annual salary range for this position is $350,000 - $475,000 USD.
Visa sponsorship: We sponsor visas. While we can't guarantee success for every candidate or role, if you're the right fit, we're committed to working through the visa process together.
Benefits: Thinking Machines offers generous health, dental, and vision benefits, unlimited PTO, paid parental leave, and relocation support as needed.

As set forth in Thinking Machines' Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
04/02/2026
Full time