Sorry, that job is no longer available. Here are some results that may be similar to the job you were looking for.

12 jobs found

Current search: observability pipeline engineer hybrid
Boston Consulting Group
BCG Platinion Lead IT Architect - AI Platforms
Boston Consulting Group Detroit, Michigan
Locations: Atlanta, Austin, Boston, Brooklyn, Chicago, Dallas, Denver, Detroit, Durham, Houston, Miami, Minneapolis, Nashville, New York, Philadelphia, Pittsburgh, Summit, Washington

Who We Are
Boston Consulting Group (BCG) is a global consulting firm that partners with leaders in business and society to tackle their most important challenges and capture their greatest opportunities. Our success depends on a spirit of deep collaboration and a global community of diverse individuals determined to make the world and each other better every day. BCG's Tech and Digital Advantage (TDA) practice focuses on helping clients deliver competitive advantage and superior business performance through data, technology, and digital. BCG Platinion sits within the TDA practice and is at the heart of the strategic impact we have with our clients. Our consultants and experts around the globe work across all industries, providing deep experience and expertise in a wide variety of topics, including Digital Transformation, Data & Digital Platforms, AI at Scale, Cybersecurity, and Digitizing the Tech Function. At BCG, we bring together the right people to conquer complexity, drive material change, and initiate positive, long-term impact. Explore our BCG Culture and Values for more information.

About BCG Platinion
BCG Platinion's presence spans the globe, with offices in Asia, Europe, and North and South America. We achieve digital excellence for clients with sustained solutions to their most complex and time-sensitive challenges. We guide clients into the future to push the status quo, overcome tech limitations, and enable our clients to go further in their digital journeys than has ever been possible. At BCG Platinion, we deliver business value through the innovative use of technology at a rapid pace.
We roll up our sleeves to transform business, revolutionize approaches, satisfy customers, and change the game through Architecture, Cybersecurity, Digital Transformation, Enterprise Application, and Risk functions. We balance vision with a pragmatic path to change, transforming strategies into leading-edge tech platforms at scale.

What You'll Do
Lead IT Architects - AI Platforms at BCG Platinion are:
  • Collaborative. They are interdisciplinary team players who build strong working relationships across engineering, data, product, and client teams to drive alignment and delivery.
  • Systems thinkers. They design robust AI platform and agentic system architectures that address complex business problems while balancing scalability, performance, and maintainability.
  • Technical leaders. They bring strong AI and platform engineering expertise to guide architectural decisions and shape high-quality technical solutions.
  • Comfortable with ambiguity. They operate effectively when requirements are evolving, helping teams navigate uncertainty and converge on pragmatic architectural choices.
  • Change drivers. They support organizations in adopting new AI platforms, processes, and ways of working, helping teams transition solutions from concept to production.
  • Agile practitioners. They apply agile and iterative delivery approaches to guide teams through complex technical challenges and evolving solution designs.
  • Innovative. They apply modern AI platform and agentic design patterns to enable the next generation of AI-enabled products and capabilities.
  • Practitioner-architects. They combine hands-on technical experience with architectural thinking, contributing directly where needed while guiding broader solution design.
  • Trusted partners. They work closely with senior client stakeholders and internal leaders to translate business needs into scalable AI platform architectures that align with enterprise technology strategies.
You're Good At
We are seeking a Lead AI Platform Architect to design and guide the implementation of intelligent, agentic AI platforms. This role bridges platform architecture and hands-on technical leadership, and is ideal for someone who can design end-to-end AI platform solutions while remaining close to implementation through prototyping and architectural validation. You will work across AI, data, product, and engineering teams to define and deliver scalable AI architectures that integrate large language models (LLMs), data pipelines, and enterprise systems.

AI Architecture & Solution Design
  • Design system architectures for AI and LLM-based solutions, balancing scalability, performance, modularity, and operational complexity.
  • Evaluate emerging AI frameworks and tooling (e.g., LangChain, LlamaIndex, LangGraph, Strands, Google ADK, Semantic Kernel) and recommend fit-for-purpose usage.
  • Design agentic AI solutions to enable intelligent workflow automation, including task decomposition, memory usage, and orchestration patterns.
  • Define AI integration patterns such as RAG for context management, model orchestration, prompt workflows, and enterprise system connectivity.
  • Contribute to architectural standards and design principles for model lifecycle management, data lineage, and responsible AI practices.
  • Support architecture decisions for hybrid pipelines (batch training, real-time inference) with consideration for cost, latency, and operational risk.

Prototyping & Validation
  • Develop proof-of-concepts and architectural prototypes to validate AI platform designs, RAG flows, and agent orchestration patterns.
  • Support implementation of agentic workflows using modern orchestration and automation platforms in collaboration with engineering teams.

Implementation Management and Support
  • Support teams and clients as AI platforms transition from prototype to production environments.
  • Contribute to system performance, observability, and governance approaches as solutions scale.
  • Help define development standards for versioning, testing, deployment, and monitoring of AI services and models (LLMOps).

Communication and Collaboration
  • Translate complex AI platform concepts into clear narratives for both technical and non-technical stakeholders.
  • Deliver structured presentations, lead technical discussions, and support client decision-making.
  • Collaborate closely with product managers, data scientists, and engineers to align architecture with business objectives.
  • Facilitate technical working sessions and design workshops, providing architectural guidance and hands-on mentorship.
  • Provide direction and constructive feedback on key technical work items across teams.

Team Management
  • Support and guide junior team members by helping structure their work and technical approach.
  • Mentor junior architects and technology consultants through ongoing feedback and development conversations.
  • Provide quality assurance by reviewing outputs for technical correctness, clarity, and architectural alignment.
  • Contribute to a positive team environment and model BCG's culture and values.

Innovation and Growth
  • Stay current on emerging techniques in agentic AI, RAG architectures, retrieval strategies, and AI platform patterns.
  • Contribute ideas that improve AI platform architectures, delivery approaches, and client outcomes.
  • Develop depth as a go-to expert in AI platform and agentic system design within project teams.
  • Continuously build skills and technical breadth to increase impact over time.
  • Support business development through technical input, architecture sections of proposals, and solution shaping.
  • Build strong working relationships with client counterparts and internal stakeholders.

What You'll Bring
  • Bachelor's degree in information technology, computer science, engineering, or a related field (a Master's degree is a plus).
  • 6+ years of experience in software engineering, data engineering, or AI systems, with experience in architecture or technical leadership roles.
  • Strong foundation in software architecture and distributed systems.
  • Experience designing AI/ML or LLM-based systems in production or near-production environments.
  • Proficiency in one or more programming languages (e.g., Python, JavaScript, TypeScript) is nice to have.
  • Experience using modern AI-assisted coding and development tools (e.g., Cursor, Claude Code, Replit, Lovable) to accelerate prototyping, experimentation, and implementation of AI platform solutions.
  • Familiarity with modern agentic AI frameworks (LangChain, LangGraph, Semantic Kernel, AutoGen, LlamaIndex, etc.).
  • Working knowledge of agentic AI concepts including RAG, multi-agent orchestration, memory patterns, vector databases, and function calling.
  • Understanding of cloud architecture, data platforms, and API-based system integration, with depth in at least one hyperscaler (AWS, Azure, or GCP).
  • Experience deploying AI solutions on cloud AI platforms and services.
  • Experience integrating AI workflows with enterprise systems and data sources.
  • Exposure to containerization and orchestration technologies (Docker, Kubernetes).
  • Experience with AI developer tools and productivity accelerators is a plus.
  • Familiarity with vector databases and retrieval patterns; exposure to Pinecone, Qdrant, Azure AI Search, or similar is beneficial.
  • Strong analytical thinking, problem-solving skills, and attention to engineering quality.
  • Ability to explain technical concepts clearly to senior technical and non-technical stakeholders.
  • Consulting mindset with comfort operating in ambiguous, fast-paced environments.
  • Agile mindset with a focus on delivering business value through modern AI platforms.
  • Willingness to travel to work with clients and BCG teams as needed.
  • GenAI fluency (e.g., proven usage of GenAI tools such as ChatGPT or Claude) and validation of their responses.
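For candidates unfamiliar with the RAG pattern named in the requirements above, the core data flow is small: embed a query, rank stored documents by similarity, and splice the top hits into the prompt sent to a model. Below is a minimal, self-contained Python sketch; the bag-of-words similarity stands in for a real embedding model, the returned prompt string stands in for an actual LLM call, and the corpus and function names are illustrative, not from the listing.

```python
import re
from collections import Counter
from math import sqrt

# Toy corpus standing in for an enterprise document store (illustrative data).
DOCS = [
    "Quarterly revenue grew 12 percent driven by cloud services.",
    "The incident postmortem cites a misconfigured load balancer.",
    "Employee onboarding requires badge access and VPN setup.",
]

def embed(text: str) -> Counter:
    # Bag-of-words counts stand in for a dense embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query -- the "R" in RAG.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def answer(query: str) -> str:
    # In a real pipeline this prompt would be sent to an LLM; here we
    # return it directly to show the retrieve-then-generate data flow.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(answer("What caused the incident?"))
```

A production version swaps `embed` for a real embedding model, `DOCS` for a vector database, and the returned prompt for a model call, but the retrieve-then-generate shape is the same.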
Additional Info
What We Offer: At BCG, we care about our people and offer best-in-class benefits to support you personally and professionally, including an opportunity to work organically across disciplines and across BCG: a unified and unrivaled opportunity that combines strategic thinking with hands-on application. Click apply for full job details.
04/02/2026
Full time
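The function-calling and multi-agent orchestration concepts the listing asks for reduce, at their core, to a dispatch loop: the model emits a tool name plus arguments, and the runtime executes the matching function. A minimal sketch with a stubbed model (no real LLM; the tool set and `fake_model` are illustrative assumptions, not part of the listing):

```python
import json

# Tool registry: plain Python functions the model is allowed to call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stubbed data source (illustrative)

def add(a: float, b: float) -> float:
    return a + b

TOOLS = {"get_weather": get_weather, "add": add}

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM that emits a function call as JSON.
    if "weather" in prompt:
        return json.dumps({"tool": "get_weather", "args": {"city": "Detroit"}})
    return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})

def run_agent(prompt: str):
    call = json.loads(fake_model(prompt))
    fn = TOOLS[call["tool"]]    # dispatch on the model-chosen tool name
    return fn(**call["args"])   # execute with the model-chosen arguments

print(run_agent("What is the weather?"))  # Sunny in Detroit
print(run_agent("Add two and three"))     # 5
```

Frameworks like LangGraph or Semantic Kernel wrap this loop with schemas, retries, and memory, but the dispatch-by-name pattern is the common core.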
Boston Consulting Group
BCG Platinion Lead IT Architect - AI Platforms
Boston Consulting Group Ferndale, Michigan
04/01/2026
Full time
Locations: Atlanta Austin Boston Brooklyn Chicago Dallas Denver Detroit Durham Houston Miami Minneapolis Nashville New York Philadelphia Pittsburgh Summit Washington Who We Are Boston Consulting Group (BCG) is a global consulting firm that partners with leaders in business and society to tackle their most important challenges and capture their greatest opportunities. Our success depends on a spirit of deep collaboration and a global community of diverse individuals determined to make the world and each other better every day. BCG's Tech and Digital Advantage (TDA) practice focuses on helping clients deliver competitive advantage and business superior performance through data, technology and digital. BCG Platinion sits within the TDA practice and is at the heart of the strategic impact we have with our clients. Our consultants and experts globally work across all industries and provide deep experience and expertise in a wide variety of topics including Digital Transformation, Data & Digital Platforms, AI at Scale, Cybersecurity and Digitizing the Tech Function. At BCG, we bring together the right people to conquer complexity, drive material change, and initiate positive, long-term impact. Explore our BCG Culture and Values for more information. About BCG Platinion BCG Platinion's presence spans across the globe, with offices in Asia, Europe, and South and North America. We achieve digital excellence for clients with sustained solutions to the most complex and time-sensitive challenge. We guide clients into the future to push the status quo, overcome tech limitations, and enable our clients to go further in their digital journeys than what has ever been possible in the past. At BCG Platinion, we deliver business value through the innovative use of technology at a rapid pace. 
We roll up our sleeves to transform business, revolutionize approaches, satisfy customers, and change the game through Architecture, Cybersecurity, Digital Transformation, Enterprise Application and Risk functions. We balance vision with a pragmatic path to change transforming strategies into leading-edge tech platforms, at scale. What You'll Do Lead IT Architects - AI Platforms at BCG Platinion are: Collaborative. They are interdisciplinary team players who build strong working relationships across engineering, data, product, and client teams to drive alignment and delivery. Systems thinkers. They design robust AI platform and agentic system architectures that address complex business problems while balancing scalability, performance, and maintainability. Technical leaders. They bring strong AI and platform engineering expertise to guide architectural decisions and shape high-quality technical solutions. Comfortable with ambiguity. They operate effectively when requirements are evolving, helping teams navigate uncertainty and converge on pragmatic architectural choices. Change drivers. They support organizations in adopting new AI platforms, processes, and ways of working, helping teams transition solutions from concept to production. Agile practitioners. They apply agile and iterative delivery approaches to guide teams through complex technical challenges and evolving solution designs. Innovative. They apply modern AI platform and agentic design patterns to enable the next generation of AI-enabled products and capabilities. Practitioner-architects. They combine hands-on technical experience with architectural thinking, contributing directly where needed while guiding broader solution design. Trusted partners. They work closely with senior client stakeholders and internal leaders to translate business needs into scalable AI platform architectures that align with enterprise technology strategies. 
You're Good At
We are seeking a Lead AI Platform Architect to design and guide the implementation of intelligent, agentic AI platforms. This role bridges platform architecture and hands-on technical leadership, and is ideal for someone who can design end-to-end AI platform solutions while remaining close to implementation through prototyping and architectural validation. You will work across AI, data, product, and engineering teams to define and deliver scalable AI architectures that integrate large language models (LLMs), data pipelines, and enterprise systems.

AI Architecture & Solution Design
- Design system architectures for AI and LLM-based solutions, balancing scalability, performance, modularity, and operational complexity.
- Evaluate emerging AI frameworks and tooling (e.g., LangChain, LlamaIndex, LangGraph, Strands, Google ADK, Semantic Kernel) and recommend fit-for-purpose usage.
- Design agentic AI solutions to enable intelligent workflow automation, including task decomposition, memory usage, and orchestration patterns.
- Define AI integration patterns such as retrieval-augmented generation (RAG) for context management, model orchestration, prompt workflows, and enterprise system connectivity.
- Contribute to architectural standards and design principles for model lifecycle management, data lineage, and responsible AI practices.
- Support architecture decisions for hybrid pipelines (batch training, real-time inference) with consideration for cost, latency, and operational risk.

Prototyping & Validation
- Develop proofs of concept and architectural prototypes to validate AI platform designs, RAG flows, and agent orchestration patterns.
- Support implementation of agentic workflows using modern orchestration and automation platforms in collaboration with engineering teams.

Implementation Management and Support
- Support teams and clients as AI platforms transition from prototype to production environments.
- Contribute to system performance, observability, and governance approaches as solutions scale. 
- Help define development standards for versioning, testing, deployment, and monitoring of AI services and models (LLMOps).

Communication and Collaboration
- Translate complex AI platform concepts into clear narratives for both technical and non-technical stakeholders.
- Deliver structured presentations, lead technical discussions, and support client decision-making.
- Collaborate closely with product managers, data scientists, and engineers to align architecture with business objectives.
- Facilitate technical working sessions and design workshops, providing architectural guidance and hands-on mentorship.
- Provide direction and constructive feedback on key technical work items across teams.

Team Management
- Support and guide junior team members by helping structure their work and technical approach.
- Mentor junior architects and technology consultants through ongoing feedback and development conversations.
- Provide quality assurance by reviewing outputs for technical correctness, clarity, and architectural alignment.
- Contribute to a positive team environment and model BCG's culture and values.

Innovation and Growth
- Stay current on emerging techniques in agentic AI, RAG architectures, retrieval strategies, and AI platform patterns.
- Contribute ideas that improve AI platform architectures, delivery approaches, and client outcomes.
- Develop depth as a go-to expert in AI platform and agentic system design within project teams.
- Continuously build skills and technical breadth to increase impact over time.
- Support business development through technical input, architecture sections of proposals, and solution shaping.
- Build strong working relationships with client counterparts and internal stakeholders.

What You'll Bring
- Bachelor's degree in information technology, computer science, engineering, or a related field (Master's degree is a plus).
- 6+ years of experience in software engineering, data engineering, or AI systems, with experience in architecture or technical leadership roles. 
- Strong foundation in software architecture and distributed systems.
- Experience designing AI/ML or LLM-based systems in production or near-production environments.
- Proficiency in one or more programming languages (e.g., Python, JavaScript, TypeScript) is nice to have.
- Experience using modern AI-assisted coding and development tools (e.g., Cursor, Claude Code, Replit, Lovable) to accelerate prototyping, experimentation, and implementation of AI platform solutions.
- Familiarity with modern agentic AI frameworks (LangChain, LangGraph, Semantic Kernel, AutoGen, LlamaIndex, etc.).
- Working knowledge of agentic AI concepts including RAG, multi-agent orchestration, memory patterns, vector databases, and function calling.
- Understanding of cloud architecture, data platforms, and API-based system integration, with depth in at least one hyperscaler (AWS, Azure, or GCP).
- Experience deploying AI solutions on cloud AI platforms and services.
- Experience integrating AI workflows with enterprise systems and data sources.
- Exposure to containerization and orchestration technologies (Docker, Kubernetes).
- Experience with AI developer tools and productivity accelerators is a plus.
- Familiarity with vector databases and retrieval patterns; exposure to Pinecone, Qdrant, Azure AI Search, or similar is beneficial.
- Strong analytical thinking, problem-solving skills, and attention to engineering quality.
- Ability to explain technical concepts clearly to senior technical and non-technical stakeholders.
- Consulting mindset with comfort operating in ambiguous, fast-paced environments.
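The RAG pattern this role centers on can be illustrated with a minimal, framework-free sketch: retrieve the corpus documents most similar to a query, then augment the prompt with that context before calling an LLM. The character-bigram `embed()` is a toy stand-in for a real embedding model, and the corpus and function names are invented for illustration.

```python
import math

def embed(text: str) -> dict:
    """Toy 'embedding': character-bigram counts. A real system would call an
    embedding model; this stand-in only illustrates the retrieval step."""
    text = text.lower()
    vec = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2]
        vec[bigram] = vec.get(bigram, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank corpus documents by similarity to the query (the 'R' in RAG)."""
    qv = embed(query)
    return sorted(corpus, key=lambda doc: cosine(qv, embed(doc)), reverse=True)[:k]

def build_prompt(query: str, corpus: list) -> str:
    """Augment the prompt with retrieved context before an LLM call."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Invoices are processed nightly by the billing batch job.",
    "The support portal resets passwords via email verification.",
    "Model registry entries require an approved evaluation report.",
]
prompt = build_prompt("How are invoices processed?", corpus)
```

Production versions replace the toy embedding with a vector database lookup (Pinecone, Qdrant, Azure AI Search, etc.), but the retrieve-then-augment control flow is the same.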
MLOps Engineer
Careers Integrated Resources Inc Houston, Texas
Job Title: MLOps Engineer
Job Location: Houston, TX 77002 (Hybrid - 4 days a week in office)
Job Contract: 8+ month contract (with possible extension)
Note: W2 only

Job Description
Must-have: Hands-on experience with AWS, Microsoft Azure, and Snowflake in building or supporting production ML/data platforms.

Job Summary: We are seeking an MLOps Engineer to design, deploy, monitor, and maintain machine learning solutions in production across AWS, Microsoft Azure, and Snowflake environments. This role will partner with data scientists and cloud teams to operationalize ML models, automate pipelines, and build reliable, secure, and scalable ML platforms. The ideal candidate has strong experience in the end-to-end ML lifecycle, cloud-native deployment, CI/CD automation, model monitoring, and production data pipelines, with hands-on expertise in AWS, Azure, and Snowflake.

Key Responsibilities:
- Design and implement end-to-end ML pipelines for data ingestion, feature engineering, model training, validation, deployment, and monitoring.
- Deploy and manage ML models in production across AWS, Azure, and Snowflake-based ecosystems.
- Build batch and real-time inference pipelines using cloud-native and platform-native services.
- Automate model packaging, testing, release, and rollback using CI/CD best practices.
- Integrate ML workflows with services such as AWS SageMaker, AWS Lambda, Azure Machine Learning, Azure Data Factory, and Snowflake.
- Build and maintain orchestration workflows using tools such as Airflow, Azure Data Factory, or similar platforms.
- Implement experiment tracking, model registry, and model governance processes.
- Monitor model accuracy, drift, latency, throughput, pipeline failures, and infrastructure usage.
- Establish deployment strategies such as canary, shadow, and blue-green releases with rollback mechanisms.
- Collaborate with cross-functional teams to move models from research to production.
- Ensure security, compliance, traceability, and access control for models and data across cloud environments.
- Optimize platform performance, reliability, and cost across AWS, Azure, and Snowflake.
- Document architecture, deployment standards, and operational procedures.

Required Qualifications:
- Master's or advanced degree (PhD) in Computer Science, Computer Engineering, or a similar field.
- Five or more years of relevant experience.
- Proven experience in MLOps, ML engineering, platform engineering, or DevOps.
- Strong hands-on experience with AWS, Microsoft Azure, and Snowflake.
- Strong programming skills in Python and SQL.
- Experience deploying and managing ML models in production.
- Experience with cloud ML services such as AWS SageMaker and Azure Machine Learning.
- Experience building data pipelines and integrating with Snowflake.
- Knowledge of CI/CD pipelines, infrastructure automation, and model versioning.
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Experience with workflow orchestration tools such as Airflow, Azure Data Factory, or similar.
- Familiarity with model monitoring, logging, alerting, and observability.
- Solid understanding of data engineering concepts, APIs, and distributed processing.
- Strong troubleshooting, communication, and cross-team collaboration skills.

Preferred Qualifications:
- Experience with Snowflake Cortex AI, Snowpark, or ML workloads in Snowflake.
- Experience with AWS Bedrock, Azure OpenAI, or production LLM workflows.
- Experience with real-time inference, event-driven pipelines, and serverless architectures.
- Familiarity with feature stores, vector databases, and RAG-based systems.
- Experience with Terraform, CloudFormation, or Azure infrastructure-as-code tools.
- Understanding of security, compliance, and governance requirements for regulated environments.
- Experience with production A/B testing, shadow deployment, and rollback strategies.
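One of the monitoring duties above, detecting model or feature drift, reduces to comparing live input statistics against a training-time baseline. A minimal sketch, with the caveat that the single-feature z-score check, the threshold, and the sample values are illustrative simplifications (production systems typically apply tests such as PSI or Kolmogorov-Smirnov across many features):

```python
import statistics

def drift_score(baseline: list, live: list) -> float:
    """Standardized mean shift between training-time and live feature values."""
    mu, sigma = statistics.mean(baseline), statistics.stdev(baseline)
    return abs(statistics.mean(live) - mu) / sigma if sigma else float("inf")

def check_drift(baseline: list, live: list, threshold: float = 1.0) -> dict:
    """Flag drift when the live mean shifts more than `threshold` baseline stdevs."""
    score = drift_score(baseline, live)
    return {"score": round(score, 3), "drifted": score > threshold}

# Illustrative data: a stable live window and a clearly shifted one.
baseline = [10, 12, 11, 13, 12, 11, 10, 12]
stable   = [11, 12, 10, 13]
shifted  = [25, 27, 26, 24]
```

In a real pipeline this check would run on a schedule (e.g., an Airflow task), emit the score to the observability stack, and trigger retraining or rollback when the flag fires.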
04/01/2026
Full time
Sr. Cloud Infrastructure Engineer - Terraform / IaC
Cliff Services Inc Austin, Texas
Sr. Cloud Infrastructure Engineer - Terraform / IaC
Location: Austin, TX - Hybrid
Experience: 8+ Years

ABOUT THE ENGAGEMENT
A Fortune 500 financial services and investment brokerage firm is operationalizing its multi-cloud infrastructure to enable scalable and secure migration of enterprise applications across AWS, Azure, and GCP. The Cloud Services organization empowers development teams with a self-service catalog of secure, regulatory-compliant cloud products. You will work alongside Cloud Architects, Cloud Security Engineers, SREs, and application teams to design, build, and maintain enterprise-grade cloud infrastructure serving millions of retail and institutional investors.

KEY RESPONSIBILITIES
- Design, implement, and maintain cloud infrastructure across AWS, Azure, and/or GCP in a large-scale enterprise financial environment.
- Lead Infrastructure as Code (IaC) efforts using Terraform, including building net-new modules, maintaining existing modules, and resolving issues with existing configurations.
- Architect and manage cloud networking constructs: VPCs, subnets, routing, peering, NAT, DNS, load balancing, VPNs, Direct Connect, and hybrid connectivity.
- Coordinate and execute cloud migrations of enterprise applications to multi-cloud infrastructure.
- Build and manage CI/CD pipelines using Git, Bitbucket, Bamboo, Jenkins, or Concourse.
- Develop automation scripts using Python, Bash, PowerShell, or Golang.
- Ensure high availability of mission-critical 24x7 financial systems through operational processes, SRE practices, and incident management.
- Troubleshoot cloud and hybrid cloud-to-data-center deployments using log aggregators such as Splunk, CloudWatch, or Cloud Monitoring.
- Enforce cloud compliance, security, and scalability standards aligned with financial regulatory requirements (SEC, FINRA).
- Follow ITIL operations processes: Change Management, Incident Management, Problem Management, and Service Request Management.
- Mentor junior engineers and provide technical oversight to vendor staff.
- Collaborate with Cloud Security engineers to ensure secure architecture design across all environments.

REQUIRED QUALIFICATIONS
- 8+ years of experience in cloud infrastructure engineering or equivalent enterprise IT roles.
- 5+ years of hands-on experience with public cloud platforms (AWS, Azure, or GCP; multi-cloud preferred).
- 5+ years of experience with Terraform, including building new modules from scratch, managing state, and identifying and resolving module-level issues.
- Strong expertise in cloud networking: CIDR, NAT, DNS, load balancing, VPC peering, hybrid connectivity, VPNs, Direct Connect / Interconnect.
- Proficiency in at least one scripting/automation language: Python, Bash, PowerShell, or Golang.
- Solid experience with DevOps tooling: Git, GitHub/GitLab/Bitbucket, Bamboo, Jenkins, Concourse.
- Experience supporting mission-critical 24x7 enterprise systems in a regulated financial services environment.
- Strong understanding of cloud security, IAM, compliance frameworks, and access management best practices.
- Hands-on experience with Docker, Kubernetes, and container orchestration.
- Experience with log aggregation and observability tools: Splunk, CloudWatch, or equivalent.

PREFERRED QUALIFICATIONS
- Experience with Ansible, Salt, or CloudFormation in addition to Terraform.
- Familiarity with GCP services: BigQuery, Cloud Run, Cloud Composer, Cloud Storage, Pub/Sub.
- Experience with AWS serverless: Lambda, API Gateway, SQS/SNS, DynamoDB.
- Knowledge of ITIL best practices.
- Prior experience in financial services, banking, or brokerage environments (SEC/FINRA regulated).
- Cloud certifications: AWS Solutions Architect, Azure Administrator, GCP Professional Cloud Architect, or equivalent.
04/01/2026
Full time
AI Automation Engineer
Blake Smith Staffing, LLC Stamford, Connecticut
Position Summary
Need legal permission to work for a U.S. employer (Green Card or US Citizenship).
AI Automation Engineer - Full-Time - Hybrid (2 days in the Stamford office)

About the Role
We are seeking an AI Automation Engineer to design, build, and scale intelligent automation solutions that transform how our teams operate. This role blends systems thinking, AI technologies, and workflow automation to eliminate bottlenecks, improve efficiency, and enable smarter decision-making across the organization. You will work directly with cross-functional teams to identify automation opportunities, rapidly prototype solutions, and turn successful concepts into durable, production-ready systems. This is a high-impact role for someone who wants to champion AI adoption and fundamentally redesign how work gets done.

Key Responsibilities
- Identify and prioritize automation opportunities across multiple departments.
- Rapidly prototype AI-powered workflows using tools such as Zapier, LLM APIs, agent frameworks, and other automation platforms.
- Embed with teams to understand their processes and redesign them with AI-first principles.
- Turn integrations into reliable automation systems with strong error handling, observability, and logging.
- Troubleshoot, debug, and refine automations to ensure accuracy and uptime.
- Maintain and optimize integrations between internal systems, APIs, and third-party platforms.
- Serve as an internal AI advocate by hosting workshops, creating playbooks, and training employees on safe and effective AI usage.
- Stay current with emerging AI tools and best practices to continuously expand automation capabilities.

Qualifications
Required:
- Experience with automation tools (Power Automate, Make, n8n, etc.) or scripting languages (Python, JavaScript).
- Familiarity with LLM APIs (e.g., OpenAI/ChatGPT, Azure OpenAI) and agent-based frameworks.
- Strong understanding of systems integration and workflow design.
- Ability to rapidly prototype and iterate based on user feedback.
- Strong analytical and problem-solving skills.
- Excellent communication skills and comfort working with non-technical teams.

Preferred:
- Experience with cloud platforms (AWS, Azure, GCP).
- Background in IT systems, DevOps, or software engineering.
- Knowledge of observability tools, logging frameworks, and CI/CD pipelines.
- Prior experience training or enabling teams on technical tools.
- Advanced knowledge of Microsoft products, including Power Automate.

What We're Looking For
- An AI systems integrator who enjoys experimenting with new tools and turning ideas into working systems.
- Someone who thrives in fast-moving, ambiguous environments.
- A collaborative partner who can embed with teams and understand their challenges.
- A proactive learner who stays ahead of the rapidly evolving AI landscape.

Why Join Us
- Shape the company's AI automation strategy from the ground up.
- Work with cutting-edge AI technologies and help with systems integration that directly impacts productivity.
- High-visibility role with opportunities for rapid growth.
- A culture that values creativity, experimentation, and continuous improvement.
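The "strong error handling, observability, and logging" responsibility above typically comes down to wrapping each workflow step with retries, exponential backoff, and structured logs. A minimal Python sketch, where the wrapper name and the flaky step are hypothetical stand-ins for real integration calls:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("automation")

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    """Run a workflow step with exponential backoff and structured logging."""
    for attempt in range(1, attempts + 1):
        try:
            result = fn()
            log.info("step succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == attempts:
                raise  # exhausted retries: surface the error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))

# Hypothetical flaky integration step: fails twice, then succeeds.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient upstream error")
    return "ok"
```

The same shape applies whether the step is an LLM API call, a Zapier webhook, or an internal REST endpoint; the log lines give the observability hooks the ad asks for.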
04/01/2026
InfraOps Reliability Administrator
InsideHigherEd Tallahassee, Florida
Job Title: InfraOps Reliability Administrator
Location: Hybrid
Regular/Temporary: Regular
Full/Part Time: Full-Time
Job ID: 60506

Department
This position is within FSU's Department of Information Technology Services (ITS).

Responsibilities
The FSU College of Medicine Infrastructure and Operations team designs, builds, and manages infrastructure and servers to support other IT teams, faculty, staff, researchers, and students within the college. The team leverages the latest in automation and observability solutions to make complex work easier to accomplish.
- Design, build, automate, and optimize infrastructure using modern tools and site reliability engineering practices.
- Manage primarily Windows servers in a hybrid cloud environment, with a focus on reliability, observability, security, and continuous improvement.
- Collaborate across teams and leverage automation, scripting, data-informed decision-making, and self-directed professional development to deliver secure, scalable, and customer-focused solutions.

Infrastructure and configuration as code:
- Use tools such as Terraform, Azure DevOps, Visual Studio Code, and scripting languages like PowerShell and Bash to manage infrastructure as code (IaC) and configuration as code (CaC), ensuring consistency, repeatability, and auditability of systems.
- Use observability solutions, such as Elastic, to monitor deployments and support data-informed decisions and rapid experiments that drive continuous improvement.
- Work with CI/CD pipelines to automate deployment, validation, and testing processes, ensuring systems are secure by design, mitigate vulnerabilities, and comply with security policies and standards.
- Follow secure coding practices, adhere to coding standards, and leverage version control, automated testing, and test-driven development to produce high-quality, secure, and maintainable code. 
- Use AI-assisted tools to accelerate development, validation, and troubleshooting. Participate in pair programming sessions as appropriate to write code and resolve deployment issues.

Provision and manage server infrastructure:
- Deploy and manage Windows and Linux servers across a hybrid environment that includes Microsoft Azure and over a dozen geographically dispersed on-premises locations. This includes ensuring that all systems are secure by design, follow zero trust principles, and are scalable, observable, and aligned with business needs.
- Provision infrastructure with reliability, maintainability, and consistency in mind, and implement observability prior to production to support proactive monitoring and data-informed decisions.
- Collaborate with cross-functional teams and stakeholders throughout the infrastructure lifecycle to ensure solutions align with customer needs; prioritize high-value work, assess feasibility, and conduct security reviews of new systems and applications; deliver exceptional customer service and maintain clear communication to support successful outcomes.

Automation:
Automation is not just a task; it is a mindset and a strategic enabler of reliability, consistency, and scalability.
- Design and implement solutions that make work easier, reduce manual effort, improve system reliability, and streamline operations across provisioning, configuration, monitoring, and remediation.
- Use AI, scripting, workflow automation, or robotic process automation (RPA) tools to reduce operational overhead and accelerate delivery.
- Use observability tools to monitor automation performance, ensure reliability, and identify data-informed opportunities for continuous improvement.
- Collaborate with peers and stakeholders to prioritize high-value automation opportunities and ensure that solutions are effective, secure, and aligned with business needs. 
Network administration:
- Manage and troubleshoot enterprise-grade network infrastructure, including wireless access points, switches, routers, load balancers, and next-generation firewalls.
- Diagnose and resolve network issues using packet captures, OS command outputs, diagnostic consoles, logs, or other tools.
- Leverage network observability tools to make data-informed decisions and identify opportunities for improvement.
- Implement and maintain security measures to protect data, systems, and network availability.
- Collaborate with network and security teams to validate new systems and configurations, expand observability, reduce exploitable vulnerabilities, implement security controls, and enhance system resilience and usability for customers.

Documentation and process improvement:
- Create and maintain clear, concise documentation for knowledge sharing, process repeatability, and operational continuity.
- Develop system diagrams, deployment guides, and standard operating procedures (SOPs) that support usability, compliance, and reliability.
- Continuously refine documentation and processes as systems evolve, incorporating feedback and lessons learned.
- Ensure all procedures align with FSU ITS Security Policies and Standards.
- Participate in peer reviews to validate documentation for accuracy, clarity, and usability.

Support and incident response:
- Respond to system alerts, outages, and support requests in accordance with established incident management procedures, collaborating with peers and stakeholders to ensure rapid resolution.
- Use observability tools to support rapid diagnosis and resolution, and create new monitoring as needed to improve visibility.
- Participate in post-incident reviews, highlighting key data points and observability insights to identify root causes and opportunities for system or process improvements.
- Implement improvements to prevent the recurrence of issues and to enhance system reliability. 
Participate in an on-call rotation, typically one week per month, which includes after-hours support for deployments, changes, or incidents, including on holidays and weekends. Actively work to reduce the need for after-hours assistance by leveraging automated deployment solutions, improving system reliability, and lowering the risk and complexity of changes. Assist with IT security investigations as needed. Ensure incident response processes align with the expectations of IT management, technical teams, and customers. Professional development: Continuous learning and technical curiosity are key expectations of this role. Complete both assigned and self-directed professional development to stay current with evolving technologies, tools, and practices. Explore technical subjects that interest you, even beyond current projects. Use provided learning platforms, such as LinkedIn Learning. Participate in the ITS Professional Development Bonus Plan by completing manager-approved certifications. Pursue relevant training, certifications, and conferences aligned with team goals, subject to approval. Approved training resources will be paid for by the organization. Research and validate emerging tools, including AI, automation, observability, and other innovations, to assess their value for our organization. Apply a mindset of rapid experimentation using data to guide decisions, improvements, and the next experiment. Participation in knowledge-sharing sessions, communities of practice, and collaborative learning opportunities is encouraged. Qualifications Bachelor's degree in Computer Science, MIS, or other appropriate degree and two years experience or a high school diploma or equivalent and six years of experience. (Note: or a combination of appropriate post high school education and experience equal to six years.) 
Preferred Qualifications Proven ability to learn new tools and technologies quickly, with a track record of self-directed learning and adaptability in fast-paced environments. Demonstrated commitment to continuous learning and professional development. Proficient in scripting for infrastructure automation using PowerShell, with the ability to write, debug, and maintain scripts independently or with tools like GitHub Copilot; familiarity with Python or Bash is a plus. Experience using infrastructure and configuration as code tools such as Terraform, Ansible, PowerShell, or similar, with version control practices using Git, and integrated development environments like Visual Studio Code. Experience creating and troubleshooting CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or GitLab to automate infrastructure deployment and configuration. Experience provisioning and managing infrastructure in cloud environments such as Azure, AWS, or Google Cloud, with an understanding of repeatable deployment processes, and troubleshooting network connectivity with next-generation firewalls. Experience deploying containers and familiarity with container orchestration technologies such as Kubernetes. Proficient using observability tools such as Elastic, Dynatrace, Prometheus, Grafana, Splunk, Datadog, or others, to ingest new types of data, build dashboards and alerts, and derive insights for performance tuning and incident response. Experience improving infrastructure design, automation, or troubleshooting by testing ideas, learning from results, and making thoughtful adjustments over time. Experience supporting Windows and Linux systems in an Active Directory domain, including deployment, configuration, and troubleshooting, as well as managing virtual infrastructure using platforms such as Hyper-V or VMware. Experience leveraging AI tools to accelerate task completion and improve operational efficiency. 
Demonstrated ability to write and troubleshoot firewall rules and quickly diagnose issues across firewalls, switches . click apply for full job details
01/14/2026
Full time
Job Title: InfraOps Reliability Administrator
Location: Hybrid
Regular/Temporary: Regular
Full/Part Time: Full-Time
Job ID: 60506

Department
This position is within FSU's Department of Information Technology Services (ITS).

Responsibilities
The FSU College of Medicine Infrastructure and Operations team designs, builds, and manages infrastructure and servers to support other IT teams, faculty, staff, researchers, and students within the college. The team leverages the latest automation and observability solutions to make complex work easier to accomplish.

- Design, build, automate, and optimize infrastructure using modern tools and site reliability engineering practices.
- Manage primarily Windows servers in a hybrid cloud environment, with a focus on reliability, observability, security, and continuous improvement.
- Collaborate across teams and leverage automation, scripting, data-informed decision-making, and self-directed professional development to deliver secure, scalable, and customer-focused solutions.

Infrastructure and configuration as code: Use tools such as Terraform, Azure DevOps, Visual Studio Code, and scripting languages like PowerShell and Bash to manage infrastructure as code (IaC) and configuration as code (CaC), ensuring consistency, repeatability, and auditability of systems. Use observability solutions, such as Elastic, to monitor deployments and support data-informed decisions and rapid experiments that drive continuous improvement. Work with CI/CD pipelines to automate deployment, validation, and testing processes, ensuring systems are secure by design, mitigate vulnerabilities, and comply with security policies and standards. Follow secure coding practices, adhere to coding standards, and leverage version control, automated testing, and test-driven development to produce high-quality, secure, and maintainable code.
Use AI-assisted tools to accelerate development, validation, and troubleshooting, and participate in pair programming sessions as appropriate to write code and resolve deployment issues.

Provision and manage server infrastructure: Deploy and manage Windows and Linux servers across a hybrid environment that includes Microsoft Azure and over a dozen geographically dispersed on-premises locations. Ensure that all systems are secure by design, follow zero-trust principles, and are scalable, observable, and aligned with business needs. Provision infrastructure with reliability, maintainability, and consistency in mind, and implement observability prior to production to support proactive monitoring and data-informed decisions. Collaborate with cross-functional teams and stakeholders throughout the infrastructure lifecycle to ensure solutions align with customer needs; prioritize high-value work, assess feasibility, and conduct security reviews of new systems and applications; deliver exceptional customer service and maintain clear communication to support successful outcomes.

Automation: Automation is not just a task; it is a mindset and a strategic enabler of reliability, consistency, and scalability. Design and implement solutions that make work easier, reduce manual effort, improve system reliability, and streamline operations across provisioning, configuration, monitoring, and remediation. Use AI, scripting, workflow automation, or robotic process automation (RPA) tools to reduce operational overhead and accelerate delivery. Use observability tools to monitor automation performance, ensure reliability, and identify data-informed opportunities for continuous improvement. Collaborate with peers and stakeholders to prioritize high-value automation opportunities and ensure that solutions are effective, secure, and aligned with business needs.
Network administration: Manage and troubleshoot enterprise-grade network infrastructure, including wireless access points, switches, routers, load balancers, and next-generation firewalls. Diagnose and resolve network issues using packet captures, OS command outputs, diagnostic consoles, logs, or other tools. Leverage network observability tools to make data-informed decisions and identify opportunities for improvement. Implement and maintain security measures to protect data, systems, and network availability. Collaborate with network and security teams to validate new systems and configurations, expand observability, reduce exploitable vulnerabilities, implement security controls, and enhance system resilience and usability for customers.

Documentation and process improvement: Create and maintain clear, concise documentation for knowledge sharing, process repeatability, and operational continuity. Develop system diagrams, deployment guides, and standard operating procedures (SOPs) that support usability, compliance, and reliability. Continuously refine documentation and processes as systems evolve, incorporating feedback and lessons learned. Ensure all procedures align with FSU ITS Security Policies and Standards. Participate in peer reviews to validate documentation for accuracy, clarity, and usability.

Support and incident response: Respond to system alerts, outages, and support requests in accordance with established incident management procedures, collaborating with peers and stakeholders to ensure rapid resolution. Use observability tools to support rapid diagnosis and resolution, and create new monitoring as needed to improve visibility. Participate in post-incident reviews, highlighting key data points and observability insights to identify root causes and opportunities for system or process improvements. Implement improvements to prevent the recurrence of issues and to enhance system reliability.
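Diagnosing network issues from OS command outputs, as described under network administration above, often reduces to parsing tool output programmatically. A minimal sketch: the summary line below follows common `ping` output, but exact wording varies by platform, and the 1% threshold is an arbitrary example:

```python
import re

def packet_loss_pct(ping_output: str):
    """Extract the packet-loss percentage from a ping summary line.

    Returns a float, or None when no loss figure is present.
    """
    match = re.search(r"([\d.]+)%\s+packet loss", ping_output)
    return float(match.group(1)) if match else None

summary = "4 packets transmitted, 3 received, 25.0% packet loss, time 3004ms"
loss = packet_loss_pct(summary)

# Flag links above a loss threshold for deeper diagnosis (threshold is illustrative).
degraded = loss is not None and loss > 1.0
```

The same pattern extends to other diagnostic outputs (traceroute hops, interface counters): parse once, then alert or dashboard on the extracted numbers.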
Participate in an on-call rotation, typically one week per month, which includes after-hours support for deployments, changes, or incidents, including on holidays and weekends. Actively work to reduce the need for after-hours assistance by leveraging automated deployment solutions, improving system reliability, and lowering the risk and complexity of changes. Assist with IT security investigations as needed. Ensure incident response processes align with the expectations of IT management, technical teams, and customers.

Professional development: Continuous learning and technical curiosity are key expectations of this role. Complete both assigned and self-directed professional development to stay current with evolving technologies, tools, and practices. Explore technical subjects that interest you, even beyond current projects. Use provided learning platforms, such as LinkedIn Learning. Participate in the ITS Professional Development Bonus Plan by completing manager-approved certifications. Pursue relevant training, certifications, and conferences aligned with team goals, subject to approval; approved training resources will be paid for by the organization. Research and validate emerging tools, including AI, automation, and observability innovations, to assess their value for the organization. Apply a mindset of rapid experimentation, using data to guide decisions, improvements, and the next experiment. Participation in knowledge-sharing sessions, communities of practice, and collaborative learning opportunities is encouraged.

Qualifications
Bachelor's degree in Computer Science, MIS, or another appropriate field and two years of experience; or a high school diploma or equivalent and six years of experience; or a combination of appropriate post-high-school education and experience equal to six years.
Preferred Qualifications
- Proven ability to learn new tools and technologies quickly, with a track record of self-directed learning and adaptability in fast-paced environments.
- Demonstrated commitment to continuous learning and professional development.
- Proficiency in scripting for infrastructure automation using PowerShell, with the ability to write, debug, and maintain scripts independently or with tools like GitHub Copilot; familiarity with Python or Bash is a plus.
- Experience using infrastructure and configuration as code tools such as Terraform, Ansible, or PowerShell, with version control practices using Git and integrated development environments like Visual Studio Code.
- Experience creating and troubleshooting CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or GitLab to automate infrastructure deployment and configuration.
- Experience provisioning and managing infrastructure in cloud environments such as Azure, AWS, or Google Cloud, with an understanding of repeatable deployment processes and of troubleshooting network connectivity through next-generation firewalls.
- Experience deploying containers and familiarity with container orchestration technologies such as Kubernetes.
- Proficiency with observability tools such as Elastic, Dynatrace, Prometheus, Grafana, Splunk, or Datadog: ingesting new types of data, building dashboards and alerts, and deriving insights for performance tuning and incident response.
- Experience improving infrastructure design, automation, or troubleshooting by testing ideas, learning from results, and making thoughtful adjustments over time.
- Experience supporting Windows and Linux systems in an Active Directory domain, including deployment, configuration, and troubleshooting, as well as managing virtual infrastructure on platforms such as Hyper-V or VMware.
- Experience leveraging AI tools to accelerate task completion and improve operational efficiency.
- Demonstrated ability to write and troubleshoot firewall rules and quickly diagnose issues across firewalls, switches ... (click apply for full job details)
AI Applications Engineer
InsideHigherEd Stanford, California
AI Applications Engineer
Business Affairs: University IT (UIT), Redwood City, California, United States
Information Technology Services
Sep 08, 2025 Post Date
107213 Requisition #

Job Purpose
Are you an experienced AI/GenAI engineer who loves shipping real systems? Join Stanford's Enterprise Technology team to design, implement, and support AI solutions across university use cases. In this role, you will influence strategic direction, requirements, and architecture for AI-driven information systems, incorporating new capabilities (LLMs, RAG, agentic frameworks, MLOps) to improve workflow, efficiency, and decision-making. You may serve as the technical lead for specific AI tracks and interrelated applications. This role blends hands-on engineering with mentorship and thought leadership. You will prototype and productionize: presenting proofs of concept, demoing solutions to stakeholders, and partnering with project managers, technical managers, architects, security, infrastructure, and application teams (ServiceNow, Salesforce, Oracle Financials, etc.).

Core Duties:
- AI/ML System Implementation & Integration: Translate requirements into well-engineered components (pipelines, vector stores, prompt/agent logic, evaluation hooks) and implement them in partnership with the platform/architecture team.
- Application & Agent Development: Build and maintain LLM-based agents/services that securely call enterprise tools (ServiceNow, Salesforce, Oracle, etc.) using approved APIs and tool-calling frameworks. Create lightweight internal SDKs/utilities where needed.
- RAG & Search Enablement: Configure and optimize RAG workflows (chunking, embeddings, metadata filters) and integrate with existing search/vector infrastructure, escalating architecture changes to designated architects.
- MLOps & SDLC Practices: Follow and improve team standards for CI/CD, testing, prompt/model versioning, and observability. Own feature delivery through dev/test/prod, coordinating with release managers.
- Governance, Security & Compliance: Apply established guardrails (PII redaction, policy checks, access controls). Partner with InfoSec and architects to close gaps; document decisions and risks.
- Metrics & Reporting: Instrument services with KPIs (latency, cost, accuracy/quality) and build lightweight dashboards. Deep BI/reporting is not a primary focus.
- Documentation & Communication: Write clear technical docs (APIs, workflows, runbooks), user stories, and acceptance criteria. Support and sometimes lead UAT/test activities.
- Collaboration & Mentorship: Facilitate working sessions with stakeholders; mentor junior engineers through code reviews and pair programming; provide concise updates and risk flags.

Education & Experience:
Bachelor's degree and eight years of relevant experience, or a combination of education and relevant experience.

Required Knowledge, Skills, and Abilities:
- Agent/Agentic Framework Experience: Built and shipped at least one production LLM agent or agentic workflow using frameworks such as LangGraph, LangChain, CrewAI/AutoGen, or Google Agent Builder/Vertex AI Agents (or equivalent). Able to explain tool selection, orchestration logic, and post-deployment support.
- Proven Delivery: Implemented 3+ AI/ML projects and 2+ GenAI/LLM projects in production, with operational support (monitoring, tuning, incident response). Projects should serve sizable user populations and demonstrate measurable efficiency gains.
- Strong understanding of AI/ML concepts (LLMs/transformers and classical ML) and experience designing, developing, testing, and deploying AI-driven applications.
- Programming Expertise: Python (primary) plus experience with Node.js/Next.js/React/TypeScript and Java; demonstrated ability to quickly learn new tools/frameworks.
- Experience with cloud AI stacks (e.g., Google Vertex AI, AWS Bedrock, Azure OpenAI) and vector/search technologies (Pinecone, Elastic/OpenSearch, FAISS, Milvus, etc.).
- Knowledge of data design/architecture, relational and NoSQL databases, and data modeling.
- Thorough understanding of SDLC, MLOps, and quality control practices.
- Ability to define and solve logical problems for highly technical applications; strong problem-solving and systematic troubleshooting skills.
- Excellent communication, listening, negotiation, and conflict resolution skills; ability to bridge functional and technical resources.

Desired Knowledge, Skills, and Abilities:
- MLOps Tooling: MLflow, Kubeflow, Vertex Pipelines, SageMaker Pipelines; LangSmith/PromptLayer/Weights & Biases.
- Open Source Savvy: Experience working with, customizing, and improving open-source solutions; comfortable contributing fixes/features upstream.
- Rapid Tech Adoption: Demonstrated ability to pick up a new technology/framework quickly and deliver production value with it.
- GenAI Frameworks: LangChain, LlamaIndex, DSPy, Haystack, LangGraph, Agent Engine, Google ADK, AWS AgentCore, CrewAI/AutoGen.
- Security & Governance: Implementing AI guardrails, red-teaming, and policy enforcement frameworks.
- Enterprise Integrations: ServiceNow, Salesforce, Oracle Financials, or others.
- UI Development: React/Next.js/Tailwind for internal tools.
- Prompt engineering at scale: Structured prompts (JSON/function calling), templates, version control; automated offline and online evals (rubrics, hallucination/bias checks, A/B tests, golden sets).
- Parameter-efficient fine-tuning (LoRA/QLoRA/adapters) and supervised instruction tuning; hosting open-weight models (Llama/Mistral/Qwen) with vLLM/TGI/Ollama.
- Safety/guardrails frameworks (Guardrails.ai, NeMo Guardrails, Azure/AWS safety filters) and jailbreak/drift detection.
- Hybrid search & reranking (BM25 + dense retrieval, Cohere/Voyage/Jina rerankers), synthetic data generation, provenance/watermarking.
- Telemetry & governance: prompt/model drift monitoring, policy as code, audit logging, red-teaming playbooks.
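Several of the duties and skills above center on RAG workflows. The chunking step, for instance, can be sketched as a simple overlapping splitter; the sizes and function name here are illustrative assumptions, not from the posting:

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list:
    """Split text into fixed-size chunks with overlap, a common
    preprocessing step before computing embeddings for a vector store.
    Overlap preserves context that would otherwise be cut at boundaries."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

doc = "x" * 1000
chunks = chunk_text(doc, size=400, overlap=50)
```

In a real pipeline each chunk would then be embedded and indexed with metadata (source, section) so that metadata filters can narrow retrieval at query time.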
Certifications and Licenses:
Required: one of (or equivalent experience with) Google/AWS/Azure ML/AI certifications, or a strong demonstrable portfolio of production AI systems.

Physical Requirements:
Constantly perform desk-based computer tasks. Frequently sit, grasp lightly/fine manipulation. Occasionally stand/walk, write by hand. Rarely use a telephone or lift/carry/push/pull objects that weigh up to 10 pounds. Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form.

Working Conditions:
May work extended hours, evenings, and weekends.

Work Standards:
- Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
- Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned.
- Subject to, and expected to comply with, all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in Stanford's Administrative Guide.

The expected pay range for this position is $169,728 to $190,000 per annum. Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location, and external market pay for comparable jobs.
At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.

Why Stanford is for You:
Stanford University has revolutionized the way we live and enriched the world. Supporting this mission is our diverse and dedicated staff of 17,000. We seek talent driven to impact the future of our legacy. Our culture and unique perks empower you with:
- Freedom to grow. We offer career development programs and tuition reimbursement, or audit a course. Join a TedTalk, film screening, or listen to a renowned author or global leader speak.
- A caring culture. We provide superb retirement plans, generous time off, and family care resources.
- A healthier you. Climb our rock wall, or choose from hundreds of health or fitness classes at our world-class exercise facilities. We also provide excellent health care benefits.
- Discovery and fun. Stroll through historic sculptures, trails, and museums.
- Enviable resources. Enjoy free commuter programs, ridesharing incentives, discounts, and more.
- Redwood City. Our new Stanford Redwood City campus, opened in 2019, will be the workplace for approximately 2 ... (click apply for full job details)
01/14/2026
Full time
DevOps Engineer
InsideHigherEd Stanford, California
DevOps Engineer
Business Affairs: University IT (UIT), Redwood City, California, United States
Information Technology Services
Post Date: Nov 17, 2025
Requisition #: 107201

JOB PURPOSE
Enterprise Technologies is a central IT unit at Stanford University, responsible for delivering the foundational computing and communication infrastructure that supports the University's academic, research, and administrative functions. We are seeking a Senior DevOps Engineer with proven expertise in API development, as well as administration and integration experience across Google Workspace, AWS, and Microsoft Azure. In this role, you will design and automate cloud infrastructure, optimize CI/CD pipelines, develop and maintain internal APIs, and manage cloud-based configurations, identity services, and security policies, leveraging scripting and API-driven automation across multi-cloud and enterprise collaboration platforms.

RESPONSIBILITIES
The DevOps Engineer will serve as a key member of the team, leading the design, development, planning, support, and security of Stanford's infrastructure with a focus on Google Cloud technologies. This role is critical in optimizing software delivery pipelines, enabling collaboration across teams, and ensuring the reliability, performance, and scalability of systems running in Google Cloud. The ideal candidate will have deep experience with infrastructure automation, cloud-native tooling, and API integrations within the Google ecosystem, including GCP and Google Workspace. Strong leadership, scripting, and communication skills are essential.

CORE DUTIES:
- Design, implement, and maintain scalable CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
- Automate infrastructure provisioning using tools like Terraform, CloudFormation, or Pulumi.
- Build, deploy, and manage containerized applications using Docker and orchestration platforms (Kubernetes, ECS, etc.).
- Develop and maintain RESTful and/or GraphQL APIs using Python, Go, or Node.js.
- Manage and integrate Google Workspace with internal systems, including API access, identity provisioning (via the Directory API), and service automation.
- Monitor system health and performance using observability tools (e.g., Prometheus, Grafana, ELK, Datadog).
- Implement security best practices, including IAM, secrets management, and compliance enforcement.
- Collaborate with engineering, security, and IT teams to maintain a secure and efficient development environment.
- Support incident response, root cause analysis, and disaster recovery procedures.
- Document infrastructure, processes, and API endpoints thoroughly.
- Participate in the 24x7 on-call support rotation.

MINIMUM REQUIREMENTS:
Education & Experience: Bachelor's degree and eight years of relevant experience, or a combination of education and relevant experience.

Knowledge, Skills and Abilities:
- Bachelor's degree in Computer Science, Engineering, or equivalent work experience.
- 5+ years in DevOps, SRE, or Platform Engineering roles.
- Hands-on experience with public cloud platforms (AWS, Azure, or GCP; GCP preferred).
- Strong programming skills in one or more languages for API development (e.g., Python, Go, Node.js).
- Proficiency with CI/CD tools and IaC frameworks (e.g., Terraform, Ansible, CloudFormation).
- Experience with Docker and container orchestration tools such as Kubernetes or ECS.
- Demonstrated experience managing and integrating Google Workspace in enterprise environments, including APIs, security settings, and automation.
- Solid knowledge of Linux administration, networking, and cloud security principles.
- Strong troubleshooting and debugging skills across the stack.
- Excellent communication, documentation, and collaboration abilities.
- Willingness to learn and adapt to new technologies and industry trends.
Preferred Qualifications (Nice to Have):
- Experience building and maintaining internal developer platforms or shared infrastructure.
- Experience with API management platforms (e.g., Apigee, AWS API Gateway, Kong).
- Hands-on experience with Proofpoint administration, including email security configuration and policy management.
- Experience with Microsoft 365 administration, including Exchange Online, SharePoint, Teams, and security/compliance controls.
- Google Workspace Administrator or GCP-related certifications.
- Experience with Entra ID or other identity providers for SSO and federated authentication flows.

The expected pay range for this position is $150,289 to $171,000 per annum. Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location, and external market pay for comparable jobs.

At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.

The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.
Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form. Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.

Additional Information
Schedule: Full-time
Job Code: 4833
Employee Status: Regular
Grade: K
Requisition ID: 107201
Work Arrangement: Hybrid Eligible
01/14/2026
Full time
AI Data Engineer
InsideHigherEd Stanford, California
AI Data Engineer
Business Affairs: University IT (UIT), Redwood City, California, United States
Information Technology Services
Post Date: Sep 08, 2025
Requisition #: 107222

Job Purpose
Are you an experienced AI/GenAI engineer who loves shipping real systems and has a passion for working with enterprise data? Join Stanford's Enterprise Technology team to design, implement, and support AI solutions across university use cases. In this role, you will influence strategic direction, requirements, and architecture for AI-driven information systems, incorporating new capabilities (LLMs, RAG, agentic frameworks, MLOps) to improve workflow, efficiency, and decision-making. You may serve as the technical lead for specific AI tracks and interrelated applications. This role blends hands-on engineering with mentorship and thought leadership. You will prototype and productionize: presenting proofs of concept, demoing solutions to stakeholders, and partnering with project managers, technical managers, architects, security, infrastructure, and application teams (ServiceNow, Salesforce, Oracle Financials, etc.).

Core Duties:
- AI/ML System Implementation & Integration: Translate requirements into well-engineered components (pipelines, vector stores, prompt/agent logic, evaluation hooks) and implement them in partnership with the platform/architecture team.
- Data Engineering & EDA: Build and optimize data ingestion, transformation, and quality pipelines. Conduct exploratory data analysis (EDA) to surface patterns, anomalies, and insights that inform AI models and decision-making.
- Application & Agent Development: Build and maintain LLM-based agents/services that securely call enterprise tools (ServiceNow, Salesforce, Oracle, etc.) using approved APIs and tool-calling frameworks. Create lightweight internal SDKs/utilities where needed.
- RAG & Search Enablement: Configure and optimize RAG workflows (chunking, embeddings, metadata filters) and integrate with existing search/vector infrastructure, escalating architecture changes to designated architects.
- MLOps & SDLC Practices: Follow and improve team standards for CI/CD, testing, prompt/model versioning, and observability. Own feature delivery through dev/test/prod, coordinating with release managers.
- Governance, Security & Compliance: Apply established guardrails (PII redaction, policy checks, access controls). Partner with InfoSec and architects to close gaps; document decisions and risks.
- Metrics & Reporting: Instrument services with KPIs (latency, cost, accuracy/quality) and build lightweight dashboards. Deep BI/reporting.
- Documentation & Communication: Write clear technical docs (APIs, workflows, runbooks), user stories, and acceptance criteria. Support and sometimes lead UAT/test activities.
- Collaboration & Mentorship: Facilitate working sessions with stakeholders; mentor junior engineers through code reviews and pair programming; provide concise updates and risk flags.

Education & Experience: Bachelor's degree and eight years of relevant experience, or a combination of education and relevant experience.

Required Knowledge, Skills, and Abilities:
- Agent/Agentic Framework Experience: Built and shipped at least one production LLM agent or agentic workflow using frameworks such as LangGraph, LangChain, CrewAI/AutoGen, or Google Agent Builder/Vertex AI Agents (or equivalent). Able to explain tool selection, orchestration logic, and post-deployment support.
- Proven Delivery: Implemented 3+ AI/ML projects and 2+ GenAI/LLM projects in production, with operational support (monitoring, tuning, incident response). Projects should serve sizable user populations and demonstrate measurable efficiency gains.
- Enterprise Data Understanding: Strong knowledge of enterprise systems (ServiceNow, Salesforce, Oracle Financials, etc.) and how to extract, transform, and analyze data from them.
- Data Engineering & Analysis: Proficiency in building data pipelines, conducting exploratory data analysis (EDA), profiling datasets, and preparing features for ML/AI use cases.
- Strong understanding of AI/ML concepts (LLMs/transformers and classical ML) and experience designing, developing, testing, and deploying AI-driven applications.
- Programming Expertise: Python (primary), with experience in SQL and one or more general-purpose languages (Java, Node.js, or TypeScript).
- Experience with cloud AI stacks (e.g., Google Vertex AI, AWS Bedrock, Azure OpenAI) and vector/search technologies (Pinecone, Elastic/OpenSearch, FAISS, Milvus, etc.).
- Knowledge of data design/architecture, relational and NoSQL databases, and data modeling.
- Thorough understanding of SDLC, MLOps, and quality control practices.
- Ability to define and solve logical problems for highly technical applications; strong problem-solving and systematic troubleshooting skills.
- Excellent communication, listening, negotiation, and conflict resolution skills; ability to bridge functional and technical resources.

Desired Knowledge, Skills, and Abilities:
- MLOps Tooling: MLflow, Kubeflow, Vertex Pipelines, SageMaker Pipelines; LangSmith/PromptLayer/Weights & Biases.
- Open Source Savvy: Experience working with, customizing, and improving open-source solutions; comfortable contributing fixes/features upstream.
- Rapid Tech Adoption: Demonstrated ability to pick up a new technology/framework quickly and deliver production value with it.
- GenAI Frameworks: LangChain, LlamaIndex, DSPy, Haystack, LangGraph, Agent Engine, Google ADK, AWS AgentCore, and CrewAI/AutoGen.
- Security & Governance: Implementing AI guardrails, red-teaming, and policy enforcement frameworks.
- Enterprise Integrations: ServiceNow, Salesforce, Oracle Financials, or others.
- UI Development: React/Next.js/Tailwind for internal tools.
- Prompt Engineering at Scale: Structured prompts (JSON/function calling), templates, version control; automated offline and online evals (rubrics, hallucination/bias checks, A/B tests, golden sets).
- Fine-Tuning & Hosting: Parameter-efficient fine-tuning (LoRA/QLoRA/adapters), supervised instruction tuning; hosting open-weight models (Llama/Mistral/Qwen) with vLLM/TGI/Ollama.
- Safety/Guardrails Frameworks: Guardrails.ai, NeMo Guardrails, Azure/AWS safety filters; jailbreak/drift detection.
- Telemetry & Governance: Prompt/model drift monitoring, policy-as-code, audit logging, red-teaming playbooks.
- Advanced Data Techniques: Hybrid search/reranking, synthetic data generation, provenance/watermarking, dataset drift detection.

Certifications and Licenses: Required: One or more certifications in Google, AWS, or Azure AI/ML, or an equivalent demonstrable portfolio of production AI/data systems.

Physical Requirements: Constantly perform desk-based computer tasks. Frequently sit, grasp lightly/fine manipulation. Occasionally stand/walk, write by hand. Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds. Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of the job.

Working Conditions: May work extended hours, evenings, and weekends.

Work Standards: Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations. Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned. Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in Stanford's Administrative Guide.
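To make the "chunking" step of the RAG workflows described in this listing concrete, here is a minimal sketch of fixed-size chunking with character overlap. The function name and sizes are illustrative assumptions, and production pipelines usually split on token or semantic boundaries instead:

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping windows for embedding.

    Illustrative only: chunk_size/overlap are in characters here,
    whereas real RAG pipelines typically count tokens.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if piece:
            # Keep the offset as metadata so answers can cite sources.
            chunks.append({"text": piece, "start": start})
        if start + chunk_size >= len(text):
            break
    return chunks

docs = chunk_text("lorem ipsum " * 60)  # ~720 characters of toy input
```

Each chunk would then be embedded and stored with its metadata in a vector index; the overlap exists so that a fact straddling a boundary still appears intact in at least one chunk.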
The expected pay range for this position is $169,728 to $190,000 per annum. Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location, and external market pay for comparable jobs.

At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.

Why Stanford is for You: Stanford University has revolutionized the way we live and enrich the world. Supporting this mission is our diverse and dedicated staff of 17,000. We seek talent driven to impact the future of our legacy. Our culture and unique perks empower you with:
- Freedom to grow. We offer career development programs and tuition reimbursement, or you can audit a course. Join a TED Talk, film screening, or listen to a renowned author or global leader speak.
- A caring culture. We provide superb retirement plans, generous time off, and family care resources.
- A healthier you. Climb our rock wall, or choose from hundreds of health or fitness classes at our world-class exercise facilities. We also provide excellent health care benefits.
01/14/2026
Full time
Dir Enterprise Infrastructure
Medline Industries - Transportation & Operations Northbrook, Illinois
Job Summary The Director, Enterprise Infrastructure is a senior technology leader accountable for the strategy, delivery, and continual improvement of enterprise infrastructure services and the global Service Desk. This role owns the infrastructure product portfolio (e.g., network, compute, storage, identity, endpoint, collaboration, and core platforms) and ITSM tooling (e.g., Service Management platform, knowledge, CMDB/asset, automation). The director's mandate is to improve service delivery to peer IT teams (Architecture, Security, Applications, Data, and Business Technology) and to ensure the Service Desk is modern, omnichannel, and consistently exceeding customer expectations, for both internal associates and external customers/partners. Job Description MAJOR RESPONSIBILITIES Define the multi-year infrastructure services and platform strategy, aligning to enterprise objectives, budgets, risk posture, and architecture standards; translate strategy into a quarterly roadmap with measurable outcomes and published scorecards. Build and lead high-performing teams across Service Desk, Infrastructure Product Management/Engineering, and ITSM Tooling & Process; develop leader bench strength and succession plans. Establish a culture of operational excellence, data-driven decision making, SRE/ITIL practices, and customer empathy; champion diversity, inclusion, and talent development. Own the product lifecycle (vision, roadmaps, backlogs, SLOs/SLAs, cost models) for core infrastructure products (network, compute, storage, identity, endpoint, collaboration, platform services). Drive standardization, reliability, security, and cost efficiency. Lead a modern, omnichannel Service Desk (portal, chat/virtual agent, voice, walk-up) with shift-left and knowledge-centered service (KCS) practices to maximize first-contact resolution and self-service adoption.
Define and manage XLAs (experience level agreements) alongside SLAs to capture customer sentiment, journey friction, and outcome quality; publish transparent dashboards. Integrate the Service Desk with observability/AIOps and problem management to shrink MTTR, reduce repeat incidents, and prevent recurrences. Ensure an accurate, auditable CMDB/asset with service mapping that supports impact analysis, DR/runbooks, and control/compliance needs. Own budgets, forecasts, and run-rate transparency for infrastructure services; optimize total cost of ownership through consumption management, capacity planning, and contract negotiations. Establish outcome-based vendor scorecards aligned to SLOs/XLAs and continuous improvement targets. Global Scope: Cross-time-zone collaboration; occasional after-hours releases for infrastructure changes; periodic travel to major sites, data centers, and vendor engagements. MINIMUM JOB REQUIREMENTS Education Bachelor's in Computer Science, Information Systems, Engineering, or equivalent experience. Work Experience 5+ years in infrastructure operations or a related role. 3+ years managing multi-disciplinary teams. Knowledge / Skills / Abilities Expertise in ITSM practices, operational excellence, and modern platform engineering (infrastructure as code, configuration/policy as code, pipelines). Strong command of service metrics and financial management (TCO, forecasting, showback/chargeback). Customer-obsessed service mindset with proven use of experience level agreements (XLAs) alongside SLAs; ability to translate customer journey pain points into platform and process changes. ITSM product ownership: service catalog design, request workflows, automation/orchestration, knowledge management, incident/major incident governance, problem/change excellence, and asset/CMDB health metrics.
Data-driven ops: build and publish operational scorecards (availability, change success rate, patch/compliance, cost per ticket, contact rate, backlog health) and OKR alignment for peer IT teams. AI-for-ITSM literacy and self-service design to reduce live contact while elevating experience. Change leadership: org design, talent development, building a manager-of-managers bench, and fostering a culture of blameless post-incident learning. Communication & executive presence: crisp incident/executive communications, storytelling with metrics, and stakeholder management across Architecture, Security, Applications, Data, and Business Technology. Vendor management, contract negotiation, and license optimization experience. Excellent leadership, communication, and stakeholder management across executive, technical, and frontline audiences. Metric-driven mindset and experience with data/analytical tools such as databases and report development. Experience directing both onshore and offshore teams. Basic understanding of financial and revenue models. Strong prioritizing, interpersonal, problem-solving, project management (from conception to completion), and planning skills. Strong verbal and written communication skills. Ability to work in a fast-paced and deadline-oriented environment. Self-motivated with critical attention to detail, deadlines, and reporting. PREFERRED JOB REQUIREMENTS Education Advanced degree in Computer Science or a related field. Certification / Licensure Azure Fundamentals, ITIL v4. Knowledge / Skills / Abilities Experience managing million-dollar+ annual budgets and forecasting activities over several years. Experience operating in hybrid cloud environments and driving self-service platform adoption at scale. Demonstrated success running internal platforms as products (roadmaps, backlog, service catalog, and stakeholder outcomes).
Track record modernizing the Service Desk (omnichannel: phone, chat, portal, virtual agents), shift-left, and knowledge management, with measurable gains in common service desk KPIs such as first-call resolution and customer satisfaction. Medline Industries, LP, and its subsidiaries offer a competitive total rewards package, continuing education & training, and tremendous potential with a growing worldwide organization. The anticipated salary range for this position: $175,760.00 - $263,640.00 annual. The actual salary will vary based on the applicant's location, education, experience, skills, and abilities. This role is bonus and/or incentive eligible. Medline will not pay less than the applicable minimum wage or salary threshold. Our benefit package includes health insurance, life and disability, 401(k) contributions, paid time off, etc., for employees working 30 or more hours per week on average. For a more comprehensive list of our benefits, please click here. For roles where employees work less than 30 hours per week, benefits include 401(k) contributions as well as access to the Employee Assistance Program, Employee Resource Groups, and the Employee Service Corp. We're dedicated to creating a Medline where everyone feels they belong and can grow their career. We strive to do this by seeking diversity in all forms, acting inclusively, and ensuring that people have tools and resources to perform at their best. Explore our Belonging page here. Medline Industries, LP is an equal opportunity employer. Medline evaluates qualified individuals without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, age, disability, neurodivergence, protected veteran status, marital or family status, caregiver responsibilities, genetic information, or any other characteristic protected by applicable federal, state, or local laws.
01/12/2026
Full time
AWS Observability Architect / AWS Architect
SoftPath Technologies LLC Reston, Virginia
AWS Observability Architect / AWS Architect Location: Reston, VA OR Philadelphia, PA (100% onsite) Duration: 12+ months contract. Candidates who can work independently are preferred; local candidates are preferred. Role: You will be the primary AWS Architect responsible for designing the future state of our hybrid (physical hardware and AWS) observability solution to handle significant pipeline growth. Responsibilities: Serve as the main AWS Architect, translating high-level requirements into robust architectural designs for our system on AWS. Analyze existing solutions and advise on improvements to enhance scalability and robustness. Collaborate with other architects and engineers on design and implementation strategies. Top Skills Required: Expert AWS architect experience (required). Proven knowledge in the observability space (huge plus), including: Elasticsearch/OpenSearch; OpenTelemetry or similar logging/tracing standards; metrics and monitoring tools.
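The observability skills this role names center on the span/trace model that OpenTelemetry standardizes. As a rough, dependency-free illustration of that model (a production design would use the OpenTelemetry SDK with real exporters; every name here is a hypothetical stand-in):

```python
import time
import uuid
from contextlib import contextmanager

# Finished spans are collected here; a real exporter would ship them to a backend
# such as OpenSearch instead of an in-memory list.
SPANS = []

@contextmanager
def span(name, parent=None):
    """Record one unit of work: child spans inherit the parent's trace id,
    so all spans of a request can be reassembled into a single trace."""
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex,
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "start": time.monotonic(),
    }
    try:
        yield record
    finally:
        record["end"] = time.monotonic()
        SPANS.append(record)  # children finish (and export) before the root

with span("ingest-request") as root:
    with span("parse", parent=root):
        pass
    with span("index", parent=root):
        pass
```

The key property the sketch demonstrates is context propagation: the shared `trace_id` is what lets a tracing backend stitch the `parse` and `index` spans under their `ingest-request` root.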
12/17/2025
First Command Financial Services
AI Engineer
First Command Financial Services Fort Worth, TX, USA
The AI Engineer will design, develop, and deploy AI models and systems that support First Command’s enterprise AI strategy. This role focuses on building scalable, secure, and ethically governed AI solutions that integrate with business platforms and digital products. The engineer will collaborate with data scientists, product managers, IT developers, and data stewards to deliver high-impact AI capabilities. What will the employee do in this role? AI Solution Development Design and implement AI models and algorithms tailored to business needs. Develop and maintain machine learning pipelines and infrastructure. Translate complex business problems into AI-driven solutions using GenAI, LLMs, and predictive analytics. Model Lifecycle Management Train, test, and optimize models using frameworks like TensorFlow, PyTorch, and scikit-learn. Apply MLOps practices for model deployment, monitoring, and retraining. Ensure model observability, reproducibility, and compliance with internal controls. Data Engineering & Integration Collaborate with data stewards to ensure high-quality, governed datasets. Build data architectures and processing systems to support AI workloads. Integrate AI models into cloud-native environments (e.g., Azure OpenAI, Azure AI Foundry). Cross-Functional Collaboration Work with product managers to align AI features with customer needs. Partner with IT developers to ensure robust, production-ready implementations. Support PoC initiatives by contributing to feasibility assessments and pilot deployments. Governance & Risk Adhere to responsible AI frameworks, including model cards, risk assessments, and ethical guidelines. Collaborate with legal, compliance, and risk teams to ensure regulatory alignment. What skills & qualifications do you need? Required Bachelor’s or Master’s in Computer Science, AI, Data Science, or a related field. 3+ years of experience in AI/ML engineering, including production deployment.
Proficiency in Python, Java, or C++, and experience with Docker, MLflow, and cloud platforms (Azure preferred). Experience with LLMs (e.g., GPT, LLaMA), RAG pipelines, and GenAI applications. Strong problem-solving and communication skills. Preferred Experience in financial services or regulated industries. Familiarity with enterprise architecture frameworks (e.g., TOGAF, BIZBOK). Experience with cloud-native AI platforms such as Azure OpenAI, Azure AI Foundry, Microsoft Copilot, or similar. #LI-NC1 #LI-HYBRID
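This listing asks for experience with RAG pipelines. As a rough sketch of the idea only (not First Command's actual stack; the corpus, scoring, and prompt shape are all hypothetical, and a real pipeline would use embeddings and a vector store rather than token overlap):

```python
# Minimal RAG sketch: retrieve the most relevant passage by token overlap,
# then assemble a grounded prompt to hand to an LLM.
CORPUS = [
    "MLOps practices cover model deployment, monitoring, and retraining.",
    "RAG pipelines ground model answers in retrieved documents.",
]

def retrieve(query, corpus, k=1):
    """Rank passages by how many query tokens they share, keep the top k."""
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, corpus):
    """Stuff the retrieved context into the prompt so the model answers
    from the documents instead of from parametric memory alone."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("What do RAG pipelines do?", CORPUS)
```

The resulting prompt would then be sent to an LLM endpoint (e.g., one of the Azure OpenAI deployments the listing mentions), with the retrieval step keeping answers traceable to source documents.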
12/15/2025
Full time


© 2008-2026 IT Job Board