Browse IT Jobs | IT Job Board

Senior Software Engineer, Analytics & Data Platform (fixed term for 12 months)

T. Rowe Price

Site Reliability Engineer (SRE), CDO Technology (fixed term for 12 months). Location: London, Warwick Court. Time type: Full time. Posted 2 days ago. Job requisition ID: 65225. There is a place for you at T. Rowe Price to grow, contribute, learn, and make a difference. We are a premier asset manager focused on delivering global investment management excellence and retirement services that investors can rely on today and in the future. The work we do matters. We invite you to explore the opportunity to join us and grow your career with us. Summary: We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our CDO Technology Group. As an SRE, you will play a crucial role in ensuring the availability, latency, performance, efficiency, and stability of our critical infrastructure, which supports a range of data platforms, applications, and services. You will collaborate closely with development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards. Responsibilities: Availability: Proactively monitor our systems and identify potential issues that could impact their availability. Implement and maintain automated alerting mechanisms to notify the appropriate parties of potential outages or performance degradation. Collaborate with development teams to design and implement solutions that enhance system resilience and reduce downtime. Latency: Analyze performance metrics to identify and resolve latency bottlenecks in our infrastructure. Implement performance optimization techniques and tools to improve the overall responsiveness of our systems. Work with development teams to ensure that new features and code changes do not introduce performance regressions. 
Performance: Develop and maintain metrics dashboards to track key performance indicators (KPIs) for our critical systems. Identify performance trends and anomalies that may indicate potential issues or areas for improvement. Recommend and implement performance optimization strategies to enhance the overall efficiency of our systems. Efficiency: Optimize resource utilization and minimize unnecessary expenditure on IT infrastructure. Identify and implement cost-effective solutions to improve the efficiency of our IT operations. Collaborate with development teams to optimize resource allocation for new applications and services. Release Management: Participate in the release planning process to ensure that software releases are conducted smoothly and without disruptions. Develop and implement automated deployment and rollback procedures to mitigate risks associated with software updates. Monitor the performance of new releases and promptly address any issues that arise. Monitoring: Design, implement, and maintain a comprehensive monitoring infrastructure to track the health and performance of our systems. Analyze monitoring data to identify potential issues and proactively troubleshoot problems before they impact users. Develop and implement alerts and notifications for critical events to ensure timely intervention. Emergency Response: Respond promptly to incidents and work collaboratively to resolve them in a timely manner. Analyze root causes of incidents to identify and implement preventive measures to minimize their recurrence. Document incident responses and lessons learned to enhance our incident handling processes. Capacity Planning: Participate in capacity planning exercises to anticipate future workloads and make proactive recommendations to expand or optimize infrastructure resources. Stay abreast of emerging technologies, trends, and industry best practices in the field of site reliability engineering and contribute to the continuous improvement of our practices and tools. 
Qualifications: Bachelor's degree in Computer Science, Information Technology, or a related field. 5+ years of experience as a Site Reliability Engineer or in an equivalent role. Proven experience in monitoring, analyzing, and optimizing the performance of large-scale distributed systems. Expertise in Linux systems administration, including managing servers, operating systems, and network configurations. Strong scripting and automation skills, preferably with experience in Bash, Python, or similar languages. Familiarity with AWS. Experience with DevOps tools and practices, such as GitLab CI/CD and Docker. Excellent troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues. Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders. A passion for maintaining high availability, performance, and reliability of critical systems in a fast-paced financial environment. Other information: Opportunity to work with cutting-edge technologies and contribute to the development of innovative solutions. Collaborative and supportive work environment with a focus on continuous learning and professional development. Hybrid working environment with up to 3 days a week from home. Commitment to Diversity, Equity, and Inclusion: We strive for equity, equality, and opportunity for all associates. When we embrace the power of diversity and create an environment where people can bring their authentic and best selves to work, our firm is stronger, and we create greater value for our clients. Our commitment and inclusive programming aim to lift the experience for each associate and build allies for our global associate community. We know that a sense of belonging is key not only to your success at the firm, but also to your ability to bring your best each day. T. 
Rowe Price is an equal opportunity employer and values diversity of thought, gender, and race. We believe our continued success depends upon the equal treatment of all associates and applicants for employment without discrimination on the basis of race, religion, creed, colour, national origin, sex, gender, age, mental or physical disability, marital status, sexual orientation, gender identity or expression, citizenship status, military or veteran status, pregnancy, or any other classification protected by country, federal, state, or local law. T. Rowe Price is an asset management firm focused on delivering global investment management excellence and retirement services that investors can rely on, now and over the long term.

Apr 27, 2024

Full time


Azure Cloud Engineer - SRE

Akkodis City, London

Azure Site Reliability Engineer Akkodis are currently working in partnership with a leading service provider to recruit an experienced Azure Site Reliability Engineer to join a growing team of talented Cloud Engineers providing high level support and project delivery for a large customer base. Please note this is a fully remote role and you must be eligible to gain security clearance (do not need to hold currently). The Role As an Azure Site Reliability Engineer you will support the cloud infrastructure used to deliver cloud hosted managed services to customers. You will have a high customer focus being actively involved in the support and development of the service including: the resolution of support cases, live service monitoring and maintenance, new service provision and continuous improvement projects. You will provide high quality operational and technical support to customers and will be responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. The Responsibilities Provide L3/L4 analytical incident management and resolution alongside project-based deliverables Contribute to the planning of application / infrastructure releases and configuration changes Resolve support requests from customers by phone, email and online making use of the call logging system Interact with key internal stakeholders and external third-party vendors to troubleshoot and resolve complex problems Provide input to administering and maintaining all production and development environments Create detailed technical and procedural documentation (e.g. 
architecture, configuration and setup). Design appropriate metrics for reporting on key performance and quality indicators, particularly in terms of in-depth trend analysis. Service transition and complete Operational Acceptance (OA) of new customer services. Implementation and delivery of Microsoft Azure projects. The Requirements: Extensive experience of Microsoft Azure and its relevant build, deployment, automation, networking, and security technologies in cloud and hybrid environments. Microsoft Azure certifications: AZ-103/104 - Azure Administrator. Good operational experience supporting Microsoft public cloud technologies and services at an enterprise level (multi-tenant) with in-depth knowledge of the following: Azure Active Directory (RBAC and IAM); Azure Networking; Azure Storage; Azure Monitor and Log Analytics; Azure Security Center. Demonstrable career operational experience from one of the following areas: Server Infrastructure Engineering (Virtualisation / Windows / Linux); Office / Microsoft 365 Administration; Network Engineering; DevOps (CI/CD, pipelines and Infrastructure as Code). In-depth knowledge of a scripting language (PowerShell, Bash, Azure CLI). Bright attitude and a deep desire to learn. Experience with helpdesk IT Service Management tools (e.g. BMC Remedy / ServiceNow). If you are looking for an exciting new challenge with a leading cloud team, please apply now. Modis International Ltd acts as an employment agency for permanent recruitment and an employment business for the supply of temporary workers in the UK. Modis Europe Ltd provide a variety of international solutions that connect clients to the best talent in the world. For all positions based in Switzerland, Modis Europe Ltd works with its licensed Swiss partner Accurity GmbH to ensure that candidate applications are handled in accordance with Swiss law. Both Modis International Ltd and Modis Europe Ltd are Equal Opportunities Employers. 
By applying for this role your details will be submitted to Modis International Ltd and/ or Modis Europe Ltd. Our Candidate Privacy Information Statement which explains how we will use your information is available on the Modis website.

Apr 26, 2024

Full time


SRE

Leo Technology Limited

Headline Information: Job Title: SRE / Site Reliability Engineer. Technology: GCP / Google Cloud Platform, Bash / Shell Scripting, Python, PHP, CI / CD, Docker. Industry: FinTech. Salary: £55,000 - £75,000. Interview process: 2 Stage. Working location: London - Hybrid (1 day per week on site). The Package: £55,000 - £75,000 per annum; hybrid and flexible working; annual bonus; enhanced pension; private medical cover; 25 days starting holiday; cycle to work scheme; electric vehicle scheme. The Role: As an SRE / Site Reliability Engineer you will be part of an 8+ engineer team working from their London office. You will be responsible for making sure the company's internal system is completely automated and runs without issue. An ability to write well-designed code and make use of automation, alongside strong experience working with GCP / Google Cloud Platform, will be essential. Requirements for the SRE: Site Reliability Engineer / SRE experience; Google Cloud Platform / GCP; Bash / Shell Scripting; Python or PHP; SQL; Docker; CI / CD (Continuous Integration / Continuous Deployment); Linux. Working Set-Up: The SRE team work in the company's London office one day per week. Interview Process: 1. Introductory conversation with the Hiring Manager (video or telephone). 2. In-depth technical chat with the Hiring Manager and a team member; you may be asked to run the team through examples of previous work (video). Important Notice: This position is unfortunately unable to provide sponsorship. Leo Technology Limited is acting as an Employment Agency in relation to this vacancy. To understand more about what we do with your data please review our privacy policy in the privacy section of the Leo Technology website.

Apr 25, 2024

Full time


SRE / Cloud Administrator - MOD DV or SC

Sanderson Tewkesbury, Gloucestershire

Site Reliability Engineer / Cloud Administrator - SC or MOD DV Location : Tewkesbury Salary: £55,000 - £75,000 Clearance: Active MOD DV Preferable, alternatively an active SC. Type: Full time on-site A leading provider of innovative research, data, machine learning and infrastructure solutions to secure UK Defence customers are looking to add to their team. They are global leaders in Internet facing systems and the innovative application of machine intelligence to the complex problems facing their secure customers. They are looking to bring in a Cloud Centric SME to add value to a project in the build phase and solve complex problems. The role: Design, deploy, and manage resilient and scalable infrastructure solutions using cloud technologies. Automate manual tasks and workflows to enhance operational efficiency and agility in cloud environments. Establish and manage comprehensive monitoring and alerting mechanisms to uphold system reliability, performance, and security in the cloud. Perform assessments and root cause analyses to proactively prevent recurrence of cloud-related incidents. Collaborate closely with diverse teams across the organisation to fine-tune application performance and bolster reliability in cloud environments. Participate in rotational on-call duties, promptly addressing and resolving cloud-related incidents to ensure uninterrupted service delivery. Technical Skills: AWS Azure Kubernetes Linux Ansible Security The role comes with an existing team of talented engineers to work alongside with extensive scope to learn new technologies and develop. If you're interested in the above and would like to learn more, apply or reach out to

Apr 25, 2024

Full time


SRE / Infrastructure Administrator - SC OR MOD DV

Sanderson Tewkesbury, Gloucestershire

Site Reliability Engineer / Infrastructure Administrator - SC or MOD DV Location : Tewkesbury Salary: £55,000 - £75,000 Clearance: Active MOD DV Preferable, alternatively an active SC. A leading provider of innovative research, data, machine learning and infrastructure solutions to secure UK Defence customers are looking to add to their team. They are global leaders in Internet facing systems and the innovative application of machine intelligence to the complex problems facing their secure customers. They are looking to bring in an infrastructure centric SRE to maintain an existing system and add value across the project lifecycle. The role: Design, implement, and maintain robust and scalable infrastructure solutions. Automate manual processes to streamline operations and improve efficiency. Develop and maintain monitoring and alerting systems to ensure system reliability and performance. Conduct post-incident reviews and root cause analysis to prevent future occurrences. Work closely with cross-functional teams to optimise application performance and reliability. Participate in on-call rotations and respond to incidents in a timely manner. Technical Skills: RedHat Linux, OpenShift, Kubernetes, IaaC, Ansible, Terraform. The role comes with an existing team of talented engineers to work alongside with extensive scope to learn new technologies and develop. If you're interested in the above and would like to learn more, apply or reach out to

Apr 25, 2024

Full time


Cloud Infrastructure SRE - MOD DV

Sanderson Tewkesbury, Gloucestershire

Cloud Infrastructure SRE - MOD DV Location : Tewkesbury Salary: £55,000 - £75,000 Clearance: Active MOD DV Preferable, alternatively an active SC. Type: 5 days on-site A leading provider of innovative research, data, machine learning and infrastructure solutions to secure UK Defence customers are looking to add to their team. They are global leaders in Internet facing systems and the innovative application of machine intelligence to the complex problems facing their secure customers. They are looking to bring in an infrastructure centric SRE to maintain an existing system and add value across the project lifecycle. The role: Design, implement, and maintain robust and scalable infrastructure solutions. Automate manual processes to streamline operations and improve efficiency. Develop and maintain monitoring and alerting systems to ensure system reliability and performance. Conduct post-incident reviews and root cause analysis to prevent future occurrences. Work closely with cross-functional teams to optimise application performance and reliability. Participate in on-call rotations and respond to incidents in a timely manner. Technical Skills: RedHat Linux, OpenShift, Kubernetes, IaaC, Ansible, Terraform. The role comes with an existing team of talented engineers to work alongside with extensive scope to learn new technologies and develop. If you're interested in the above and would like to learn more, apply or reach out to

Apr 25, 2024

Full time

Python Developer

Inspire People

Join a team at the heart of the global economy! The Department for Business and Trade ("DBT") and Inspire People are partnering together to bring you an exciting opportunity for an experienced Python Developer to support essential tooling and systems across DBT. This role is ideal for a Back End Python developer looking for career growth and exposure to cloud native systems with an SRE touch. You will join a team that ensures DBT's digital services work as users expect, giving development teams the tools for their job, including application performance monitoring, exception, log and metrics aggregation, dashboards, and declarative CI/CD pipelines. £55,400 to £74,600 (including allowances) plus excellent Civil Service benefits and pension. Salary is dependent on location and technical skills as assessed at interview. Flexible, hybrid working from London, Cardiff, Darlington, Edinburgh, Belfast, Birmingham or Salford. DBT's Digital, Data and Technology (DDaT) team develops and operates tools, services, and platforms that enable the UK government to provide world leading support to businesses in the UK and overseas. As a senior SRE developing in Python, you will work to give development teams the tools for their job, including application performance monitoring, exception, log and metrics aggregation, dashboards, and declarative CI/CD (continuous integration/continuous delivery) pipelines. You'll evangelise service-level indicators, objectives, and error budgets to product teams, and negotiate them. You'll help build and scale our global product platform and participate in an on-call rota.
The Tech Stack includes: Python and Django framework; Serverless compute (Lambda); Amazon Web Services; Azure; Jenkins and AWS CodePipeline; Terraform & CloudFormation; Kubernetes; Elastic Container Service (ECS); Elasticsearch; PostgreSQL; Sentry; Redis. Essential Skills and Experience You should be able to demonstrate: Experience and fluency in Python, writing clean and effective code. Cloud experience with either Amazon Web Services, Azure or Google Cloud. Ability to build code-defined, reliable, and well tested infrastructure on top of cloud computing systems (e.g. Terraform, CloudFormation, Pulumi). Experience in designing, analysing, and troubleshooting distributed systems. Knowledge of Linux/Unix fundamentals and TCP/IP Networking. Ability to see user impact in the infrastructure changes. Desirable Skills and Experience While not essential, it would be ideal if you have demonstrable skills and experience of: Coding infrastructure (e.g. Terraform, CloudFormation). Defining and measuring Service Level Objectives. Observability driven development. Prototyping through reuse of existing Open Source components. In return, you can expect a planned, transparent progression with learning and development tailored to your role, an environment with flexible working options and a culture encouraging inclusion and diversity, plus the following benefits: Salary of £54,400 to £74,600 (including allowances) including annual allowance depending on location and experience; Flexible, hybrid working from London, Cardiff, Darlington, Edinburgh, Belfast, Birmingham, Salford; Annual leave starting at 26 days per annum plus statutory bank holidays, rising to 33 days with service; An excellent Civil Service pension scheme.
If you are a Python Developer, DevOps Engineer, Site Reliability Engineer or Systems Administrator looking to enhance your career and make a difference across an expanding function, then apply today or contact Alison Whitehead at Inspire People in complete confidence for further information. Further Information: This role requires SC clearance, a condition of which is to have been present in the UK for 3 out of the past 5 years.

Aug 14, 2023

Full time

Site Reliability Engineer

Project Recruit

Site Reliability Engineer Our client, a leading global supplier of IT services, requires a Site Reliability Engineer - Virtualisation SME based at their client's offices in London. You may be able to work some days remotely. This is a 1 year temporary contract to start ASAP. Day rate: Competitive market rate We are looking for a Site Reliability Engineer - Virtualisation SME with 10+ years of experience, having excellent knowledge of VMware ESX and/or Nutanix HCI and of container orchestration platforms such as Docker and Kubernetes. Key Responsibilities Responsible for the reliability and efficiency of virtualisation infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil the OS and DB Platform Operations team must perform. Responsible for writing software to make the virtualisation infrastructure self-managing and self-service. Responsible for automation and continuous service improvement by developing Infrastructure as Code. Responsible for elimination of manual, repetitive, automatable, tactical tasks that are devoid of value. Responsible for availability, latency, performance, efficiency, change management, monitoring and capacity planning. Responsible for improving system performance, making effective use of resources, distributing load and reducing latency. Responsible for identifying SLOs (Service Level Objectives) that align the team to meet availability and latency objectives. Responsible for developing proactive monitoring solutions that alert on symptoms and not just on outages. Responsible for performing detailed root cause analyses (RCAs) on incidents and outages to prevent future occurrence. Responsible for partnering with development teams to improve services via rigorous testing and release procedures. Responsible for actively sharing knowledge and best practices across the organisation.
Responsible for identifying technical debt and partnering with application teams to build remediation plans. Responsible for developing standard operational procedures and producing effective documentation. Responsible for analysing workloads and devising suitable cloud migration strategies where appropriate. Responsible for participating in on-call rotation, triaging and addressing production issues as they arise. Responsible for performing the OS Platform Operations function as and when required. Responsible for mentoring and developing less experienced SAs and SREs. Responsible for identifying cost saving and optimisation opportunities within the customer business. Responsible for building strong relationships across the customer functions and business areas, underpinned by trust and the core values of the customer. Key Skills Essential: Excellent knowledge of VMware ESX and/or Nutanix HCI. Excellent knowledge of Windows Server 2008/2012/2016/2019. Excellent knowledge of Windows OS tuning utilities and commands. Excellent knowledge of configuring Windows OS systems for optimal performance. Excellent knowledge of Windows clustering and high-availability solutions. Excellent knowledge of Microsoft Active Directory, LDAP and Kerberos. Excellent knowledge of TCP/IP Networking Protocols. Excellent knowledge of networking, storage, database and virtualization layers. Excellent knowledge of container orchestration platforms such as Docker and Kubernetes. Excellent knowledge of version control software such as GitHub and Subversion. Excellent knowledge of configuration management software such as Chef, Puppet, Ansible, Terraform and SaltStack. Excellent knowledge of "Infrastructure as Code" principles and practices. Excellent knowledge of continuous integration (CI) and continuous delivery (CD) principles and practices. Excellent knowledge of applications development using Agile and DevOps best practices.
Excellent knowledge of operating system security and auditing methods. Excellent knowledge of security hardening principles in line with CIS industry benchmarks. Excellent knowledge of data security governance and regulations such as GDPR and SOX. Excellent knowledge of cloud computing - IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle. Desirable: Good working knowledge of RedHat Enterprise Linux (6.x, 7.x, 8.x) and Solaris (10.x and 11.x). Good working knowledge of Unix/Linux OS tuning utilities and commands. Good working knowledge of Unix/Linux system internals and Kernel tuning for optimal performance. Good working knowledge of Red Hat Satellite. Good working knowledge of Anti-Virus software such as McAfee and Sophos. Good working knowledge of Ivanti LANDESK and Symantec Altiris. Good working knowledge of ThinPrint and EquiTrack (Follow-Me Printing). Good working knowledge of Rubrik. Good working knowledge of EMC, HDS and Pure storage arrays. Good working knowledge of Dell PowerEdge, IBM xSeries and Cisco UCS hardware. Good working knowledge of EMC Networker, Data Domain and IBM Tivoli Storage Manager. Good working knowledge of Infoblox DNS. Good working knowledge of Icinga 2 and OpManager. Good working knowledge of IBM Tivoli and Netcool. Good working knowledge of GitHub, Subversion and TeamCity. Good working knowledge of BMC Control-M. Good working knowledge of CyberArk. Good working knowledge of Splunk and IBM QRadar. Good working knowledge of Qualys. Good working knowledge of SharePoint, JIRA and Confluence. Good working knowledge of ServiceNow and Serena Business Manager. 
Candidate Specifications Excellent communication and interpersonal skills Ability to handle pressure during outages and systematically resolve issues Excellent problem-solving skills Results driven, with a strong sense of accountability A proactive, motivated approach The ability to operate with urgency and prioritise work accordingly A structured and logical approach to work Attention to detail and accuracy Ability to perform well in a pressurised environment Ability to manage constructive conflict effectively The ability to manage large workloads and tight deadlines Able to communicate complex technical concepts to non-technical persons at all levels

Sep 23, 2022

Contractor

Production Engineer

Meta

Production Engineers at Meta are hybrid software/systems engineers who ensure that Meta's services run smoothly and have the capacity for future growth. They are embedded in every one of Facebook's product and infrastructure teams, and are core participants in every significant engineering effort underway in the company. Our team is comprised of varying levels of experience and backgrounds, from new grads to industry veterans. Relevant industry experience is important (Site Reliability Engineer (SRE), Systems Engineer, Software Engineer, DevOps Engineer, Network Engineer, Systems Administrator, Linux Administrator, Database Administrator or similar role), but ultimately less so than your demonstrated abilities and attitude. We sail into uncharted waters every day at Meta in Production Engineering, and we are always learning. Production Engineer Responsibilities: Own back-end services like our Hadoop data warehouses, front-end services like Chat and Newsfeed, infrastructure components like our Memcache infrastructure, and everything in between. Write and review code, develop documentation and capacity plans, and debug the hardest problems, live, on some of the largest and most complex systems in the world. Together with your engineering team, you will share an on-call rotation. Partnered alongside the best engineers in the industry on the coolest stuff around, the code and systems you work on will be in production and used by millions of users all around the world. Minimum Qualifications: Engineering degree, or a related technical discipline, or equivalent work experience. Experience coding in higher-level languages (e.g., PHP, Python, C++, or Java). Experience in configuration and maintenance of applications such as web servers, load balancers, relational databases, storage systems and messaging systems. Experience learning software, frameworks and APIs. Preferred Qualifications: BS or MS in Computer Science

Sep 22, 2022

Full time

Senior AWS SRE

Hays Specialist Recruitment Limited

Fully Remote | Senior AWS Site Reliability Engineer | £80k-£100k Your new company Created in 2012, this exciting London based Fintech Company has a unique value proposition of enabling and empowering financial institutions to take their problem-solving abilities to the next level, further aligning with the rapid development of technology within businesses. Not only will you be working in the Heart of the City of London, but you will also be part of a rapidly growing Fintech company that has been ranked within the top 100 influential Companies to work alongside financial institutions. Your new role You will be a crucial senior engineer working on mission critical functions within the operations team. You must have an immense passion for technology and automation. A resilient approach and a problem-solving mind when dealing with complex problems are essential. You will be working alongside seasoned professionals in a high-intensity team environment. Your day to day will be diverse and consist of: Designing, innovating and developing a wide range of key systems, including improving the CI/CD process. Monitoring and maintaining a 100% cloud environment (AWS). Creating secure, reliable, repeatable production rollouts. Communicating ideas and decisions throughout the team. Ensuring that tasks are owned and fully completed with high quality. Working with multiple business units to gather requirements and collaboratively build solutions. Ongoing maintenance and upgrades on key components of the platform. What you'll need to succeed Excellent knowledge and practical skills regarding AWS (monitoring and maintaining). Excellent Linux/Unix administration skills, including networking and scripting. Experience using one or more of the following: Ansible, Puppet, Chef, Salt. Excellent teamwork and strong communication skills.
Understanding of AWS, Google Cloud, Microsoft Azure or one of the other IaaS providers (Linode, Digital Ocean, OpenStack, VMware, Xen) and of using Terraform, CloudFormation, boto or other orchestration tools. Intrinsic interest in and experience with development and scripting languages such as Python, JavaScript, Java, C++, Bash. Experience with software such as Git. What you'll get in return You will be working for an emerging fintech company whose state-of-the-art offices are in the heart of London. You will also be entitled to flexible working, including 100% remote working if you would like it. An extremely competitive salary as well as other benefits such as subsidised healthcare, dental, pension and 25+ days annual leave. What you need to do now Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and employment business for the supply of temporary workers. By applying for this job you accept the T&C's, Privacy Policy and Disclaimers which can be found at hays.co.uk

Nov 04, 2021

Full time

Senior Application Support / SRE

Deerfoot IT Resources Ltd

Senior Application Support / SRE Hybrid Working: Mix of Home Working / London EMEA HQ Permanent, Full Time As a trusted and preferred recruitment partner to this leading global provider of cloud-based solutions to the global financial sector, we have been asked to assist in the hire of a Senior Application Support Engineer to take responsibility for the availability and reliability of services used by over 23,000 customers across 90 countries (including 22 of the world's top 25 banks). In this role you will ensure all services exceed availability targets, have in-depth monitoring and are proactively managed. Already benefitting from a dominance in the North American finance industry, our client is expanding its London operations to better serve the UK and EU markets. This is an exciting time to join, and you will have the opportunity to work a mix of remotely and within their state-of-the-art EMEA HQ in London. Your Job *Service Reliability: Proactively identifying risks to service and remediating them. Reducing risk from deployments by improved use of resilience and ensuring appropriate testing of releases pre and post deployment. Providing support and troubleshooting when service incidents occur. Improving time to recover from service impacting incidents. Identifying trends and root causes to reduce volume of incidents. *Automation: Identifying and delivering on opportunities to use automation to increase efficiency, reduce toil and drive service availability. Using automation and orchestration techniques to provide repeatable solutions and reduce risk of mis-operations. *Observability: Monitoring and ensuring smooth operation of all production services. Identifying gaps in coverage and improving observability of Production services. Ensuring appropriate events are generated for service failure or degradation scenarios. Responding to events and alerts in a timely manner, managing through to resolution.
*Knowledge management: Continuously improving the knowledge of the Application Support team to become subject matter experts on the Product and the technology that runs it. Collaborating with other teams to understand how underpinning services support the Products. Identifying opportunities to share knowledge and decrease the time it takes to resolve customer related incidents. Tech Stacks: Platform and Database Tech: Linux, Cassandra, Kafka, ArangoDB; Containerisation/Virtualisation: Kubernetes/OpenShift, VMware; Instrumentation and Monitoring: Splunk, Zabbix, Prometheus, Grafana; Scripting: PowerShell, Python. Your Skills *Experience as a Site Reliability Engineer, Application Support Engineer or similar running highly available critical services (ideally SaaS) *Scripting abilities in PowerShell / Python *Understanding of networking, firewalls, protocols, databases and more *Java Debugging - ability to complete thread dumps and analysis *Experience with monitoring solutions *Splunk Experience - creating dashboards, events and analysis *CI/CD Delivery Practices *Troubleshooting connectivity issues: TCP/IP, DNS, Telnet, Trace Route, TCP dump and analysis *Awareness of Load Balancing Technologies such as HA Proxy, Nginx, F5 *Experience of collaboration technologies - email, archiving, instant messaging *Exposure to support Voice / SMS Tech nice to have Alongside a competitive salary, you will receive a benefits package which includes 25 Days Holiday (increases with service), Private Medical Cover, Bupa Dental Cover, Life Insurance, Income Protection, Secondment Opportunities to Global HQ in Vancouver, Pension Scheme (increases with service up to 7% employer contribution), Bonus Scheme (up to 8% dependent on revenues and team performance). 
This role would be suitable for those who have held the following job roles: Site Reliability Engineer, Senior SRE, Site Availability Engineer, Application Support Engineer, Senior Site Reliability Engineer, Senior Application Support Engineer, Lead SRE, Lead Site Reliability Engineer, Lead Application Support. Deerfoot IT Resources Ltd is one of the UK's leading IT Recruitment Agencies, trusted by many of the UK's leading employers. Established in 1997, we have over twenty years of experience as IT Recruitment Specialist. We will never send your CV anywhere without your authorisation and only after you have seen the complete details on this opportunity. Deerfoot is acting as an employment agency in relation to this vacancy. Each time Deerfoot sends a CV to a recruiting client we donate £1 to The Born Free Foundation ().

Nov 04, 2021

Full time

Site Reliability Engineer - SRE Golang Node

Open Source Team

Site Reliability Engineer London / Remote to £85k Site Reliability Engineer / SRE (Golang Node Linux AWS) *Remote UK* Would you like to make an impact in an organisation that is currently building the next generation of software infrastructure to shape the future of the internet? If you'd like to work on code that pushes the boundaries of software development, in a company defined by innovation, thi...... click apply for full job details

Mar 31, 2021

Full time

Site Reliability Engineer - Go Node

Client Server

Site Reliability Engineer London / Remote to £85k Site Reliability Engineer / SRE (Golang Node Linux AWS) *Remote UK* Are you a naturally analytical technologist who enjoys resolving complex distributed system issues? Would you like an opportunity to work on code that pushes the boundaries of software development, in a company defined by innovation? You could join this trailblazing, hyper-growth st...... click apply for full job details

Mar 23, 2021

Full time

Site Reliability Engineer - Go Node

Open Source Team

Site Reliability Engineer London / Remote to £85k Site Reliability Engineer / SRE (Golang Node Linux AWS) *Remote UK* Are you a naturally analytical technologist who enjoys resolving complex distributed system issues? Would you like an opportunity to work on code that pushes the boundaries of software development, in a company defined by innovation? You could join this trailblazing, hyper-growth sta...... click apply for full job details

Mar 19, 2021

Full time

Site Reliability Engineer - SRE Golang Node

Open Source Team

Site Reliability Engineer London / Remote to £85k Site Reliability Engineer / SRE (Golang Node Linux AWS) *Remote UK* Would you like to make an impact in an organisation that is currently building the next generation of software infrastructure to shape the future of the internet? If you'd like to work on code that pushes the boundaries of software development, in a company defined by innovation, thi...... click apply for full job details

Mar 17, 2021

Full time

DevOps Engineer, SRE / Site/Systems Reliability Engineer, C++

TEKsystems London, United Kingdom

[DevOps Engineer, SRE, Site Reliability Engineer, Systems Reliability Engineer, C++, Automation, Chef, Puppet, Splunk, Jenkins, SonarQube, Coverity, Market Data, Software Engineer, Production Services, Unix, Linux, Scripting] Role One of the most famous and recognisable names in global finance..... click apply for full job details

Feb 21, 2016

17 jobs found
