Infrastructure and DevOps Engineer

  • Site Reliability Engineer (SRE), DevOps
  • Permanent
  • London, UK

Overview

In this role you'll join our team of engineers responsible for the operation, administration, and security of the various TXODDS technology platforms.

You will also build new capabilities and continually deploy software releases and application upgrades across our existing estate.

The responsibilities will include:

Ensuring the 24/7 availability of systems, infrastructure and our real-time, low-latency data services.

Designing, building and allocating compute resources as required by application, customer and business demand.

Managing our internal development toolsets and CI/CD workflows.

Enhancing and implementing redundancy and recovery procedures for our products and services.

Optimising the automated packaging and deployment of software, along with our release processes.

Implementing and monitoring backups to ensure data resilience and integrity.

Troubleshooting, fixing problems and owning issues through to resolution.

Required Skills:

Expert knowledge of managing Linux (CentOS) servers at scale – both virtual and bare metal.

Good networking skills – experience of managing switches, routers and firewalls.

Experience of systems hardening measures and the implementation of security standards such as ISO 27001.

Experience of configuration management systems such as Puppet and Ansible.

Expert knowledge of virtualisation and container concepts.

Skills in scripting with languages such as Python / Ruby / Perl.

Experience working with Windows-based servers.

Experience of Java/Scala-based applications and systems.

Experience working with Kubernetes (in production!).

Confident working independently as part of a globally distributed remote team.

Note: as part of this role you will be expected to participate in an "on-call" rota, which may involve working on incidents out of hours and at weekends.