SRE (Fully Remote)

Job Description

Site Reliability Engineer

We're working with a global technology consultancy that designs, builds, and supports modern software platforms for enterprise customers worldwide. They partner closely with clients to deliver reliable, scalable, cloud-native solutions.

The Role

As an SRE, you'll play a key role in ensuring the availability, performance, and scalability of production systems, supporting customers across the EMEA region. You'll also help to build, mature, and enhance the SRE function. This is a hands-on, technical role focused on reliability, automation, and operational excellence across a distributed, cloud-based platform.

Key Responsibilities

  • Platform Reliability: Deploy, operate, and improve Kubernetes clusters across multiple cloud environments.
  • Service Performance: Design and implement processes to enhance system reliability, availability, and scalability.
  • CI/CD Enablement: Build and optimise CI/CD pipelines to support safe, repeatable deployments.
  • Observability & Incidents: Own monitoring, alerting, and incident response to minimise downtime and speed recovery.
  • Root Cause Analysis: Lead post-incident reviews and implement long-term preventative improvements.
  • Automation: Reduce operational toil through automation and performance optimisation.
  • On-Call: Participate in weekday coverage and a once-monthly weekend rota.

Collaboration & Stakeholder Engagement

  • Work closely with engineering, infrastructure, and product teams to embed SRE best practices.
  • Advocate for reliability, resilience, and operational excellence across teams.
  • Collaborate with a globally distributed engineering function.
  • Engage directly with customers to resolve incidents and improve user experience.

Skills & Experience

  • 5+ years of proven experience as an SRE or in a similar role, supporting complex distributed systems.
  • Strong Kubernetes experience (AKS, EKS, GKE, or similar).
  • Hands-on with observability tools such as Prometheus, Grafana, Kibana, Vector, or Superset.
  • Experience with at least one major cloud platform: AWS, Azure, GCP, or Linode.
  • SQL database experience (PostgreSQL beneficial but not essential).
  • Proficiency in Python, Go, or Rust.
  • Strong Linux expertise, including performance tuning and troubleshooting.
  • Excellent communication skills, able to work effectively with engineers, customers, and cross-functional teams.

Please apply now if you meet the above criteria, or contact Andrew Harrison directly.

Job Details:
Location: Belfast, County Antrim
Salary: £60,000 - £70,000 per annum
Job reference: AH 90
ANDREW HARRISON

Principal Technology Consultant at Ocho

Andrew brings a wealth of experience in IT Infrastructure, alongside a comprehensive understanding of Cyber, Cloud, Support, and Networking solutions. Beyond leading the IT Infrastructure desk, he is set to make impactful contributions across various tech disciplines at Ocho.
