Site Reliability Engineer

March 26, 2026
🪙 2515000 - 🪙 3655000 / year
Application ends: November 29, 2028

Job Description

A Site Reliability Engineer applies software engineering principles to infrastructure and operations, ensuring system reliability, scalability, and performance. SREs bridge development and operations, automating workflows, managing incidents, and maintaining uptime across production environments at scale.

Key Responsibilities

  • Monitor and maintain reliability of critical production systems.
  • Automate infrastructure tasks to eliminate operational toil.
  • Lead incident response and conduct post-incident reviews.
  • Define and track SLIs, SLOs, and error budgets.
  • Build and maintain CI/CD pipelines and deployment strategies.
  • Implement observability using metrics, logs, and traces.
  • Collaborate with developers to embed reliability in design.
  • Conduct chaos engineering experiments to identify system weaknesses.

Skills & Experience

  • Proficiency in Python, Go, or Bash.
  • Expertise in AWS, GCP, or Azure cloud platforms.
  • Strong knowledge of Kubernetes, Docker, and containerization.
  • Experience monitoring with Prometheus, Grafana, and Datadog.
  • Infrastructure as Code skills using Terraform or CloudFormation.
  • Strong problem-solving, communication, and cross-team collaboration skills.

Note: Salary depends on experience and skills and is paid in local currency.
