Location: Indonesia
Job Type: Full-time
Posted: July 01, 2026
Job Description
We're looking for a Site Reliability Engineer (SRE) to help build and maintain a highly reliable, scalable, and secure production environment. You'll work closely with engineering teams to improve system availability, automate operations, and respond to production incidents.
What You'll Do
- Monitor and maintain production systems to ensure high availability and performance.
- Respond to incidents, troubleshoot production issues, and drive root cause analysis.
- Improve system reliability through automation, monitoring, and observability.
- Design and implement deployment, rollback, and disaster recovery strategies.
- Build and maintain monitoring, alerting, and health check solutions.
- Collaborate with software engineers, platform engineers, AI engineers, and security teams to improve platform reliability.
- Develop operational runbooks and continuously improve production processes.