Ensure high availability, reliability, and performance of applications and infrastructure
Design and maintain monitoring, alerting, and incident response systems
Automate deployment, scaling, and operational tasks
Manage production incidents, lead root cause analysis (RCA), and run post-incident reviews
Define and track SLIs, SLOs, and SLAs
Improve system scalability, fault tolerance, and disaster recovery
Collaborate with developers to improve system design and operability
Manage cloud infrastructure and containerized environments
Implement security and compliance best practices in production systems
Competitive salary with annual increments
Performance-based bonuses
On-call / shift allowances
Stock options / ESOPs (in product companies and startups)
Comprehensive health insurance (self and family)
Life and accidental insurance
Provident Fund (PF) / retirement benefits
Paid sick leave and medical reimbursements
Flexible working hours
Remote or hybrid work options
Paid leave, holidays, and compensatory time off
Well-defined on-call rotations to avoid burnout
Exposure to large-scale, high-availability systems
Sponsored certifications (AWS, GCP, Kubernetes, SRE)
Training on cloud, DevOps, and automation tools
Participation in tech conferences and workshops