R&D Manager

Date: Apr 12, 2024

Location: Hyderabad, India

Job Category: R&D

Department: Product & Technology

Manager – SRE – L1


About CyberArk:

CyberArk, the global leader in privileged access management, helps organizations transform their business through improved security and reduced risk. As a trusted partner for thousands of companies around the world, CyberArk consistently sets the bar – driving innovation and helping our customers stay one step ahead of attackers.

Job Overview:

We are the Infrastructure Team of CyberArk. Our group provides back-end services to all CyberArk applications. Our group's mission is to protect sensitive assets using an access control layer (authentication & authorization), cryptography, and secure communication while practicing the highest security standards.

Our goal is to build and maintain a scalable, fault-tolerant, high-load, distributed system. We are searching for an outstanding SRE manager to lead and manage a team of Site Reliability Engineers, with a focus on ensuring the reliability, performance, and scalability of CyberArk’s SaaS services and AWS infrastructure. This role combines technical expertise, leadership, and collaboration to meet the organization's reliability and availability goals.

Responsibilities:

Team Leadership: Lead, mentor, and develop a team of Site Reliability Engineers, providing guidance and support in their daily work. Set clear performance goals and expectations for the team, and conduct regular performance reviews.
Incident Management: Develop and maintain an incident response plan to minimize downtime and service disruptions. Lead the team in responding to and resolving critical incidents, and run post-incident reviews to prevent recurrence. Participate in an on-call rotation to provide 24/7 support for critical incidents and escalations.
Monitoring and Alerting: Implement and maintain robust monitoring and alerting systems to proactively identify and address performance and reliability issues. Continuously refine alerting thresholds to minimize false alarms and maintain service reliability.

Cost Optimization: Conduct regular cost optimization audits. Work closely with infrastructure and development teams to forecast and plan for future capacity needs so that the infrastructure can handle traffic growth.
SOPs: Maintain comprehensive documentation for SRE processes and procedures. Provide training and knowledge-sharing sessions to improve the overall technical expertise of the team.
Strategy and Planning: Define and execute a strategic vision for site reliability engineering, including setting objectives, goals, and key performance indicators. Collaborate with cross-functional teams to align SRE efforts with business objectives and priorities.

Qualifications:

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
10+ years in Site Reliability Engineering, with strong knowledge of AWS services. AWS certification is a plus.
5+ years of experience leading a team of SREs and collaborating with various stakeholders.
Experience defining and monitoring system quality measures, including SLOs and SLAs.
Experience building tooling to improve system reliability, automate remediation of issues, or improve scalability.
Hands-on experience collecting and analyzing performance data, troubleshooting, and tuning. Exposure to APM tools is a plus.
Proficiency in system administration, cloud infrastructure, and release deployment.
Excellent problem-solving skills and the ability to work well under pressure.
