Job Type
Full-time
Description
Job Summary
The Cloud Ops Manager is a working supervisor responsible for the reliable delivery of CaseWorthy’s software-as-a-service offerings, ensuring their world-class availability, security, and performance. The incumbent manages a team of Cloud Engineers and Administrators who design, build, operate, and maintain the cloud infrastructure and support the innovations of the larger engineering organization.
Responsibilities
- Manages a team of Cloud Engineers and Administrators.
- Ensures that there is 24x7 on-call and escalation coverage for support of cloud environments.
- Owns monitoring and observability solutions.
- Inculcates a DevOps culture across CaseWorthy R&D teams.
- Works in tandem with the engineering teams to identify and implement optimal cloud-based solutions for the company.
- Provides guidance, thought leadership, and mentorship both to the Cloud Ops team and to other development teams to build cloud competencies.
- Educates teams on the implementation of new cloud-based initiatives, providing associated training as required.
- Monitors and ensures the performance, uptime, and scale of systems and proactively addresses issues.
- Troubleshoots incidents, identifies root causes, fixes and documents problems, and implements preventive measures.
- Operates and manages cloud environments in accordance with company security guidelines.
- Consults in system design to meet security, cost, reliability, and capacity requirements.
- Manages cloud costs and provides cost projections as needed.
- Manages data backup operations and maintains data disaster recovery plans.
- Manages deployments to UAT and production environments.
- Automates infrastructure and configuration management using CI/CD and infrastructure as code.
- Travels nationwide as needed, up to 10% annually.
- Performs other duties as assigned.
Requirements
Required Skills & Qualifications
- Deep understanding of delivering software-as-a-service solutions
- 5+ years of experience in a systems administration, software engineering, or site reliability role
- 3+ years of experience involving management and design of infrastructure within AWS or Azure
- 2+ years of supervising employees in an engineering organization
- Understanding of and experience with the five pillars of a well-architected framework
- Knowledge of a variety of security domains, such as problem management, security vulnerability assessments, business continuity, security audits and standards, and identity management
- Experience in hosting web applications, container orchestration, serverless compute, and ETL jobs within AWS or Azure
- Working knowledge of infrastructure-as-code (ARM/Bicep, Terraform, CloudFormation, and/or CDK) and pipeline automation (GitHub Actions, AWS CodeBuild / CodeDeploy, Jenkins, Azure Pipelines, Bitbucket Pipelines, or similar)
- Knowledge of NIST 800-53/HITRUST/ISO regulatory frameworks
- Strong written and verbal communication skills for both technical and non-technical audiences
- Proven ability and eagerness to learn new technologies and stay on the cutting edge
Preferred Skills & Qualifications
- AWS and/or Azure certifications
Salary Description
$125,000-$150,000