Sr Site Reliability Engineer
Fully Remote Technology and Operations
Job Type
Full-time
Description

 

Job Summary 

The Site Reliability Engineer, Senior (SRES) will work collaboratively with software and systems engineering to deploy and manage our systems within our Amazon Web Services (AWS) Cloud. The SRES will lead the automation and streamlining of operations and processes. In addition, the SRES will design, build, set up, and maintain tools for deployment, monitoring, and infrastructure provisioning on the AWS Cloud. This role is responsible for the vision and design of the whole stack, from load balancers to databases, and for moving and launching sites on every application release following a “Blue/Green” methodology.


Summary of Essential Job Functions 

  • Owns the application and all aspects of it in production, including the user experience. Administers all systems related to R&D projects, including user creation, system provisioning, troubleshooting, monitoring, etc.
  • Creates the vision and designs the automation strategy across the platform. This includes researching gaps in automation and laying out the vision and the plan to remove the gaps. 
  • Designs and orchestrates system provisioning, autoscaling, and Continuous Integration (CI).
  • Is responsible for release management, with an expert understanding of Continuous Integration and Continuous Delivery.
  • Troubleshoots site-down issues. Responds to emergency outages and coordinates responses with engineering teams in multiple locations.
  • Works with engineering teams to refine deployment and release processes. 
  • Works closely with developers to support new features, services, and releases, and becomes an expert in our services.
  • Monitors site reliability and performance. 
  • Scales infrastructure to meet demand. Continuously monitors/improves the quality of our infrastructure. 
  • Designs and develops automation tools to deploy code. 
  • Ensures that all system designs and procedures are documented and up to date.
  • Participates in on-call rotation as needed. 
  • Maintains professional and technical knowledge by attending educational workshops; reviewing professional publications; establishing personal networks; benchmarking state-of-the-art practices; participating in professional societies; leading staff development educational series. 
  • Recommends and implements strategies, policies, and procedures by evaluating organization outcomes; identifying problems; evaluating trends; and anticipating requirements. 
  • Performs other duties not otherwise listed as required by the company. 
Requirements
  • Bachelor’s Degree in Computer Science or a related field with 7+ years in cloud infrastructure (AWS preferred) 
  • Expert administration knowledge of Linux, UNIX, SSH, cron, and access control
  • Extensive experience with managed hosting and colocation 
  • Extensive experience supporting high traffic, high volume web applications and websites 
  • Willingness to work flexible or odd hours at times, based on business needs, including on-call rotation
  • Ability to use a wide variety of open source technologies and cloud services (AWS) 
  • Expert-level knowledge of and experience with AWS infrastructure: VPC, Security Groups, EC2, ELB, S3, RDS, etc.
  • Expert-level knowledge of and experience with AWS API integration
  • Experience with distributed tracing (OpenTracing, Zipkin, etc.)
  • Experience with Kubernetes and related technologies (CoreOS, etcd, Envoy)
  • Familiarity with notification platforms like PagerDuty or OpsGenie 
  • Analytical thinking and troubleshooting to resolve infrastructure and/or application issues 
  • Excellent verbal and written communication skills 
  • Must have a passion to learn new technologies 
  • Expert scripting ability in Bash, Ruby, and Python
  • Solid working knowledge and experience with Go 
  • Expert understanding of DNS, tcpdump, CDNs, SSL, Git, firewalls, and network concepts
  • Experience with automation/configuration management (Terraform and/or CloudFormation)
  • Knowledge of best practices and IT operations in an always-up, always-available service environment 
  • Understanding of monitoring tools and statistics, such as New Relic, Sumo Logic, Datadog, Stackdriver, or CloudWatch
  • Solid understanding of Docker and containers 
  • Ability to work in a team environment and to work reliably on one's own
  • Ability to manage multiple tasks in a dynamic, fast-paced environment 

The annual salary for this position typically starts between $85,000 and $120,000. Placement within the range is determined by a variety of factors, including but not limited to knowledge, skills, years and depth of experience, location, and equity with internal team members.


For remote positions, employees must reside in one of the following locations: AL, AR, AZ, CA, CO, CT, DC, FL, GA, IA, ID, IL, IN, KS, KY, LA, MA, MD, MI, MN, MO, MS, MT, NC, NJ, NH, NV, NY, OH, OK, OR, PA, SC, SD, TN, TX, UT, VA, WA, WI, WV. All other states are not in consideration for this role at this time.


Vanco is an Equal Opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex (including sexual orientation and gender identity), national origin, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.