Site Reliability Engineer
We are looking for talented and passionate engineers to help us build our new Site Reliability Engineering team. This team will be tasked with scaling, optimizing, and securing our infrastructure so that we can continue to grow our service while providing our customers with the performance they expect. The successful candidate is a DevOps-focused engineer who believes in automation and understands how to build reliable systems at scale while balancing costs, security, and developer needs. You will have a direct impact on our clients' ability to deliver a great user experience to our customers quickly and efficiently.
Multiple client locations
Responsibilities
- Ownership, architecture and management of AWS infrastructure components such as VPCs, EC2, S3, tagging schemes and CloudFormation
- Deployment and management automation of cloud-based infrastructure and software
- Working with configuration management tools on both Windows and Linux: CloudFormation, Terraform, Salt, Ansible, Chef
- Ensuring cloud-based architectures meet availability and recoverability requirements
- Architecture and implementation of cloud-based monitoring, alerting and reporting: Sensu, CloudWatch, StatusCake, ELK, Grafana
Requirements
- Bachelor's degree in Computer Science or equivalent experience
- Minimum 2 years of experience managing AWS infrastructure
- Minimum of 5 years of experience with technical operations and software development
- Solid understanding/experience of containerization services such as Docker
- Working knowledge of open source tools such as Sensu, InfluxDB, Grafana, Logstash, Elasticsearch
- Solid understanding/experience of web services, databases and related infrastructure/architectures
- Solid understanding of backup/restore best practices
- Ability to manage infrastructure using a preferred scripting language
- Solid understanding of SAN principles
- Excellent troubleshooting skills
- Experience supporting an enterprise-level SaaS environment
- Security experience a plus