Experience: Minimum 4 years
Roles and responsibilities
- Deploy and maintain applications.
- Design, build, manage and operate the infrastructure and configuration of SaaS applications with a focus on automation and infrastructure as code.
- Evaluate performance trends and expected changes in demand and capacity, and establish appropriate scalability plans.
- Ensure that SLAs are met when executing operational tasks.
- Be on-call to respond to infrastructure failures.
- Debug infrastructure-related production issues across services and multiple levels of the stack.
- Set up automation to prevent similar incidents from happening again.
- Configure "smart" monitoring to get early warning before failure points.
- Maintain a change log for every action and help build a knowledge base of failures and solutions.
- Report regularly on performance benchmarks for production systems, and plan to scale up or down when needed.
- Respond to infrastructure alerts.
- Configure monitoring/alerts where needed.
- Fine-tune monitoring thresholds and reduce false alerts.
- Maintain an audit log of changes.
- Support a customer-facing, internet-facing application that needs to be up 24x7.
- Monitor database clusters.
What the candidate should know:
- Strong knowledge of AWS technologies and a willingness to self-teach as they change.
- Systematic problem-solving approach, combined with a strong sense of ownership and drive.
- Experience in the design, creation, and provisioning of infrastructure.
- Experience working within an Agile/Scrum SDLC.
- Experience with continuous delivery and deployment automation (our environment: Ansible, GitLab, Git/GitHub, Artifactory, Terraform).
- Solid experience using configuration management frameworks (e.g. Chef, Puppet).
- Experience building and managing virtualized systems (containers/Docker).
- Ability to develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Grafana, ELK, Datadog, and New Relic.
- Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.).
- An understanding of capacity planning and how to set appropriate limits to optimize cost and performance.
- Ability to identify system scaling, backoff, or other throughput challenges to help prevent incidents or resolve them quickly.
- Ability to identify and troubleshoot availability and performance issues at multiple layers of a deployment, including hardware, operating environment, network, and application.
- Experience performing to metrics such as SLIs, SLOs, and SLAs.
- Experience with product behavior, edge cases, failure modes, negative boundary behaviors, load mishaps, etc., to stop issues before they enter production.
- Firm grasp of at least one modern programming language, beyond advanced scripting (Shell, Perl, Python).
- Experience writing automation tools and an eagerness to automate all the things.
- An understanding of capacity planning and how to set appropriate limits to optimize resources.
- Working knowledge of information security issues.
- AWS Certified Solutions Architect (Associate or Professional) certification preferred.