Site Reliability Engineer (Docker/Kubernetes)
209 S. LaSalle 8th Floor Chicago, IL 60601
The Senior Docker and Kubernetes Engineer will join the Site Reliability Engineering team and work at the highest systems level, providing expert advice and counsel to users, management, and IT project teams on the most complex systems (typically those crossing function or location lines). This individual is expected to already possess a deep understanding of Docker container technology and will be responsible for defining core Docker/container standards and publishing best-practice guides. This individual will also continue to evaluate new technologies and update or develop departmental standard practices as necessary. The role provides leadership to staff members and teams in a specific area of expertise and in problem analysis techniques, identifies or reacts to problems related to the technology infrastructure or in support of the software development process, and demonstrates a broad level of knowledge across multiple areas of information technology.
- Creates a vision that connects engineering strategies to the big picture and sets action plans accordingly
- Partners with application development and infrastructure teams to align leading-edge and leveraging strategies.
- Work with the Security team to ensure system configurations follow security policies and control standards.
- Create technology efficiency and new capabilities through the use of automation
- Ability to deal with ambiguity and drive clarity and actions
- Ability to summarize complex technical and business issues to the appropriate audience
- Ability to work with a variety of development environments and platforms
- Provide end-to-end ownership of technology products, including the complete lifecycle
- Propose and implement medium- to large-scale system deployments (hardware, OS, and cloud platforms)
- Support new and existing deployments through the analysis of metrics, logs, system alarms, etc.
- Create new tools and automate existing ones to improve the efficiency of product maintenance
- Analyzes, acquires, installs, modifies and supports operating systems.
- Conducts highly complex systems automation/scripting and systems support activities, including system integration and monitoring.
- Create design documentation for new products and procedures.
- Maintain existing documentation of systems processes and procedures.
- Facilitate outage restoration calls to assess application impact, notify the business, and resolve system issues.
- Document issue summaries, provide root cause analysis, and implement remediation actions.
- Work closely with L3 and L2 support groups to develop, plan, and implement system enhancements and upgrades.
- Be available for L3 escalation requests during normal business hours and as part of an off-hours rotation.
- Minimum 5 years of experience working with enterprise Linux environments scaling beyond 100 systems.
- Minimum 3 years of experience working with enterprise container platforms such as Docker and Kubernetes.
- Extensive knowledge of enterprise server operating systems such as RHEL 7, CentOS 7, Red Hat, or SUSE.
- Experience with container orchestration platforms such as Kubernetes or OpenShift.
- Experience with performance tuning and troubleshooting server OS issues (CPU, Memory and I/O).
- Experience and understanding of OS lifecycle management and secure system configurations.
- Experience in automation of code deployment through the use of containers.
- Experience with build tools, CI/CD, DevOps, and Agile principles.
- Experience with basic IaaS and PaaS functionality.
- Experience developing solutions with configuration management tools such as Puppet, Ansible or Chef.
- Engineering experience in building production infrastructure using code and repeatable designs.
- Ability to automate common and repeatable tasks/processes.
- Experience interacting with APIs to automate processes.
- Strong automation and scripting skills in Python and Linux shell are required.
- Balance of strategic and tactical skills and the ability to work on cross-functional teams.
- Strong understanding of project management methodologies and processes (Lean, Agile).
- Excellent time management and organizational skills.
- Ability to create “Run Books” defining day-to-day support, maintenance, and troubleshooting of the infrastructure.