Site Reliability Engineer (Onsite)


Strategic Staffing Solutions


S3 is seeking a Site Reliability Engineer for one of our partners in the Finance industry. Candidates MUST BE local to the Riverwoods/Chicago, IL area.

Job Title: Site Reliability Engineer (Onsite)

Location: Riverwoods, IL 

Role Type: W2 Only, No C2C

Contract Length: 6 months, contract to hire

Pay Rate: $60.00-$80.00/hr.

How to Apply: Please send your resume and contact information to Keena Leo, Sourcing Specialist, and reference job #220503.


  • Develop and run SRE-owned tooling and observability using automation such as CI/CD and Kubernetes.
  • Build monitoring that alerts on symptoms rather than on outages. 
  • Document every action so your findings turn into repeatable actions and then into automation. 
  • Debug production issues across services and levels of the stack. 
  • Plan the growth and reliability of services. 
  • Join an on-call rotation to respond to “Code Red” incidents and help restore customer-impacting services.
  • Automate CI/CD, self-healing of services, and end-to-end or performance testing.
  • Improve monitoring (Datadog, AppD, etc.) and build new smart metrics.
  • Develop a relationship with a product group and help define its SLOs/SLIs.
  • Work directly with AppDev to improve the product through non-functional and production-readiness work.
  • Improve operability, latency, capacity planning, change management, and MTTR (Mean Time to Repair).
  • Lead and contribute to scope and designs for issues, epics, and OKRs (Objectives and Key Results).
  • Contribute to the handbook, create and update runbooks and general documentation, and write blog posts.
  • Complete Root Cause Analysis (RCA) investigations and perform readiness reviews.
  • Improve team practices through code reviews and handoffs of work and incidents.
  • Provide emergency response, either by being on call or by reacting to symptoms surfaced by monitoring, and escalate when needed.
  • Propose ideas and solutions to debug and optimize code and to automate tasks.
  • Plan, design, and execute solutions to reach specific goals agreed within the team.
  • Plan and execute configuration change operations both at the application and the infrastructure levels. 
  • Actively look for opportunities to improve the availability and performance of the system by applying lessons learned from monitoring and observation.
  • Experience designing, analyzing, and debugging distributed systems 

You may be a fit for this role if you have some of these inclinations: 

  • Have an urge for delivering quickly and effectively and iterating fast.  
  • Think about systems: edge cases, failure modes, behaviors, specific implementations. 
  • As an engineer, when you see something broken, you cannot help but fix it.  
  • Have an urge to document all the things so you do not need to learn the same thing twice.  
  • Strong knowledge of SDLC (System Development Life Cycle)  
  • Strong knowledge of git, Docker, Kubernetes, Jenkins, AWS (Amazon Web Services) or similar technologies  
  • Know how to use configuration management systems such as Chef or Ansible
  • Have strong programming skills in one or more of the following languages: C, Ruby, Python, Java 
  • Good understanding of hybrid infrastructure 


  • Configuration management: experience with Chef and Ansible to effectively manage infrastructure 
  • Infrastructure as code: experience with Terraform and GitLab CI/CD for automation, containerized environments (Kubernetes), and cloud technologies
  • Systems: manage, configure, and troubleshoot operating system issues, storage (block and object), and networking (VPC (Virtual Private Cloud), proxies, and CDN (Content Delivery Network)); administer high-availability PostgreSQL and Redis clusters
  • Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations
  • Engineering practices: availability, reliability, and scalability, as well as disaster recovery 
  • Use git to manage and contribute code
  • Experience coding in one or more of the following languages: C, Ruby, Python, Shell, Java 
  • Planning: familiar with agile methodologies; use epics and issues to drive projects 
  • Organization: workload organization, OKR (Objectives and Key Results) leadership
  • Management: a manager of one, able to self-organize and report asynchronously 
  • General knowledge of 4 technical expertise areas, with deep knowledge in 1 area:
    • AWS Cloud Practitioner, resources provisioning and configuration through CLI/API  
    • Chef (basic syntax, recipes, cookbooks) or Ansible (basic syntax, tasks, playbooks) 
    • Working knowledge of CI/CD, Jenkins, Nexus, pipelines, jobs 
    • Kubernetes basic understanding, CLI (Command Line Interface), service re-provisioning 
    • Provision and set up metrics in AppD, Grafana, or Datadog 
    • Provision and set up logs and queries for frequent questions 
    • Networking VPC, proxies and CDN (Content Delivery Network) 
  • Working knowledge of git
  • Mandatory: 5+ years of experience and a BE/B.Sc. degree

The S3 Difference

The global mission of S3 is to build trusting relationships and deliver solutions that positively impact our customers, our consultants, and our communities.  The four pillars of our company are to:

  • Set the bar high for what a company should do
  • Create jobs
  • Offer people an opportunity to succeed and change their station in life
  • Improve the communities where we live and work through volunteering and charitable giving

As an S3 employee, you’re eligible for a full benefits package that may include:

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • 401(k) Plan
  • Vacation Package
  • Life & Disability Insurance Plans
  • Flexible Spending Accounts
  • Tuition Reimbursement

Job ID: JOB-220503
Publish Date: 07 Mar 2023
