Senior Site Reliability Engineer

Production Oper/Support IT · Mountain View, California
Department: Production Oper/Support IT
Employment Type: Full-time
Minimum Experience: Experienced

It’s truly an exciting time to be a part of GetInsured. Our vision has always been to make finding and enrolling in health insurance simple. GetInsured currently has the largest state-based marketplace footprint, and our consumer-friendly interface and decision support tools empower millions of consumers across the country to make better health plan decisions. GetInsured builds and operates award-winning cloud-based enrollment tools that serve state-based exchanges, brokers, insurers, and consumers. In addition to eligibility determination, plan selection, and enrollment technology for state agencies, the company delivers innovative agent marketing and call center tools and services. 


Our operations stack includes Cloudflare, HAProxy, Tomcat, Node.js, Postgres, Couchbase, Solr, and Redis running on CentOS on VMware. We have multiple data centers in Rackspace, Azure, and AWS, as well as on-prem VMware. Our tools include Puppet, Jenkins, Splunk, Icinga, PagerDuty, and Atlassian services such as Jira, Confluence, and Bitbucket.


We have focused strongly on DevOps cultural and organizational changes for several years and have seen great success. We still have much to do, so we are looking for a Site Reliability Engineer (also known as a DevOps Engineer) to join our team, continue building out our infrastructure, bring in new tools, and improve our processes.


Responsibilities


  • Identify new technologies, tools, and processes. Actively pursue learning and prototyping.
  • Identify, diagnose, and quickly resolve complex technical issues in the live production environment, and use those incidents to improve current technology and processes so that similar issues are prevented.
  • Help ensure that production systems are always up and running.
  • Work closely with the Engineering team to escalate issues for triage and resolution.
  • Routinely review tickets and diagnostics to identify trends and chronic issues, then put processes and tools in place to prevent problems.
  • Implement and upgrade monitoring tools hands-on.
  • Audit the proactive monitoring of all systems so that problems are detected and resolved and all infrastructure systems operate without interruption.


Requirements


  • Strong background in Linux/Unix administration.
  • Strong technical systems and application operations/release management experience with a passion for troubleshooting and triage of incidents, bringing issues to rapid resolution.
  • Experience with automation/configuration management using Puppet, Chef, or an equivalent tool.
  • Knowledge of Jenkins and Java builds is a plus.
  • Knowledge of AWS is a plus.
  • Ready and willing to participate in production systems support.
  • Ability to use a wide variety of open source technologies and cloud services.
  • Good experience with SQL and with Postgres or a similar RDBMS.
  • Good experience with networking.
  • Good understanding of code and Bash scripting.
  • Knowledge of best practices and IT operations for an always-up, always-available service.
  • 5+ years of experience working in operations.
  • Experience working in a DevOps group environment.
  • US citizenship or permanent residency required.


GetInsured offers very competitive benefits, including:


  • Competitive compensation
  • 401(k) matching
  • Robust health benefits
  • Tight-knit team and an open working environment 
