Senior Site Reliability Engineer Job at Broad Reach Partners, Alpharetta, GA

  • Broad Reach Partners
  • Alpharetta, GA

Job Description

Location: Hybrid (Alpharetta, GA – 3 days/week in office)
Type: Full-Time

We are seeking a Site Reliability Engineer to join our team and play a key role in enhancing the stability, performance, and reliability of our production systems. You’ll work closely with development, DevOps, and security teams to improve observability, optimize system performance, and ensure production readiness. From monitoring to automation, you’ll make a direct impact on our cloud infrastructure and service reliability.

In this role, you will work hand-in-hand with our development, operations, and security teams worldwide to implement best practices, automate deployments, and ensure our platforms are reliable, secure, and scalable. Troubleshooting in Kubernetes requires a deep understanding of pods, nodes, networking, scaling, logs, and service-to-service communication.

This role requires a deep understanding of SRE best practices and a strong ability to troubleshoot complex issues.

Your responsibilities in this role will include:

  • Maintain and enhance monitoring tools (New Relic, Graylog) for service health and performance metrics.

  • Implement and maintain high-availability systems with capacity planning, performance optimization, and fault tolerance.

  • Define and monitor Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) with teams.

  • Deploy and manage Kubernetes workloads to AWS EKS(A) using Helm and ArgoCD.

  • Automate operational processes to reduce manual interventions.

  • Manage Kubernetes workloads on AWS EKS for secure and stable deployments.

  • Participate in on-call rotation, troubleshoot production issues, and implement permanent fixes.

  • Work with DevOps to improve CI/CD pipelines and with development teams to embed resilience and observability.

  • Document operational runbooks, escalation procedures, and production playbooks.

We are looking for you to have the following skills and experience:

  • 8+ years of experience as a Site Reliability Engineer, or equivalent.
  • Experience with tools like New Relic for monitoring and Graylog for logging.
  • 3+ years of experience with Amazon Web Services (AWS) or Microsoft Azure.
  • 3+ years of experience with Kubernetes clusters, including performance monitoring in Kubernetes.
  • Proficiency with public cloud environments (AWS preferred).
  • Proficiency in a scripting language such as Bash, Groovy, or Python.
  • Excellent debugging and troubleshooting skills.
  • Ability to prioritize tasks efficiently and independently under minimal supervision.

Nice to Have

  • AWS Cloud certification.
  • Familiarity with .NET applications.
  • Knowledge of Terraform, Ansible, and monitoring tools.

This is a full-time role, and we are unable to sponsor visas, so you must be a U.S. citizen or a Green Card holder. We work onsite three days each week in our Alpharetta office, so you must live in the Atlanta area, within commuting distance of the office. If you thrive on solving complex technical challenges, have a passion for automation, and want to influence how enterprise platforms evolve and modernize, this is an ideal opportunity for you.

Ready to take the next step in your SRE career? Apply now and help us build the future of reliable systems!

