Site Reliability Engineer Job at iCIMS, Holmdel, NJ

  • iCIMS
  • Holmdel, NJ

Job Description

Job Summary

We are seeking a skilled Site Reliability Engineer (SRE) to contribute to the reliability, scalability, and performance of our multi-cloud SaaS platform, which serves thousands of customers worldwide. The role involves hands-on technical work in incident response, system monitoring, automation, and continuous improvement of platform reliability. The successful candidate will work within a global SRE team to ensure optimal system performance and customer satisfaction.

Responsibilities

  • System Monitoring & Reliability:
    • Monitor multi-cloud infrastructure (AWS, Azure, GCP) using New Relic, Grafana, and Sumo Logic
    • Maintain reliability of AWS resources, Auth0/Okta authentication, databases, and legacy applications
    • Implement monitoring, alerting, and dashboards for assigned systems
  • Incident Management & Response:
    • Respond to alerts and incidents within SLA timeframes
    • Perform root cause analysis and document findings
    • Create and maintain runbooks and troubleshooting procedures
    • Participate in 24/7 on-call rotation
  • Automation & Improvement:
    • Develop scripts to reduce manual operational overhead
    • Build monitoring and alerting solutions
    • Support infrastructure-as-code initiatives
    • Implement automated remediation where possible
  • Success Metrics:
    • Customer Impact: Reduced MTTR and improved customer satisfaction scores
    • Reliability: Achievement of 99.9%+ uptime SLAs across all products and regions
    • Proactive Prevention: Reduction in incident frequency through automated detection and prevention
    • Cross-functional Collaboration: Improved partnership metrics with Product, Engineering, and Customer Success teams
    • Automation Delivery: Complete assigned automation projects to reduce manual tasks
    • Knowledge Sharing: Contribute to team knowledge base and mentor junior engineers

Qualifications

  • 4+ years of experience in SRE, DevOps, or Infrastructure Engineering
  • Hands-on experience with AWS (required) and Azure (preferred)
  • Strong Linux system administration skills
  • Experience with monitoring tools (New Relic, Grafana, Prometheus)
  • Scripting skills in Python, Bash, or similar
  • Knowledge of databases (SQL Server, PostgreSQL, MongoDB)

Job Tags

Worldwide
