Title: Site Reliability Engineer
Location: Falls Church, VA
Salary: $110,000 - $130,000 / Year
Job Type: Full-Time | Exempt
No sponsorship available
BENEFITS
• Health, Dental, Vision Insurance
• 401(k) with immediate vesting
• Tuition Assistance
• Public Service Loan Forgiveness (PSLF) eligibility
• Generous Paid Time Off
• Dog-friendly office
• Onsite gym
• Health Savings Account (HSA) / Flexible Spending Account (FSA)
• Employee Assistance Program (EAP)
• Life and Disability Insurance
• Pet Insurance
• Trade Publication / Subscription Reimbursement
• Paid Holidays, Vacation, and Sick Leave
• Parental Leave
JOB DESCRIPTION
We are seeking a Site Reliability Engineer (SRE) to help establish and shape a reliability engineering practice from the ground up. This is a unique opportunity to join a mission-driven environment and play a key role in ensuring the reliability, scalability, and performance of AWS-hosted business applications.
As part of a cross-functional engineering team, you will work to improve observability, automate operational processes, and lead incident response and continuous improvement efforts. This role is ideal for a mid-level engineer with cloud and software engineering experience who is eager to deepen their expertise in site reliability engineering, learn from senior staff, and help build a culture of reliability.
ESSENTIAL DUTIES AND RESPONSIBILITIES
• Define and implement service-level indicators (SLIs) and service-level objectives (SLOs) for cloud-based applications.
• Build, configure, and maintain monitoring, alerting, and dashboarding solutions using AWS CloudWatch, X-Ray, and third-party tools such as DataDome.
• Leverage advanced AWS observability tools (e.g., CloudWatch Synthetics, Contributor Insights) to proactively monitor system health.
• Contribute to the development and implementation of a structured on-call support process.
• Implement, monitor, and maintain site protection and bot mitigation solutions to defend against automated attacks and ensure application availability.
• Investigate incidents, security events, and operational anomalies, perform root cause analysis, and lead postmortem processes.
• Identify operational inefficiencies (“toil”) and automate workflows using AWS Lambda and CloudFormation.
• Assist in maintaining and enhancing CI/CD pipelines and deployment processes.
• Collaborate with development, QA, cloud, and DevOps teams to ensure reliability, scalability, and security are embedded into system designs.
• Document systems, processes, incident findings, compliance activities, and reliability best practices.
• Stay current with AWS, SRE, and observability trends and recommend improvements.
• Evaluate and support the rollout of new AWS services and features.
• Perform other related duties as assigned.
KNOWLEDGE & SKILLS
• Strong analytical, troubleshooting, and problem-solving abilities.
• Hands-on experience with AWS CloudWatch (metrics, logs, dashboards, alarms).
• Familiarity with AWS X-Ray for distributed tracing.
• Experience with CloudWatch Synthetics and Contributor Insights for proactive testing and analysis.
• Knowledge of AWS CloudTrail for auditing and investigations.
• Experience using Amazon Athena for log analysis.
• Proficiency with AWS CloudFormation.
• Experience automating workflows with AWS Lambda or similar tools.
• Understanding of AWS services such as API Gateway, CloudFront, and Elastic Load Balancing (ELB).
• Experience with site protection or bot mitigation tools (e.g., DataDome, Cloudflare).
• Scripting or programming experience in Python, Bash, or Node.js.
• Excellent communication and documentation skills.
• Growth-oriented and eager to adopt emerging tools and practices.
REQUIREMENTS
• Bachelor’s degree in computer science, engineering, or a related field (or equivalent experience).
• 3+ years of experience in cloud engineering, DevOps, infrastructure, or observability (AWS required).
• Experience applying SRE principles (prior SRE experience preferred).
• Background in monitoring, incident response, or reliability in production environments.
• Experience working in Agile, cross-functional teams.
• Passion for building and improving reliability practices.