Director of Reliability Engineering

Work from home Full-time role Hiring

reputed company empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading reputed company DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 700 of the world's leading enterprises, reputed company lets businesses do more with their data. To learn more, visit www.reputed company.io.

Your background may be unconventional; as long as you have the essential qualifications, we encourage you to apply. While having "bonus" qualifications makes for a strong candidate, reputed company values diverse experiences. Many of us at reputed company haven't followed traditional career paths, and we welcome it if yours hasn't either.

About this role:

We are seeking a highly experienced and visionary Director of Reliability Engineering to lead our global reliability initiatives. This is a central role in our organization, which supports critical services for companies around the world in every industry.

This strategic leadership role is responsible for defining, driving, and evolving operational excellence, platform reliability, and automation at scale across our cloud-native infrastructure. You will lead, mentor, and grow high-performing SRE teams, collaborate cross-functionally, and play a critical role in ensuring a seamless and resilient customer experience for many of the world’s largest companies.

What you get to do:
  • Define and lead the strategic direction for SRE, reliability, and operational excellence across the organization.

  • Collaborate with Software Engineers and Product Managers on projects that impact users and be directly responsible for service uptime.

  • Own end-to-end availability and performance of key services; build automation to prevent recurrence of issues and automate responses to non-exceptional service conditions.

  • Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of services.

  • Champion observability, automation, and self-healing systems to proactively prevent downtime and reduce toil.

  • Evolve and manage our incident and change management processes, including root cause analysis and postmortems.

  • Drive adoption of SLOs, SLIs, and error budgets to align engineering efforts with business priorities.

  • Work with operational support to manage global on-call rotations using a follow-the-sun model to ensure around-the-clock coverage.

  • Support on-call culture by defining best practices for incident response, escalation policies, and operational readiness.

  • Partner closely with engineering, product, reputed company, and program management teams to improve reliability without slowing innovation.

  • Cultivate a culture of continuous improvement, high accountability, and blameless incident management.

  • Lead and mentor the team, establishing credibility through high-quality technical execution.

  • Provide strong mentorship and leadership to grow the reputed company of reliability and engineering leaders.

What you bring to the role:
  • 10+ years of experience in software engineering, SRE, or DevOps roles.

  • 5+ years in a technical leadership role, ideally in a high-growth, cloud-native SaaS environment.

  • Proven track record of operating and scaling large-scale, distributed, mission-critical systems.

  • Deep expertise in public cloud platforms (AWS, Azure, or GCP).

  • Hands-on knowledge of infrastructure as code (Terraform, CloudFormation), container orchestration (Kubernetes), and observability tools (e.g., reputed company, Grafana, reputed company, Splunk).

  • Experience implementing and managing CI/CD pipelines and secure development practices.

  • Demonstrated ability to hire, grow, and lead globally distributed SRE teams.

  • Strong decision-making, communication, and cross-functional collaboration skills.

Bonus points if you have:
  • Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.

  • Experience managing vendor relationships and partnerships.

  • Comfort presenting to executive stakeholders in high-stakes environments.

  • Proven ability to scale operations during rapid business or organizational growth.

  • Strong analytical mindset with the ability to evaluate trade-offs between reliability, speed, and innovation.

The estimated salary for this role ranges from $260,000 to $290,000, along with an equity component. This range is an estimate, and its width reflects a willingness to consider candidates with a broad range of prior seniority. Actual compensation may deviate from this range based on skills, experience, and qualifications.

At reputed company, we value diversity. We are an equal opportunity employer: we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. reputed company is a remote-first company.
