Tech Council
Are you a tech leader, architect, or engineer navigating the intricacies of building within the enterprise? Tech Council delivers the strategies and insights you need to succeed. Hosted by Duncan Mapes and Jason Ehmke, experienced leaders from the startup and banking tech arenas, this podcast dives deep into technology strategy and enterprise dynamics. Learn how to drive innovation, understand the bigger picture, and build impactful solutions from the ground up. Subscribe to Tech Council and gain the knowledge to shape the future of your enterprise, no matter your role.
What Is SRE? Site Reliability Engineering Explained | Episode 19
Most companies are doing SRE wrong.
Hiring SREs doesn’t make you reliable. Metrics dashboards don’t guarantee accountability. And cultural change doesn’t happen because you put it in a slide deck.
In this episode, Duncan Mapes and Jason Ehmke push back against common misconceptions. They argue that SRE isn’t a bolt-on team but a systemic shift in how engineering works. Without shared accountability, meaningful metrics, and cultural buy-in, SRE will fail.
And no, copying Google’s model isn’t the answer.
If you think SRE is just a headcount play, this episode will challenge everything you believe. Got a different perspective? Drop us a review, share your comments, and send your toughest SRE questions our way.
Top Takeaways:
- SRE is a complex practice that varies across organizations.
- Defining SRE upfront can prevent chaos later.
- SRE is not just about taking over responsibilities; it's about collaboration.
- The role of SREs is to guide and support application teams.
- Key metrics for SRE success include mean time to detect (MTTD) and mean time to restore (MTTR).
- Cultural transformation is essential for successful SRE implementation.
- Finding early wins can help demonstrate the value of SRE.
- Effective communication is crucial for SREs to succeed.
- SRE teams should focus on toil reduction and automation.
- Building a strong relationship between SREs and app teams is vital.
Mentioned in this Episode:
Site Reliability Engineering: How Google Runs Production Systems - https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/