Tech Council

How to Build Resilient Systems in Complex Enterprises | Episode 26

Duncan Mapes, Jason Ehmke | Episode 26

When systems fail, it’s rarely because no one saw it coming. It’s because no one planned for it. 

In this episode, Duncan Mapes and Jason Ehmke share real-world lessons from years of building and scaling technology across enterprise environments where downtime costs dollars.

They explore the art of designing resilient systems that can withstand inevitable failures, recover quickly, and continue operating under pressure. From team culture to proactive design checklists, this conversation dives into how engineering leaders can turn system reliability into a competitive advantage.

Top Takeaways:

  • Designing for failure is crucial in system architecture.
  • Understanding failure points is essential for system resiliency.
  • Resiliency can mean different things depending on the context.
  • Asking the right questions during project kickoff is vital.
  • Complex enterprise environments have unique challenges.
  • Responsibility for failures should not be shifted to others.
  • Handling app stability is a core responsibility of developers.
  • Everything in a system can fail at some point.
  • Evaluating business impact is crucial for prioritizing resiliency efforts.
  • Creating a resiliency checklist can guide design and implementation.


Connect with us:

Duncan Mapes

Jason Ehmke

DevGrid.io

DevGrid on LinkedIn

DevGrid on X