Learning from Incidents • Andrew Hatch • YOW! 2019
This presentation was recorded at YOW! 2019. #GOTOcon #YOW
Andrew Hatch - Head of Platform Engineering at Seek
RESOURCES
@hatchman76
ABSTRACT
Aligning to a #DevOps #culture has seen many organisations gain a distinct competitive advantage in their marketplace - especially if they started changing their thinking early, which Seek did. Frequent daily deployments, #teams owning what they build, the ability to iterate and deliver products faster, and a greater emphasis on collaboration with much less of “that’s not my job” have brought many benefits. But there are flip sides to this rapid rate of change and, depending on your perspective, how you capitalise on them could be the next big advantage you can take.
When teams gain greater #autonomy to make technology choices, the amount of diversification in your enterprise grows rapidly - especially when you are on the bleeding edge of what the major #cloud providers are releasing. This increase in diversification places greater #CognitiveLoads on the people operating and building the system, to the point where forming a mental model of your systems becomes impossible. Incidents and failure will still be a part of normal system functions, still just as complex, but more asynchronous, which makes the reverberations of failure through the system more difficult to diagnose. How you embrace failure in this greater field of diversification, learn from it and use it, is what will set you apart.
This presentation will discuss how Seek has dealt with and collated extensive amounts of data on “Normal Accidents” over the last several years. We will demonstrate how incident analysis and the involvement of teams in post-mortem rituals have paved the way for many to start viewing our diverse software stack as the #SocioTechnicalSystem it is, and why appreciating the #HumanFactor elements of #incidents is important to building greater resiliency in the system. We will discuss how involving technology people in #incident investigation and facilitation leads to richer data that can be fed back into the delivery cycle to continuously improve the reliability and resiliency of your products. We will also discuss the traps and pitfalls to avoid, such as obsessing over the Root Cause, and why the “5 Whys” technique of incident analysis can be flawed. [...]
RECOMMENDED BOOKS
Forsgren, Humble & Kim • Accelerate: The Science of Lean Software and DevOps
John Arundel & Justin Domingus • Cloud Native DevOps with Kubernetes
Wynne, Hellesoy & Tooke • The Cucumber Book
Eric Ries • The Lean Startup
#AndrewHatch #SoftwareEngineering #Programming #YOWcon