Thanks for the thoughts, Geoff. Don’t you think, however, that the majority of the problems experienced as part of the fault were due to perception and expectation management on the part of AWS? I get what you are saying about multi-region. That, IMO, should be a given even for smaller organisations, as the cost barrier these days isn’t actually THAT significant; the technology barrier is often the show stopper.
I guess the point I take issue with is the claim that the impact of this issue, the one that users worldwide felt, was due to a lack of ‘shared’ responsibility from an HA/availability perspective. I obviously disagree.
Just look at the blast radius of the problem: I saw reports of Lambda, EC2, EBS, API Gateway and pretty much every other AWS service showing issues. It’s clear to me why: they all have a dependency on S3, and because I’m a little OCD and obsess over such things, I know that. But let’s say I wasn’t OCD … maybe I wanted to go to the AWS status page, which also had an S3 dependency, directly and indirectly … or not?
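To make the blast radius point concrete, here is a rough sketch of how you could work out what falls over when one service fails. The dependency map is purely illustrative: it only mirrors the services named in this comment, it is not an official AWS dependency map, and `my-app` is a hypothetical workload added to show an indirect dependency.

```python
from collections import deque

# Illustrative only: "X depends on Y" edges based on the services named in
# this comment. Not an authoritative AWS dependency map.
DEPENDS_ON = {
    "S3": [],
    "Lambda": ["S3"],
    "EC2": ["S3"],
    "EBS": ["S3"],
    "API Gateway": ["S3"],
    "status-page": ["S3"],
    "my-app": ["Lambda"],  # hypothetical workload with an indirect S3 dependency
}


def blast_radius(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the map: for each service, who depends on it?
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outwards from the failed service.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


if __name__ == "__main__":
    print(sorted(blast_radius("S3", DEPENDS_ON)))
```

With this toy map, an S3 failure reaches everything else in the map, including `my-app`, which never calls S3 itself. That is the kind of hidden dependency I would want spelled out.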
Also, given that AWS has a selling point that it manages the undifferentiated heavy lifting, I question the wisdom of not making it VERY CLEAR that services such as Lambda have an S3 dependency. This is the biggest example of the point, IMO.
When considering site availability there is a cross-region element, but there is also a service isolation element, and I think that most of the blame, both technical and communication-wise, needs to be levelled at AWS in this case.
What do you think?