Wish this had a legend. Obviously the box sizes and positioning mean something.
What I can see makes me a bit curious.
- Frontend Web boxes are "big", while some of the backend App boxes are small. I'd have thought it would be the reverse -- frontend horizontally scaled first, backend vertically scaled first.
- Sometimes they use Zones ABC, sometimes AB. This is cool, but a lot of the "front" infrastructure is AB (including www), so I'm not sure what advantage having ABC on the backend pieces gives. Are these super-critical for some special reason? I guess they might also be the pieces that are not (or only partially) replicated to the secondary site.
- The failover US-West site uses Asgard along with Puppet, but the primary US-East one doesn't. I guess this is for managing failure scenarios?
- Prodigious use of AWS services, including internal ELBs all over the place. SQL Server and PostgreSQL sneak into a few places.
- Couldn't find the CDN!
- Looks like staging and testing are a complete replica (hard to tell, the resolution isn't quite there). They're big, though. That's fine, but it raises the question of why the secondary site is only a partial replica. If you provision the full stack for staging, you'd figure you'd run the secondary the same way?
- The Data Warehouse runs on the secondary site, but it's only accessed from the primary. Interesting. Wonder why they didn't just put it on the primary?
Box sizes: the system supported lots of different programming languages/frameworks/pre-built OSS, and as a result some parts ran nice and lean (Python) while others chewed memory with reckless abandon (Magento). Another factor was supplying enough network bandwidth to some hosts, hence the larger sizes.
Zones: most apps were built for 2 AZs at the start; apps deemed "critical" and "doable" flipped to 3 near the election (rough sketch of a 3-AZ group below).
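For anyone curious what the 2-to-3 AZ flip looks like in practice, here's a minimal sketch using boto3 -- not what the campaign actually used, and every name in it is invented for illustration:

```python
# Hypothetical sketch: an Auto Scaling group spread across three AZs.
# Group, launch config, ELB, and zone names are all made up.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-frontend",
    LaunchConfigurationName="web-frontend-lc",   # assumed to exist already
    MinSize=3,
    MaxSize=12,
    DesiredCapacity=6,
    # Spreading across three zones means losing one AZ costs a third of
    # capacity instead of half with a two-zone layout.
    AvailabilityZones=["us-east-1a", "us-east-1b", "us-east-1c"],
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
    LoadBalancerNames=["web-frontend-elb"],      # classic ELB, also assumed
)
```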
Asgard: deployments had been tested with Asgard in East, but it was the rapid deploy in West that actually used this approach. Thanks Netflix!
Services: some that are critical but missed the chart are IAM, SNS, and CloudWatch (for autoscaling), and yes, a bit of CloudFront was used for images sent out in SES (transactional only!) emails. A rough sketch of the CloudWatch-driven autoscaling piece is below.
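Since CloudWatch-driven autoscaling got a mention, here's a minimal sketch of how an alarm can drive a scaling policy. Again this is boto3 with invented names, a sketch of the general pattern rather than the campaign's actual config:

```python
# Hypothetical sketch: a CloudWatch CPU alarm wired to a scale-out policy.
# Group, policy, and alarm names are invented for illustration.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Simple policy: add two instances whenever the alarm fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-frontend",
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,
)

# Alarm on average CPU across the group; its action is the policy above.
cloudwatch.put_metric_alarm(
    AlarmName="web-frontend-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-frontend"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```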