
Hardware is cheap. People are expensive. Besides that, procuring resources with your cloud provider is simply a matter of writing a YAML file. Not to mention there's no upfront investment, and you only pay for the resources you need instead of buying enough hardware to handle peak load and letting it sit idle the rest of the time. You would be amazed at how many resources you can buy for the fully allocated salary of one engineer.
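For example, a bucket and a queue are about this much CloudFormation; the names and values here are illustrative only:

    # Minimal CloudFormation template: one S3 bucket and one SQS queue.
    # Names and values are illustrative only.
    AWSTemplateFormatVersion: "2010-09-09"
    Description: Provisioning resources from a single YAML file
    Resources:
      DataBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: example-data-bucket   # must be globally unique
      WorkQueue:
        Type: AWS::SQS::Queue
        Properties:
          QueueName: example-work-queue
          VisibilityTimeout: 300            # seconds

Deploying it is a single CLI call along the lines of: aws cloudformation deploy --template-file stack.yml --stack-name example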

And yes, you can buy hardware. But can you run a data center in multiple regions? Besides that, any cloud provider offers more than just a bunch of VMs. AWS alone has 260 services, with entire teams of people keeping them patched and optimized. I don't keep up with Azure as carefully, so this isn't meant to be an Azure vs AWS comment; I just don't know Azure.



> Hardware is cheap. People are expensive.

Except that you still need the people, because most of the labor isn't putting the hardware in the rack; it's managing the software, which you have to do regardless of where the hardware is.

> Besides that, procuring resources with your cloud provider is simply a matter of writing a yaml file.

That is no different than it is locally.
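To be concrete, a local equivalent is also just a YAML file, assuming you're willing to run the services yourself under something like Compose (the images and password below are illustrative, and a Nomad or Kubernetes manifest would serve the same role):

    # docker-compose.yml: the local version of "write a yaml file, get resources".
    # Images, ports, and the password are illustrative only.
    services:
      db:
        image: mysql:8.0
        environment:
          MYSQL_ROOT_PASSWORD: example-password
        volumes:
          - db-data:/var/lib/mysql
      cache:
        image: redis:7
      queue:
        image: rabbitmq:3-management
        ports:
          - "15672:15672"   # management UI
    volumes:
      db-data:

Run docker compose up -d and the same services come up on your own hardware.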

> Not to mention the lack of an upfront investments and only paying for your resources you need instead of having hardware that you are spending money on because you have to have enough hardware to handle peak load.

But hardware is cheap, remember? And most companies don't actually have large load variations.

> But can you run a data center in multiple regions?

Obviously yes. Any company of non-trivial size would have multiple sites and could locate hosts at more than one. This doesn't necessarily raise the price either, because you already need enough machines to provide redundancy; spreading them across sites doesn't require additional hardware, only locating some of the existing hardware at other sites.

This is also mostly overrated for companies smaller than that, because cloud providers have had provider-wide outages at a frequency comparable to site-wide outages for sites with a reasonable level of redundancy.

> Besides that any cloud provider offers more than just a bunch of VMs. AWS alone has 260 services with an entire team of people keeping them patched and optimized.

This is only relevant if you're using 260 different services and not just a bunch of VMs, and plenty of companies are using just a bunch of VMs.


As a manager of a team of application product developers, I can tell you that the headcount cost of ops teams, and the time cost for people whose jobs shouldn't involve VM provisioning overhead but nonetheless do, are both huge compared to cloud services. With cloud tooling, my team of people, all with zero experience provisioning VMs, can get production systems up, add logging, add alerting, add networking, etc., all very easily or with just low-touch overhead from the teams that manage best practices or compliance. Creating the same internal developer tool experience with data centers is SO expensive and requires a major headcount investment.


It's always shocking to see how inefficiently some companies are operated.

The things you're describing should take a single individual a matter of seconds for a system which is already in operation, and a one-time cost of a few hours to set up at the outset (i.e. once or twice a decade). If it takes significantly longer to do locally than it takes to input into the cloud provider's interface, something's not right.

I can tell you where most companies go wrong here, though: excessive specialization. If you put separate people in charge of provisioning, networking, logging, etc., you create a ton of friction, because anything you do needs five different humans to touch it and they all have to coordinate. One person can do all of those things, as you've seen when one person does it through the cloud provider's interface. And when one person is doing all of it, it takes only seconds to do.


I listed all of the services we used across five environments, most of them across three availability zones. So one person was going to manage what on prem would be roughly 200 VMs/services and make sure they stay patched and the OS stays updated? Locally, a lot of those services would run in a cluster for availability.

There is no company on earth that has one person managing an on prem implementation of that level of complexity.

One person does it in a cloud environment because they aren’t managing hardware, patching, etc.


> So one person was going to manage what on prem would be roughly 200 VMs/services and make sure they stay patched, the OS stays updated?

You're talking about the applications, i.e. the guests, not the hosts. It should be completely reasonable to expect a single person to manage the hosts, including host networking, backups of the guest images, etc. The services themselves may be arbitrarily complicated and require a large number of people to configure and manage, but that's the same as it is with cloud providers.

There is also a simple way to handle guest OS updates in most cases. Turn on automatic updates, and keep automated VM snapshots in case one goes bad.
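A sketch of the first half, assuming Debian/Ubuntu guests managed with Ansible (the "guests" group name is made up), is a one-time playbook:

    # Ansible playbook: turn on unattended upgrades across all guests.
    # The "guests" group is illustrative; the APT settings are standard.
    - hosts: guests
      become: true
      tasks:
        - name: Install unattended-upgrades
          ansible.builtin.apt:
            name: unattended-upgrades
            state: present
            update_cache: true

        - name: Enable periodic package list updates and upgrades
          ansible.builtin.copy:
            dest: /etc/apt/apt.conf.d/20auto-upgrades
            content: |
              APT::Periodic::Update-Package-Lists "1";
              APT::Periodic::Unattended-Upgrade "1";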


Many devs don't want to do any of that; they would rather have everything run in Docker and not worry about any of the above (which can make sense if you can afford it at scale; personally I'm not of that ethos and try to stay away from companies like that, as it's great for me to work in a place where the devs can jump in anywhere rather than being tied down in their silo).

I think it's easier for some companies to just pay Amazon for all these services than to hire engineers able to do what you are talking about. For a certain type of company (well established, known resource loads) it can make sense (though they risk huge lock-in costs and won't be able to move fast enough when those costs arrive), but I've seen and worked at many startups that try to do the full cloud setup above and just get blown out by the costs alone.

The last company I worked at (a Southeast Asia startup, originally with just me as the dev, growing to about 5 more devs by the time I left) only used S3, CloudFront (though we put CDN77 in front of it because AWS bandwidth in SEA is a rip-off), and RDS with Postgres (which I wanted to move away from, because backups and resizes from SQL dumps are a PITA and take way too long over the network versus on the instance). Every other service we needed ran on EC2 instances: load balancing/auto scaling with nginx plus third-party modules and a Python daemon using the boto3 API, RabbitMQ, Solr, memcached, dnscache, PgBouncer, Let's Encrypt, and deployments to prod/staging/one-off environments with Fabric, building what was needed on a workbench and then spinning up instances from an AMI image (about 1/4 the cost compared to using Docker, with init scripts that turned applications on or off depending on what the server was needed for), up to about 40+ machines at max site load. We also had to support 40+ customer domains. Overall this was 8-10x cheaper than the "all-in" AWS abomination an AWS consultant cooked up before we went that route.


No, with the cloud providers, they manage the guests. You don't have to know anything about how to manage the software. The difference between "turning on automated updates" and what your cloud provider does is that they manage the updates and have an entire team to test them.

Most of the services I named are just that: services. You don't even think about the underlying process.

It literally takes two clicks, for instance, to stand up a massively parallel data warehousing solution or a Hadoop cluster on AWS. It's a few lines of YAML to set up most of it.
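For the data warehousing case, the YAML is roughly the following; the node count, names, and password parameter are illustrative, not from our actual stack:

    # CloudFormation sketch: a small Redshift cluster.
    # Sizes and names are illustrative only.
    Parameters:
      MasterPassword:
        Type: String
        NoEcho: true
    Resources:
      Warehouse:
        Type: AWS::Redshift::Cluster
        Properties:
          ClusterType: multi-node
          NumberOfNodes: 2
          NodeType: dc2.large
          DBName: analytics
          MasterUsername: awsuser
          MasterUserPassword: !Ref MasterPassword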

It didn't take a "large number of people" to manage the software. It took two full-time people, one in the US and one in India. For the most part, there is very little to manage or configure. Everything is provisioned with a set of YAML files.

Also, what happens when that one person gets hit by the lottery bus or goes on vacation?


For context: my first exposure to the cloud was at my last company, which had 100 employees. We aggregated publicly available (i.e., no PII) health care provider data from all 50 states and government agencies, as well as various disease/health dictionaries, and we combined it with data sent to us from large health systems.

These are the services we used.

Infrastructure

- Route 53 (DNS)

- SQS/SNS (messaging)

- Active Directory.

- Cognito (SAML/SSO for our customers)

- Parameter Store/DynamoDB (configuration)

- CloudWatch (logging, monitoring, alerts, scheduling)

- Step functions (orchestration)

- Kinesis (stream processing). We were just introducing this when I left. I’m not sure what they were using it for.

CI/CD

We used GitHub for source control.

- CodePipeline (CI/CD orchestration)

- CodeBuild (Serverless builds. It would spin up a Windows or Linux Docker container and basically run PowerShell or Bash commands)

- Self-hosted Octopus Deploy server

Data Storage

- S3 (Object/File storage)

- Redshift (OLAP database)

- Aurora/MySQL (OLTP RDBMS). When we had large indexing runs into Elasticsearch, read replicas would autoscale.

- Elasticsearch

- Redis

Data Processing

- Athena (Serverless Apache Presto processing against S3)

- Glue (Serverless PySpark environment)

Compute

- EC2 (Pet VMs and one autoscaling group of VMs to process data as it came in from clients. It ran a legacy Windows process)

- ECS/Fargate (Serverless Docker cluster)

- Lambda (for backend processes where we needed to scale from 0 to $alot)

- Workspaces (Windows VMs hosted in the US as Dev machines for our Indian Developers who didn’t want to deal with the latency.)

- Layer 7 load balancers

Front end

- S3 (hosted static assets like html, JS, CSS. You can serve S3 content as a website.)

- CloudFront (CDN)

- WAF (Web Application Firewall)

All of the above infrastructure was duplicated for five different environments (DEV, QAT, UAT, Stage, Prod). In Prod, where needed, infrastructure was duplicated across multiple availability zones (not regions).

Where applicable, backups were automated.

We had two full-time operations people. The rest was maintained by developers. As for the rest of your points:

> [Procuring resources] is no different than it is locally.

Can I go from no infrastructure to everything I just named in a matter of hours locally? Can I set up a multi-availability-zone MySQL database with automated backups just by running a YAML file locally, and then turn it off when it's not needed?
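For comparison, the AWS side of that is roughly this much CloudFormation (instance size, retention period, and names are illustrative):

    # CloudFormation sketch: multi-AZ MySQL with automated backups.
    # Identifiers, size, and the password parameter are illustrative only.
    Parameters:
      DbPassword:
        Type: String
        NoEcho: true
    Resources:
      AppDatabase:
        Type: AWS::RDS::DBInstance
        DeletionPolicy: Snapshot
        Properties:
          Engine: mysql
          DBInstanceClass: db.t3.medium
          AllocatedStorage: "100"
          MultiAZ: true                 # standby in a second availability zone
          BackupRetentionPeriod: 7      # days of automated backups
          MasterUsername: admin
          MasterUserPassword: !Ref DbPassword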


Most of what you're listing are Layer 7 services. The time cost there is in the configuration. You can put Active Directory in the cloud, but it's still going to be Active Directory, i.e. a massively complicated proprietary framework that touches every Windows system in your network like an octopus.

And some of those things actually make sense. You can't really host a CDN locally, can you? If you need a large amount of compute for an hour and then never again, it doesn't make much sense to buy hardware for that.

But the point isn't that it never makes sense to put anything in the cloud at all. It's that companies regularly overuse it as some kind of buzzword panacea when there are only a specific set of things that it's actually good for.


It's not just "configuration". There is also the issue of continuous monitoring and upkeep. Not to mention that someone has to worry about servers going down, hard drives going bad, and backups. Would any one person know how to configure and maintain everything above?

I'm a developer who happens to have AWS in my toolbelt. I could set all of that up by myself. In the two years that I worked there, we never had an issue with any of it.

How much in house expertise would we have had to hire to manage everything that we used?


> There is also the issue of continuous monitoring and upkeep. Not to mention someone has to worry about servers going down, hard drives going bad, backups.

Monitoring and backups you configure once, and then they're automated. Disk failures happen maybe a couple of times a year, and it takes five minutes to stick in another disk. None of this is particularly labor intensive.
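As a sketch of how backups can be a configure-once job, assuming KVM/libvirt hosts managed with Ansible (the group name, guest list, and schedule are all made up), a nightly snapshot cron is a few lines:

    # Ansible playbook: schedule nightly libvirt snapshots on the hosts.
    # Pruning old snapshots is left out for brevity.
    - hosts: hypervisors
      become: true
      vars:
        guests: [app1, app2, db1]
      tasks:
        - name: Nightly snapshot for {{ item }}
          ansible.builtin.cron:
            name: "snapshot {{ item }}"
            minute: "0"
            hour: "2"
            job: "virsh snapshot-create-as {{ item }} nightly-$(date +\\%F)"
          loop: "{{ guests }}"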

> Would any one person know how to configure and maintain everything above?

It has long been a fact of life in companies small enough to have a single-person IT department.


Do disk failures only happen once or twice a year when you're running the infrastructure required to support something the size I mentioned on prem, across multiple geographically dispersed locations? How many people are you going to have to hire to make sure it's running and patched? Will one person know how to optimize the 25 services we used? All of those services are redundant across multiple servers.

As far as the backups, how long would it take you to recover from a disaster? Do you have a team of people who are experts on all 25 different services?

It was a pain when I built my first and only from-scratch on prem infrastructure, which consisted of multiple environments running just Consul, Fabio, Nomad, Vault, Mongo, Memcached, and SQL Server. We also had on prem builds and deployments using agents orchestrated by Visual Studio Team Services (now called Azure DevOps, basically TFS online).

The prod environment ran all of the services clustered. I also left off my original list AWS Certificate Manager, which automatically provisions and renews SSL certificates; on prem, the IT department had to keep up with SSL certificates themselves.

Our on prem infrastructure was a pain to manage and didn’t have anywhere near the reliability. I know we didn’t keep everything up to date and patched.

Backups were a pain to manage, and no one ever verified that they worked.



