> the large cloud providers have clearly indicated by their actions that they’re simply not interested in implementing hard cost circuit breakers.
I agree, my term for this is “bad faith”.
I recently had a free $200 credit for Azure. I set up their default MariaDB instance for a side project, figuring I’d get my feet wet with Azure. I didn’t spend time evaluating the cost because I figured, how much could the default be if I haven’t cranked up the instance resources at all? Turns out the answer is more than $10/day, which I discovered when authentication to my test DB started failing. Back to Digital Ocean.
My term for it is “you’re not their use case.” For better or worse, they’ve prioritized customers who would much rather eat an unexpected few-thousand-dollar bill than have services paused or shut down unexpectedly.
But computers can behave differently based on user choice. Right? So there could be a user option to cut service beyond a fixed spend. It wouldn't be hard to implement, and tons of people would use it. They don't do it.
It's not a tragic case of priority and limited engineering resources. They like surprise bills, just like hospitals do.
Businesspeople love it when you come to their service and click through their Russian novel of a service agreement that would take a team of lawyers to parse. Once you do that, your money belongs to them! It's their court, their rules! They love it!
> It wouldn't be hard to implement, and tons of people would use it. They don't do it.
Please describe to me, in detail, how this works.
Because every time this comes up everyone claims it's the easiest thing in the world, but if you try to drill into it, what they end up actually wanting is generally "pay what you want" cloud services.
There are a _ton_ of resources on AWS that accrue ongoing costs with no way to turn them off short of deleting them. A "hard circuit breaker" that brings your newly accruing charges to zero needs to not just shut down your EC2 instances, but delete your EBS volumes, empty your S3 buckets, delete your encryption keys, delete your DNS zones, stop all your DB instances and delete all snapshots and backups, etc, etc.
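To make that concrete, here's a rough sketch (boto3, single region, filters simplified, resource listing abridged) of what "bring newly accruing charges to zero" would actually have to do:

```python
# Rough sketch of what a true "charges to zero" circuit breaker implies.
# Deliberately destructive past the first step -- that's the point above.
import boto3

ec2 = boto3.client("ec2")
rds = boto3.client("rds")
s3 = boto3.resource("s3")

# 1. Stop compute (recoverable)...
running = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
ids = [i["InstanceId"] for r in running for i in r["Instances"]]
if ids:
    ec2.stop_instances(InstanceIds=ids)

# 2. ...but stopped instances still bill for their EBS volumes,
#    so the volumes have to go (only detached ones shown here).
for vol in ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]:
    ec2.delete_volume(VolumeId=vol["VolumeId"])

# 3. S3 bills for storage until the objects themselves are deleted.
for bucket in s3.buckets.all():
    bucket.objects.all().delete()

# 4. Stopped RDS instances keep billing for storage, snapshots, backups.
for db in rds.describe_db_instances()["DBInstances"]:
    rds.stop_db_instance(DBInstanceIdentifier=db["DBInstanceIdentifier"])

# 5. Customer-managed KMS keys bill monthly and can only be *scheduled*
#    for deletion:
# boto3.client("kms").schedule_key_deletion(KeyId="<key-id>", PendingWindowInDays=7)
# ...and that's before Route 53 zones, Elastic IPs, NAT gateways, etc.
```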
The only people I see using a feature like this are some individuals doing some basic proof-of-concept work and... a bunch of people who are going to turn it on without understanding the implications, and then, when a burst of traffic wipes out their AWS account, they're going to publish angry blog posts about how AWS killed their startup.
If, like most people, you don't want literally everything to disappear the first time your site gets an unexpected traffic spike, you can already do this by setting up a response tailored to your workload--run a lambda in response to billing alerts that shuts down VMs, or stops your RDS instance but leaves the storage, etc.
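A minimal sketch of that pattern, assuming a Lambda subscribed to the SNS topic your CloudWatch billing alarm publishes to; the "auto-stop" tag and "side-project-db" identifier are made up for illustration:

```python
# Minimal sketch: Lambda fired (via SNS) by a CloudWatch billing alarm.
# Stops tagged EC2 instances and one RDS instance; storage is left intact.
# The "auto-stop" tag and "side-project-db" name are illustrative.
import boto3

ec2 = boto3.client("ec2")
rds = boto3.client("rds")


def lambda_handler(event, context):
    # Only touch instances that explicitly opted in via a tag.
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:auto-stop", "Values": ["true"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if ids:
        ec2.stop_instances(InstanceIds=ids)

    # Stop the database but keep its storage and backups.
    rds.stop_db_instance(DBInstanceIdentifier="side-project-db")

    return {"stopped_instances": ids}
```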
> Because every time this comes up everyone claims it's the easiest thing in the world, but if you try and drill into it what they end up actually wanting is generally "pay what you want" cloud services.
Why is it on any (usually a relatively new) user to define how an entire cloud should behave?
Users are asking for a feature that helps them stop accidentally spending more than they intended. This feature request is totally fair. Implementing it would be an act of good faith towards new/onboarding users (and, obviously, towards any user with a hard budget constraint).
> The only people I see using a feature like this are some individuals doing some basic proof-of-concept work and...
Yes exactly. GCP offers sandboxed accounts for this exact purpose. Why is this such a stretch?
> setting up a response tailored to your workload--run a lambda in response to billing alerts that shuts down VMs, or stops your RDS instance but leaves the storage, etc.
If you're telling every individual user that falls into a specific category to build a specific set of infrastructure, why is it not acceptable to you to just ask AWS to build it?
I think the sandbox idea is a great one. They should just do away with the free tier entirely except for sandbox accounts in which everything just gets shut down the second you go over the free allowance. If you want to build something for real then you pay for whatever resources you use, but if you just want to tinker around and learn a few things then you can get a safe sandbox to do it in.
BUT, I think the parent's point is that such a feature would actually be quite complicated. It's not just a matter of saying "I only want to spend $X in this account per month/total" but of defining exactly what you want to happen when you hit that limit. Shut everything down? My guess is almost nobody would want that. So it ends up being some complicated configuration where you have to deeply understand all of the services and their billing models just to set it up in the first place. What are the odds that the student who accidentally spins up 100 EC2s for a school project is going to configure this tool correctly?
But I do think the sandbox would be great. Either you are a professional in which case it is your responsibility to manage your system and put in appropriate controls to prevent huge unexpected bills or you are a student (in the general sense of someone learning AWS, not necessarily just someone in school) in which case they provide a safe environment for you to experiment.
> BUT, I think the parent's point is that such a feature would actually be quite complicated.
Sure, but so is making a cloud. Putting the onus of defining a feature like this on users, only after hearing their request ("I want to control my spend"), is IMO unfair.
Not complicated as in "too hard for AWS to build" but complicated as in "really hard to use for someone trying to limit their spend on AWS." So the people most at risk of huge unexpected bills are also not going to be the people knowledgeable enough to set up the billing cap correctly. So it would mostly be a feature for enterprises, and most enterprises would rather just pay the extra $ than potentially turn off a critical system or accidentally delete some user data.
I worked at a company that spent ~$10m per month on AWS. We had a whole "cloud governance" team who built tools to identify both over- and underutilized resources. But they STILL never cut anything off automatically. The risk/reward ratio just wasn't there. You make the right call and shave $10k off a $10m bill every month, but the one time you take down a mission-critical service, you give all of that back and then some.
I've been there. I shut down a bunch of what looked like idle instances doing nothing to reduce spend. 80% of which were, in fact, doing nothing. I did drop off two vms that were supporting critical infrastructure.
Everyone who had done any work on them was long gone. I had done my due diligence to identify what they could possibly be.
Still, the day of reckoning came, and we got calls of services down a week after I turned them off. I spun them back up, and they were going again without any real impact to the business.
This turned out to be a blessing as the very next week the cert these same services depended on expired and if I hadn't learned about the system by turning them off we never would have known which boxes held up those services.
Also a lesson in what happens when people leave without any documentation on where the work they did lives and how it works.
> So the people most at risk of huge unexpected bills are also not going to be the people knowledgable enough to setup the billing cap correctly
Yes, which is why AWS builds it.
> So it would mostly be a feature for enterprises, and most enterprises would rather just pay the extra $ than potentially turn off a critical system or accidentally delete some user data.
There are plenty of service limits which you need to request to have lifted. This is a common enough use case that there's an entire console section for requesting increases.
SES has a sandbox mode which you need to explicitly disable.
Metering works perfectly fine for the free tier across a gamut of the most popular products.
Beyond this, the platform captures the necessary information in near enough to realtime to provide an accurate picture of spend.
Yet with all of these capabilities, there is no coherent way to constrain identities or accounts to a certain $ spend on even predictable services.
This has been the case for so long that it must clearly be by design. AWS has the guard rails available but only really wants to use them to stop things that would cause them pain (eg SES, bucket limits).
They totally should and could have the ability to manually _reduce_ service limits.
They totally should and could direct users to a way to restrict which regions and services an account can use, without needing to set it up through IAM (a sketch of the IAM legwork this currently takes is below).
They could and should have service-specific guardrails where they make sense - eg for EC2, provide a sandbox that limits eg instance types, base hourly spend, and use of CPU credits. If they wanted to get crazy they could provide per-service actions to take when thresholds are exceeded (though this might be a bit niche).
They could have a _simple_ billing alarm interface (and enable it by default during provisioning), and they could eat the cost of sending SMSes or emails when those thresholds are met.
Yet they choose to do none of those things, even though they provide defaults for convenience elsewhere (eg encryption keys).
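For reference, the IAM legwork mentioned above looks roughly like this: a deny policy keyed on aws:RequestedRegion that you then attach to your users or roles (policy name, region list, and exempted global services are illustrative):

```python
# Sketch of the IAM setup the comment argues shouldn't be necessary:
# deny everything outside an allowed region, except a few global services.
# Policy name, region list, and the NotAction list are illustrative.
import json
import boto3

iam = boto3.client("iam")

region_lockdown = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOutsideAllowedRegions",
            "Effect": "Deny",
            "NotAction": ["iam:*", "sts:*", "cloudfront:*", "route53:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ["eu-west-1"]}
            },
        }
    ],
}

iam.create_policy(
    PolicyName="region-lockdown",
    PolicyDocument=json.dumps(region_lockdown),
)
# Still has to be attached to every group/role, kept in sync, and explained
# to every new account holder -- which is the point being made above.
```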
While my org has a somewhat nuanced understanding of AWS and can set things up to provide this certainty both for our/our clients' environments and for our developers, that's only because I have a team of people whose literal job it is to do stuff with AWS; we have had training, certs, and other access that beginners do not and cannot have.
IMO it doesn't need to be a tap that people can turn off at $500 a month or whatever, it just needs to be a bit more of a "if you select these default options on these basic services, there's no way you'll accrue 10k in charges overnight".
Yes, in some cases the default is quite expensive – the same was true of SQL Azure (though they have changed that recently), and it ran up a good-sized bill for us (though, to their credit, Azure did refund us on all such occasions because we hadn't used the capacity at all).
However, I don't know why the alert system doesn't have an option to say "here's my budget, alert me as soon as my daily pace is on track to exceed the monthly budget." Instead, you get percent-of-budget-consumed alerts, e.g. an email when 50% of the budget is consumed, which happens every month and so kind of defeats the purpose of an alert.
We ended up creating a simple solution (cloudalarm.in – in beta) that provides such budget-pace-based alerts, plus more ways to get instant alerts that aren't possible with plain usage-based alerts.
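The check being described is roughly this (a sketch against the Cost Explorer API, not their implementation; the $300 budget is illustrative and alert delivery is left out):

```python
# Rough sketch of a pace-based budget check via the Cost Explorer API:
# project month-end spend from month-to-date spend and compare to a budget.
# The $300 budget is illustrative; email/SMS delivery is omitted.
import calendar
import datetime
import boto3

MONTHLY_BUDGET = 300.0

ce = boto3.client("ce")

today = datetime.date.today()
month_start = today.replace(day=1)
days_in_month = calendar.monthrange(today.year, today.month)[1]

# The End date is exclusive, so this covers spend up to (not including) today.
result = ce.get_cost_and_usage(
    TimePeriod={"Start": month_start.isoformat(), "End": today.isoformat()},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)
spend = float(result["ResultsByTime"][0]["Total"]["UnblendedCost"]["Amount"])

# Project month-end spend from the current daily pace.
elapsed_days = max((today - month_start).days, 1)
projected = spend / elapsed_days * days_in_month

if projected > MONTHLY_BUDGET:
    print(f"Pace alert: projected ${projected:.2f} vs budget ${MONTHLY_BUDGET:.2f}")
```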
It's not bad faith. It's 'providing the resources you signed up for'.
Does it mean you have to go into your planning with more consideration as to cost? Yeah.
But how would you feel if your start-up finally goes viral, you're having your best day ever, and then your app just stops working because someone forgot to remove a hard spend limit?
Most people would rather see their app continue running.
And what does turning off the lights look like? If your database hits your cost limit, do you stop serving requests? Delete the data? To what extent do you want 'cost protection' for resources you signed up for?
> If your database hits your cost limit, do you stop serving requests? Delete the data? To what extent do you want 'cost protection' for resources you signed up for?
Sounds like a reasonable configurable option rather than “you shouldn’t be able to choose at all”.
I am sympathetic to the concern about cost overages--I've hit them in AWS before--but given the way that developers and managers think about SaaS products (generally, not just cloud stuff), I tend to think that even if you required them to click three checkboxes and sign their name in blood, the first time you vaporized somebody's production database because they hit their overages and didn't think it would ever happen would be apocalyptic. And the second, and the third. And you're at fault, not the customer, in the public square.
By comparison, chasing off "cost conscious" (read: relentlessly cheap--and I note that in my personal life I'm one of these, no shade being thrown here) users is probably better for them overall.
Work in AWS Premium Support. This is 100% how it goes.
Take KMS keys for example. You can't outright delete a KMS master key; you have to schedule it for deletion. The shortest period you can schedule for deletion is 7 days (default 30). Once the key is deleted, all encrypted data is orphaned.
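For the curious, the call in question is a one-liner (the key id below is made up), and the 7-day floor is enforced by the API:

```python
# KMS keys can't be deleted outright -- only scheduled, with a 7-30 day
# waiting period (default 30). The key id below is made up.
import boto3

kms = boto3.client("kms")

kms.schedule_key_deletion(
    KeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
    PendingWindowInDays=7,  # the minimum the API accepts
)

# Until the window elapses, the deletion can still be called off:
# kms.cancel_key_deletion(KeyId="1234abcd-12ab-34cd-56ef-1234567890ab")
```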
I used to run an AWS consultancy, which is how I know. ;) More than once I had a customer go "well support won't help me, how can I get my data back?". And I had to tell them "well, support isn't just not helping you for kicks, you know?".
I am sorry, I might be missing something, but I call bullshit. How much does it cost Amazon to store the few bytes that make up a key? 5 cents per decade?
“Yeah, so uhm, you hit zero, so we deleted all your keys in an irrecoverable way, sorry not sorry” — is not a circuit breaker. Make all services inaccessible to the public and store the data safely until the customer tops up their balance. That’s how VPSes have worked forever.
I don’t argue that “cheapo” clients are worth retaining for AWS; clearly they are not. But this kind of hypocrisy really triggers me.
Edit: a helpful person below suggested I misunderstood the parent, and now I think I did.
AWS doesn't retain anything for you unless you tell them to, and when you tell them to delete something (as in the example relayed by the person you are replying to), they delete it as best as they are able. That's part of the value proposition: when you delete the thing, it goes away. Why would they start now for clients who want their bills to be in the tens of dollars (when if you really care you can do it yourself off of billing alerts[0])?
Going to be real: you aren't "triggered", which is actually a real thing out there that you demean with this usage of the term. You're just not the target market and you're salty that it's more complex than you think it is.
Stop serving requests until the finances are rectified, delete the data 30 days after it stops. Final migration out/egress requires a small balance for that purpose.
The engineers designing and building these systems are some of the best in the world, this is relatively trivial.