We're at early stages of planning an architecture where we offload pre-rendered JSON views of PostgreSQL onto a key value store optimised for read only high volume. Considering DynamoDB, S3, Elastic, etc. (We'll probably start without the pre-render bit, or store it in PostgreSQL until it becomes a problem).
When looking at DynamoDB I noticed that there was a surprising amount of discussion around the requirement for provisioning, considering node read/write ratios, data characteristics, etc. Basically, worrying about all the stuff you'd have to worry about with a traditional database.
To be honest, I'd hoped that it could be a bit more 'magic', like S3, and that AWS would take care of provisioning, scaling, sharding, etc. But it seems, disappointingly, that you have to focus on proactively worrying about operations and provisioning.
Is that sense correct? Is the dream of a self-managing, fire-and-forget key value database completely naive?
Your example really summarizes the challenge with the AWS paradigm: namely, that they want you to believe the thing to do is to spread the backend of your application across a large number of distinct data systems. No one uses DynamoDB alone: they bolt it onto Postgres after realizing they have availability or scale needs beyond what a relational database can do, then they bolt on Elasticsearch to enable querying, and then they bolt on Redis to make the disjointed backend feel fast. And I'm just talking operational use cases; ignoring analytics here. Honestly, it doesn't need to be these particular technologies, but this is the general phenomenon you see in so many companies: they adopt a relational database, a key/value store (could be Cassandra instead of DynamoDB, e.g. what Netflix does), a search engine, and a caching layer because they think that's the only option.
This inherently leads to a complexity debt explosion, fragmentation in the experience, and an operationally brittle posture that becomes very difficult to dig out of (this is probably why AWS loves the paradigm).
Almost every single team at Amazon that I can think of off the top of my head uses DynamoDB (or DDB + S3) as its sole data store. I know that there are teams out there using relational DBs as well (especially in analytics), but in my day-to-day working with a constantly changing variety of teams that run customer-facing apps, I haven't seen RDS/Redis/etc being used in months.
The thing about Amazon is that it is massive. In my neck of the woods, I've had the complete opposite experience. So many teams have the exact DDB-induced infrastructure sprawl described by the GP (e.g. a supplemental RDBMS, Elastic, caching layers, etc.).
Which says nothing of DDB. It's a god-tier tool if what you need matches what it's selling. However, I see too many teams reach for it by default without doing any actual analysis (including young me!), thus leading to the "oh shit, how will we...?" soup of ad-hoc supporting infra. Big machines look great on the promo doc tho. So, I don't expect it to stop.
> they bolt it onto Postgres after realizing they have availability or scale needs beyond what a relational database can do, then they bolt on Elasticsearch to enable querying, and then they bolt on Redis to make the disjointed backend feel fast.
This made my head explode. Why would you explicitly join together two systems made to solve different issues? This sounds rather like a lack of architectural vision. Postgres's design, which requires zero up-front access planning, inherently clashes with DynamoDB's; same goes for the Elasticsearch scenario: DynamoDB was not made to query everything, it's made to query specifically what you designed to be queried and nothing else. Redis sort of makes sense to gain a bit of speed for some particular access, but you still lack collection-level querying with it.
In my experience, leave DynamoDB alone and it will work great. Automatic scaling eventually comes out cheaper if you've done your homework and know your traffic.
In my experience, leave DynamoDB alone and it will work great.
My experience agrees with yours and I'm likewise puzzled by the grandparent comment. But just a shout-out to DAX (DynamoDB Accelerator), which makes it scale through the roof:
Judging a consistency model as "terrible" implies that it does not fit any use case and is therefore objectively bad.
On the contrary, there are plenty of use cases where eventually consistent writes are a perfect fit. To see that, you only have to look at how every major database server offers this as an option - just one example:
I think the main advantage of DDB is being serverless. Adding a server-based layer on top of it doesn't make sense to me.
I have a theory it would be better to have multiple table-replicas for read access. At application level, you randomize access to those tables according to your read scale needs.
Use main table streams and lambda to keep replicas in sync.
Depending on your traffic, this might end more expensive than DAX, but you remain fully serverless, using the exact same technology model, and have control over the consistency model.
Haven't had the chance to test this in practice, though.
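A minimal sketch of that replica fan-out idea (the table names and routing helpers are made up for illustration; the streams/Lambda sync piece isn't shown):

```python
import random

# Hypothetical layout: one main table takes writes; N copies of it
# serve reads, kept in sync by a stream consumer (not shown here).
MAIN_TABLE = "orders"
READ_REPLICAS = [f"orders-replica-{i}" for i in range(3)]

def table_for_write():
    """All writes go to the main table; streams fan changes out to replicas."""
    return MAIN_TABLE

def table_for_read():
    """Pick a replica at random to spread read load across tables."""
    return random.choice(READ_REPLICAS)
```

The trade-off is the same as any async replication: reads from a replica can lag behind the main table until the stream consumer catches up.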
I am working with a company that is redesigning an enterprise transactional system, currently backed by an Oracle database with 3000 tables. It’s B2B so loads are predictable and are expected to grow no more than 10% per year.
They want to use DynamoDB as their primary data store, with Postgres for edge cases. It seems to me the opposite would be more beneficial.
At what point does DynamoDB become a better choice than Postgres? I know that at certain scales Postgres breaks down, but what are those thresholds?
You can make Postgres scale, but there is an operational cost to it. DynamoDB does that for you out of the box. (So does Aurora, to be honest, but there is also an overhead to setting up an Aurora cluster to the needs of your business.)
I've also found that Postgres query performance does not keep up with bursts of traffic -- you need to overprovision your db servers to cope with the highest traffic days. DynamoDB, in contrast, scales instantly. (It's a bit more complicated than that, but the effect of it is nearly instantaneous.) And what's really great about DynamoDB is that after the traffic levels go down, it does not scale down your table and maintains it at the same capacity at no additional cost to you, so if you receive a burst of traffic at the same throughput, you can handle it even faster.
DynamoDB does a lot of magic under the hood, as well. My favorite is auto-sharding, i.e. it automatically moves your hot keys around so the demand is evenly distributed across your table.
So DynamoDB is pretty great. But to get the best experience from DynamoDB, you need to have a stable codebase, and design your tables around your access patterns. Because joining two tables isn't fun.
> So DynamoDB is pretty great. But to get the best experience from DynamoDB, you need to have a stable codebase, and design your tables around your access patterns. Because joining two tables isn't fun.
More than just joining--you're in the unenviable place of reinventing (in most environments, anyway) a lot of what are just online problems in the SQL universe. Stuff you'd do with a case statement in Postgres becomes some on-the-worker shenanigans, stuff you'd do with a materialized view in Postgres becomes a batch process that itself has to be babysat and managed and introduces new and exciting flavors of contention.
There are really good reasons to use DynamoDB out there, but there are also an absolute ton of land mines. If your data model isn't trivial, DynamoDB's best use case is in making faster subsets of your data model that you can make trivial.
They should be looking at Aurora, not Dynamo. Using Dynamo as the primary store for relational data (3000 tables!) sounds like an awful idea to me. I’d rather stay on Oracle.
It seems to me that what this is saying is that storage has become so cheap that if one database provides even slight advantages over another for some workload, it is likely to be deployed and have all the data copied over to it.
HN entrepreneurs take note, this also suggests to me that there may be a market for a database (or a "metadatabase") that takes care of this for you. I'd love to be able to have a "relational database" that is also some "NoSQL" databases (since there's a few major useful paradigms there) that just takes care of this for me. I imagine I'd have to declare my schemas, but I'd love it if that's all I had to do and then the DB handled keeping sync and such. Bonus points if you can give me cross-paradigm transactionality, especially in terms of coherent insert sets (so "today's load of data" appears in one lump instantly from clients point of view and they don't see the load in progress).
At least at first, this wouldn't have to be best-of-breed necessarily at anything. I'd need good SQL joining support, but I think I wouldn't need every last feature Postgres has ever had out of the box.
If such a product exists, I'm all ears. Though I am thinking of this as a unified database, not a collection of databases and products that merely manages data migrations and such. I'm looking to run "CREATE CASSANDRA-LIKE VIEW gotta_go_fast ON SELECT a.x, a.y, b.z FROM ...", maybe it takes some time of course but that's all I really have to do to keep things in sync. (Barring resource overconsumption.)
> I'd love to be able to have a "relational database" that is also some "NoSQL" databases (since there's a few major useful paradigms there) that just takes care of this for me. I imagine I'd have to declare my schemas, but I'd love it if that's all I had to do and then the DB handled keeping sync and such.
You might be interested in what we're building [0]. It synchronizes your data systems so that, for example, you can CDC tables from your Postgres DB, transform them in interesting ways, and then materialize the result in a view within Elastic or DynamoDB that updates continuously and with millisecond latency. It will even propagate your sourced SQL schemas into JSON schemas, and from there to, say, an equivalent Elasticsearch schema.
I think there was a project like this a few years ago (wrapping a relational DB + ElasticSearch into one box) and I thought it was CrateDB, but from looking at their current website I think I'm misremembering.
The concept didn't appeal to me very much then, so I never looked into it further.
---
To address your larger point, I think Postgres has a better chance of absorbing other datastores (via FDW and/or custom index types) and updating them in sync with its own transactions (as far as those databases support some sort of atomic swap operation) than a new contender has of getting near Postgres' level of reliability and feature richness.
My understanding of the CockroachDB architecture is that it's essentially two discrete components: a key value store that actually persists the data, and a SQL layer built on top.
Although I don’t think it’s recommended or supported to access the key value store directly.
I have no direct experience with scaling DynamoDB in production, so take this with a grain of salt. But it seems to me that the on-demand scaling mode in DynamoDB has gotten _really_ good the last couple of years.
For example, you used to have to manually set RCU/WCU to a high number when you expected a spike in traffic, since the ramp-up for on-demand scaling was pretty slow (could take up to 30 minutes). But these days, on-demand can handle spikes from 10s of requests a minute to 100s/1000s per second gracefully.
The downside of on-demand is the pricing - it's more expensive if you have continuous load. But it can easily become _much_ cheaper if you have naturally spiky load patterns.
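For reference, the back-of-envelope capacity math behind that trade-off, using DynamoDB's documented unit sizes (a read unit covers up to 4 KB, a write unit up to 1 KB, and eventually consistent reads cost roughly half):

```python
import math

def rcu_needed(reads_per_sec, item_kb, eventually_consistent=True):
    """Rough provisioned read capacity: one RCU is one strongly
    consistent read/sec of an item up to 4 KB; eventually consistent
    reads cost half. A sketch, not a billing calculator."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    return math.ceil(units / 2) if eventually_consistent else units

def wcu_needed(writes_per_sec, item_kb):
    """One WCU is one write/sec of an item up to 1 KB."""
    return writes_per_sec * math.ceil(item_kb)
```

So 100 strongly consistent reads/sec of 6 KB items needs 200 RCU, but only 100 RCU if eventual consistency is acceptable; that factor-of-two is part of why spiky, read-heavy workloads price out so differently between the modes.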
> The downside of on-demand is the pricing - it's more expensive if you have continuous load.
True, although you don't have to make that choice permanently. You can switch from provisioned to on demand once every 24 hours.
And you can also set up application autoscaling in provisioned mode, which'll allow you to set parameters under which it'll scale your provisioned capacity up or down for you. This doesn't require any code and works pretty well if you can accept autoscaling adjustments being made in the timeframe of a minute or two.
We have some regular jobs that require scaling up DynamoDB in advance a few times per day, but Dynamo is only able to scale down 4x per day, so we were probably paying for unnecessary over-capacity (10x or more) for a couple of hours a day.
Now we've just moved to on-demand and let them handle it; works fine.
> Is the dream of a self-managing, fire-and-forget key value database completely naive?
It's not, if you plan it right. Learn about single table design for DynamoDB before you start. There are a lot of good resources from Amazon and the community.
Here is a very accessible video from the community:
If you use single table design, you can turn on all of the auto-tuning features of DynamoDB and they will work as expected and get better and more efficient with more data.
Some people worry that this breaks the cardinal rule of microservices: One database per service. But the actual rule is never have one service directly access the data of another, always use the API. So as long as your services use different keyspaces and never access each other's data, it can still work (but does require extra discipline).
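For a flavor of what single-table design looks like in practice, here's a hypothetical key scheme (the `USER#`/`ORDER#` formats are illustrative conventions, not anything DynamoDB mandates) where a user and their orders share a partition key, so "user plus all their orders" is a single Query on `PK`:

```python
# Heterogeneous entities share one table, distinguished by
# composite partition/sort keys. Attribute names are made up.
def user_item(user_id, email):
    return {"PK": f"USER#{user_id}", "SK": "PROFILE", "email": email}

def order_item(user_id, order_id, total):
    # Orders live under the same partition key as their owner, so a
    # single Query on PK="USER#<id>" returns the profile and every order.
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}", "total": total}
```

The keyspace-discipline point above falls out naturally: as long as each service owns its own `PK` prefixes, they can share the physical table without touching each other's data.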
A lot of things that used to be a concern (hot partitions, etc) are not a concern anymore and most have been solved these days :)
Put it on on-demand pricing (it'll be better and cheaper for you most likely), and it will handle any load you throw at it. Can you get it to throttle? Sure, if you absolutely blast it without ever having had that high of a need before (and it can actually be avoided[0]).
You will need to understand how to model things for the NoSQL paradigm that DynamoDB uses, but that's a question of familiarity and not much else (you didn't magically know SQL either).
My experience comes from scaling DynamoDB in production for several years, handling massive IoT data ingestion as well as user data. We were able to completely replace everything we thought we would need a relational database for.
My comparison with a traditional RDS setup:
- DynamoDB issues? 0. Seriously. Only thing you need to monitor is billing.
- RDS? Oh boy, need to provision for peak capacity, need to monitor replica lags, need to monitor the Replicas themselves, constant monitoring and scaling of IOPS, suddenly queries get slow as data increases, worrying about indexes and the data size, and much more...
> We're at early stages of planning an architecture where we offload pre-rendered JSON views of PostgreSQL onto a key value store optimised for read only high volume.
If possible, put the json in Workers KV, and access it through Cloudflare Workers. You can also optionally cache reads from Workers KV into Cloudflare's zonal caches.
> To be honest, I'd hoped that it could be a bit more 'magic', like S3
You could opt to use the slightly more expensive DynamoDB On-Demand, or the free DynamoDB Auto Scaling modes, which are relatively no-config. For a very read-heavy workload, you'd probably want to add DynamoDB Accelerator (a write-through in-memory cache) in front of your tables. Or use S3 itself (though an S3 bucket doesn't really like being loaded with a tonne of small files) accelerated by CloudFront (which is what AWS Hyperplane, the tech underpinning ALB and NLB, does: https://aws.amazon.com/builders-library/reliability-and-cons...)
It is a resource that can often be the right tool for the job but you really have to understand what the job is and carefully measure Dynamo up for what you are doing.
It is _easy_ to misunderstand or miss something that would make Dynamo hideously expensive for your use case.
Hot keys are the primary one. They destroy your "average" calculations for your throughput.
Bulk loading data is the other gotcha I've run into. Had a beautiful use case for steady read performance of a batch dataset that was incredibly economical on Dynamo but the cost/time for loading the dataset into Dynamo was totally prohibitive.
Basically Dynamo is great for constant read/write of very small, randomly distributed documents. Once you are out of that zone, things can get dicey fast.
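A toy illustration of the hot-key point (the 3000 reads/sec figure is DynamoDB's documented per-partition read limit; the traffic numbers are invented):

```python
from collections import Counter

PER_PARTITION_RCU = 3000  # documented per-partition read throughput cap

def throttled_keys(read_counts_per_sec):
    """Keys whose own read rate exceeds what a single partition can serve.
    Provisioned capacity is spread across partitions, so the table-wide
    average can look healthy while one key still throttles."""
    return [k for k, rps in read_counts_per_sec.items()
            if rps > PER_PARTITION_RCU]

traffic = Counter({"user#hot": 5000, "user#a": 50, "user#b": 40})
# Average is ~1700 reads/sec per key -- well "within budget" on paper --
# yet "user#hot" alone exceeds its partition's cap and gets throttled.
```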
I do not recommend starting off with a decision to use DynamoDB before you have worked with it directly for some time to understand it. You could spend months trying to shoehorn your use case into it before realizing you made a mistake. That said, DynamoDB can be incredibly powerful and inexpensive tool if used right.
Yea, probably, but it is especially true for DynamoDB because it can initially appear as though your use cases are all supported but that is only because you haven't internalized how it works yet. By the time you realize you made a mistake, you are way too far in the weeds and have to start over from scratch. I would venture that more than 50% of DynamoDB users have had this happen to them early on. Anecdotally, just look at the comments on this post. There are so many horror stories with DynamoDB, but they're basically all people who decided to use it before they really understood it.
I believe it used to be static provisioning: you'd set the read and write capacity beforehand. Then obviously there is autoscaling of those, but it is still steps of capacity being provisioned.
They now have a dynamic provisioning scheme where you simply don't care, but it is more expensive, so if you have predictable requirements it is still better to use static capacity provisioning. There is an option, though.
DynamoDB also requires the developer to know about its data storage model. While this is generally a good practice for any data storage solution, I feel like Dynamo requires a lot more careful planning.
I also think that most of the best practices, articles etc apply to giant datasets with huge scale issues etc. If you are running a moderately active app, you probably can get away with a lot of stupid design decisions.
My experience with dynamic provisioning has been that it is pretty inelastic, at least at the lower range of capacity. E.g. if you have a few read units and then try to export the data using AWS's cli client, you can pretty quickly hit the capacity limit and have to start the export over again. Last time, I ended up manually bumping the capacity way up, waiting a few minutes for the new capacity to kick in, and then exporting. Not what I had in mind when I wanted a serverless database!
I understand it's not really your point, but if you're actually looking to export all the data from the table, they've got an API call you can give to have DynamoDB write the whole table to S3. This doesn't use any of your available capacity.
Yes, you have to learn about all these things upfront. But once you figure it out, test it, and configure it - it will work as you expect. No surprises.
Whereas Relational Databases work until they don't. A developer makes a tiny (even a no-op) change to a query or stored procedure, a different SQL plan gets chosen, and suddenly your performance/latency dramatically reduces, and you have no easy way to roll it back through source control/deployment pipelines. You have to page a DBA who has to go pull up the hood.
It is for now, but it doesn't have to be. Dynamo's design isn't particularly amenable to dynamic and heterogeneous shard topologies, however.
There could exist a fantasy database where you still tell it your hash and range keys, which are roughly how you tell the database which data is closely related (and which you may want to scan together) and which isn't, but instead of hard-provisioning shard capacity it automagically splits shards when they hotspot, and doesn't rely on consistent hashing, so that every shard can be sized differently depending on how hot it is.
Right now such a database doesn't exist AFAICT, as most places that need something that scales big enough also generally have the skill to avoid most of the pitfalls that cause problems on simpler databases like Dynamo.
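The hotspot-splitting behavior that fantasy database would need can be sketched in a few lines (a toy range-partitioned store; nothing like a real implementation, and "hot" here is crudely approximated by key count rather than traffic):

```python
SPLIT_THRESHOLD = 4  # arbitrary: split a shard once it holds this many keys

class SplittingStore:
    """Toy store whose shards split themselves instead of being provisioned."""

    def __init__(self):
        # Each shard is (low_bound, dict); shards cover sorted key ranges.
        self.shards = [("", {})]

    def _shard_for(self, key):
        # Walk shards from highest range down to find the owning shard.
        for i in range(len(self.shards) - 1, -1, -1):
            if key >= self.shards[i][0]:
                return i
        return 0

    def put(self, key, value):
        i = self._shard_for(key)
        self.shards[i][1][key] = value
        if len(self.shards[i][1]) > SPLIT_THRESHOLD:
            self._split(i)

    def _split(self, i):
        # Split the shard at its median key; each half keeps its own range.
        low, data = self.shards[i]
        keys = sorted(data)
        mid = keys[len(keys) // 2]
        self.shards[i] = (low, {k: v for k, v in data.items() if k < mid})
        self.shards.insert(i + 1, (mid, {k: v for k, v in data.items() if k >= mid}))

    def get(self, key):
        return self.shards[self._shard_for(key)][1].get(key)
```

A real system would split on observed throughput, move shards between nodes, and handle concurrent access, which is where all the actual difficulty lives.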
I'd urge you to start writing a prototype; a lot of your assumptions might get thrown out the window. Dynamo is not necessarily good for reading high volume. You'll end up needing to use a parallel scan approach, which is not fast.
I'd say Dynamo is extremely good at reading high volume, with the appropriate access pattern. It's very efficient at retrieving huge amounts of well partitioned data using the data's keys, but scanning isn't so efficient.
You can only ever fetch 1MB of data at a time though, even when using the more efficient query method (as opposed to scan). If your individual entities are not very tiny, it is hard to get for instance 2M items back in a reasonable amount of time.
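To make the 1 MB page limit concrete, here's a toy pagination loop (`query_page` is a stand-in for a real DynamoDB Query call and its LastEvaluatedKey cursor; the sizes are invented):

```python
PAGE_LIMIT_BYTES = 1_000_000  # a Query/Scan response is capped at ~1 MB

def query_page(items, start=0, item_size=500):
    """Stand-in for one DynamoDB Query call: returns up to ~1 MB of items
    plus a LastEvaluatedKey-style cursor when more remain."""
    per_page = PAGE_LIMIT_BYTES // item_size
    page = items[start:start + per_page]
    next_key = start + per_page if start + per_page < len(items) else None
    return page, next_key

def query_all(items):
    """The sequential pagination loop every large read ends up needing."""
    results, cursor = [], 0
    while cursor is not None:
        page, cursor = query_page(items, cursor)
        results.extend(page)
    return results
```

With 500-byte items that's ~2000 items per round trip, so fetching 2M items sequentially means on the order of a thousand back-to-back requests; hence the parallel-scan workarounds mentioned above.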
I don't know your scaling needs, but I would highly recommend just using Aurora postgresql for read-only workloads. We have some workloads that are essentially K/V store lookups that were previously slated for dynamodb. On an Aurora cluster of 3*r6g.xlarge we easily handle 25k qps with p99 in the single-digit ms range. Aurora can scale up to 15 instances and up to 24xlarge, so it would not be unreasonable to see 100x the read workload with similar latencies.
Happy to talk more. We're actively moving a bunch of workloads away from DynamoDB and to Aurora so this is fresh on our minds.
The salespeople always promise magic and handwave CAP away.
But data at scale is about:
1) knowing your queries ahead of time (since you've presumably reached the limit of PG/maybesql/o-rackle).
2) dealing with CAP at the application level: distributed transactions, eventual consistency, network partitions.
3) dealing with a lot more operational complexity, not less.
So if the snake oil salesmen say it will be seamless, they are very very very much lying. Either that, or you are paying a LOT of money for other people to do the hard work.
Which is what happens with managing your own NoSQL vs DynamoDB. You'll pay through the roof for DynamoDB at true big data scales.
If you know and understand S3 pretty well, and you purely need to generate, store, and read materialized static views, I highly recommend S3 for this use case. I say this as someone who really likes working with DDB daily and understands the tradeoffs with Dynamo. You can always layer on Athena or (simpler) S3 Select later if a SQL query model is a better fit than KV object lookups. S3 is loosely the fire and forget KV DB you’re describing IMO depending on your use case
Plenty of options already exist. DynamoDB has both autoscaling and serverless modes. AWS also has managed Cassandra (runs on top of DynamoDB) which doesn't need instance management.
Azure has CosmosDB, GCP has Cloud Datastore/Firestore, and there are many DB vendors like Planetscale (mysql), CockroachDB (postgres), FaunaDB (custom document/relational) that have "serverless" options.
Exactly. This has been my experience with several AWS technologies. Like with their ElasticSearch service, where I had to constantly fine-tune various parameters, such as memory. I was curious why they couldn't auto-scale the memory, why I had to do that manually. There are several AWS services that should be a bit more magical, but they are not.
There's not really magic with S3 either; you still need to name things with coherent prefixes to spread around the load.
DynamoDB is almost simple enough to learn in a day. And if you're doing nothing with it, you're only really paying for storage. Good luck with your decisions.
I'm not going to speculate on the accuracy of the 90% value, but I will say that appropriately prefixed objects substantially help with performance when you have tons of small-ish files. Maybe most orgs don't have that need, but in operational realms, doing this with your logs makes responses faster.
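For example, the classic trick is to prepend a short hash so lexically clustered keys (like date-stamped log paths) spread across S3's internal partitions (a sketch; the two-character prefix length is arbitrary, and modern S3 auto-partitions well enough that this mainly matters at very high request rates):

```python
import hashlib

def prefixed_key(logical_key, prefix_len=2):
    """Prepend a short, deterministic hash prefix so sequential keys
    land in different partitions instead of all hitting one."""
    digest = hashlib.md5(logical_key.encode()).hexdigest()[:prefix_len]
    return f"{digest}/{logical_key}"
```

So `logs/2024-01-01/app.log` becomes something like `3f/logs/2024-01-01/app.log`, and consecutive days no longer share a prefix.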
Your impressions are correct: DynamoDB is quite low-level, more like a DB kit than a ready-to-use DB; for most applications it's better to use something else.
If you use the "pay per request" billing model instead of provisioned throughput, DynamoDB scaling is self-managing, and you can treat your DB as a fire-and-forget key/value store. You need to plan how you'll query your data and structure the keys accordingly, but honestly, that applies even more to S3 than it does to Dynamo.
Exactly my experience. I got sucked into using more than once, thinking it would be better next time, but there are just so many sharp edges.
At one company, someone accidentally set the write rate high to transfer data into the db. This had the effect of permanently increasing the shard count to a huge number, basically making the DB useless.
I think this is a good summary, and it even gets more complicated if you start using the DAX cache. Your read/write provisioning for DAX is totally different than the underlying dynamodb tables. The write throughput for Dax is limited by the size of the master node in the cluster. Can you say bottleneck?
Take a look at Firestore / Google Cloud Datastore. It's pretty much exactly what you describe - fire and forget. There's no concept of "node" (at least not from the outside).
Thinking like this baffles me, but it also makes me happy, because there will always be a need for people like me: infra. AWS is not a magical tool that will replace your infra team; it is a magical tool that will allow your infra team to do more. I am the infra team of my startup, and I estimate that only 50% of my time is spent doing infra work. The rest is supporting my peers, working on frameworky stuff, solving dev efficiency issues, bla bla.
Lets say that you operate in an AWS-less environment, with everything bare metal, in a datacenter. Your GOOD infra team has to do the following:
Hardware:
- make sure there is a channel to get new hardware, both for capacity increase and spares. What are you going to do? Buy 1 server and 2 spares? If one of the servers has an issue, isn't it quite likely that the other servers, from the same batch, to have the same issue? Is this affecting you, or not? Where do you store the spares? In a warehouse somewhere, making it harder to deploy? In the rack with the one in use, wasting rackspace/switch space? Are you going to rely on the datacenter to provide you with the hardware? What if you are one of their smaller customers and your requests get pushed back because some larger customer requests get higher priority?
- make sure there is a way to deploy said hardware. You don't want to not be able to deploy a new server because there is no space in the rack, or no space in the switch. Where are your spares? In a warehouse miles away from the datacenter? Do you have access to said warehouse at midnight, on Thanksgiving? Oh shit, someone lost the key to your rack! Oh noes, we don't have any spare network cable/connectors/screws...
Software:
- did you patch your servers? did you patch your switches?
- new server, we need to install the os. And a base set of software, including the agent we use to remote manage the server.
- oh, we also need to run and maintain the management infra, say the control plane for k8.
- oh, we want some read replicas for this db, not only we need the hardware to run the replicas on (and see above for what that means), now you need to add a bunch of monitoring and have plans in place to handle things like: replicas lagging, network links between master and replicas being full, failover for the above, master crapping out yada yada.
I bet there are many other aspects I'm missing.
Choices:
Your GOOD infra team will have to decide things like: how many spares do we need, is the capacity we have atm enough for the launch of our next world-changing feature that half the internet wants to use? Are we lucky enough to survive a few months without spares or should we get extra capacity in another datacenter? Do we want to have replicas on the west coast or is the latency acceptable?
These are the main areas of what an infra team is supposed to do: Hardware, Software and Choices. AWS (and most other cloud providers) is making the first 2 points non-issues. For the last area you can do 2 things: get an infra team (could be a full-fledged team, could be 1 person, you could do it) and theoretically you will get choices tailored to what your business needs, OR let AWS do it for you. *AWS might make these choices based on a metric you disagree with, and this is the main reason people complain*.