Is it really that easy? What are the edge cases?

levkk · 2025-03-14T18:19:17 1741976357

It's not. We tried. Plus, it doesn't work on RDS, where most production databases are. I think Citus was a great first step in the right direction, but it's time to scale the 99% of databases that don't run on Azure Citus already.

mindcrash · 2025-03-14T18:33:28 1741977208

That's because Amazon wants to do whatever they like themselves... you apparently can get stuff to work by running your own masters (w/ citus extension) in EC2 backed by workers (Postgres RDS) in RDS:

https://www.citusdata.com/blog/2015/07/15/scaling-postgres-r... (note that this is an old blog post -- pg_shard has been succeeded by citus, but the architecture diagram still applies)

I say "apparently" because I have no experience dealing with large databases on AWS.

Personally, I've had no issues with Citus either, both on bare metal/VMs and as SaaS on Azure...

caffeinated_me · 2025-03-14T18:33:03 1741977183

Depends on your schema, really. The hard part is choosing a distribution key to use for sharding: if you've got something like a tenant ID that's in most of your queries and big tables, it's pretty easy, but it can be a pain otherwise.
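
To make the distribution key idea concrete, here's a rough sketch of hash-based routing on a tenant ID. This is not how Citus does it internally; `shard_for`, `route`, and `NUM_SHARDS` are made-up names, and the "shards" are just in-memory lists standing in for worker nodes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(tenant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a tenant ID to a shard index using a stable hash."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % num_shards


def route(rows, num_shards: int = NUM_SHARDS):
    """Group rows by the shard their tenant_id hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(row["tenant_id"], num_shards)].append(row)
    return shards
```

Since every row for a tenant lands on the same shard, a query that filters on tenant ID only needs to touch one node. That's the easy case; without a key like that in most queries, there's nothing obvious to hash on.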

mindcrash · 2025-03-14T18:51:12 1741978272

Same pain as with good old (native) partitioning, right? :)

As with partitioning, in my experience something like a common key (identifying data sets), tenant ID and/or a partial date (yyyy-mm) works pretty well.
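
A tiny sketch of what such a combined key could look like, assuming a tenant ID plus a yyyy-mm bucket. `partition_key` and the naming scheme are hypothetical, not anything Postgres or Citus prescribes.

```python
from datetime import date


def partition_key(tenant_id: str, day: date) -> str:
    """Build a partition label from a tenant ID and the year-month of a date."""
    # The "<tenant>_<yyyy>_<mm>" layout is an assumption made for this example.
    return f"{tenant_id}_{day:%Y_%m}"
```

For example, `partition_key("acme", date(2025, 3, 14))` gives `"acme_2025_03"`, so all of a tenant's rows for one month end up in the same bucket.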

caffeinated_me · 2025-03-14T20:58:31 1741985911

For a multi-tenant use case, yeah, pretty close to thinking about partitioning.

For other use cases, there can be big gains from cross-shard queries that you can't really match with partitioning, but that's very dependent on the use case and not a guaranteed result.
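
Roughly what a cross-shard query looks like, as a scatter-gather sketch: each shard computes a partial aggregate and a coordinator combines them. The shards here are plain lists and `partial_sum` / `cross_shard_average` are made-up names; in a real setup each shard would be a separate database node doing its part of the work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(shard_rows):
    """Per-shard work: return (sum of amounts, row count) for one shard."""
    return sum(r["amount"] for r in shard_rows), len(shard_rows)


def cross_shard_average(shards):
    """Fan the aggregate out to every shard, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, shards))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    # An average can't be merged directly, so we carry sum and count instead.
    return total / count if count else None
```

The gain comes from each node scanning only its own slice at the same time. Whether that beats a single partitioned box depends on how much data has to move back to the coordinator.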