
Is it really that easy? What are the edge cases?


It's not. We tried. Plus, it doesn't work on RDS, where most production databases are. I think Citus was a great first step in the right direction, but it's time to scale the 99% of databases that don't already run on Azure Citus.


That's because Amazon wants to do things its own way... Apparently you can get it to work by running your own masters (with the Citus extension) on EC2, backed by Postgres RDS instances as workers:

https://www.citusdata.com/blog/2015/07/15/scaling-postgres-r... (note that this is an old blog post -- pg_shard has been succeeded by Citus, but the architecture diagram still applies)

I say "apparently" because I have no experience dealing with large databases on AWS.

Personally, I've had no issues with Citus either, both on bare metal/VMs and as SaaS on Azure...


Depends on your schema, really. The hard part is choosing a distribution key to use for sharding: if you've got something like a tenant ID that's in most of your queries and big tables, it's pretty easy, but it can be a pain otherwise.
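
To make the distribution key point concrete, here's a rough Python sketch. It is not how Citus actually hashes or places rows; the shard count, the `orders` rows, and the helper names are all made up for illustration. It only shows why a key that appears in most queries keeps a lookup on one shard, while a poorly chosen key makes the same lookup fan out:

```python
import zlib

SHARD_COUNT = 4  # hypothetical; real deployments pick their own shard count


def shard_for(distribution_value, shard_count=SHARD_COUNT):
    """Map a distribution key value to a shard index with a stable hash."""
    # crc32 is used only because it is deterministic across runs;
    # it is NOT the hash function Citus uses.
    return zlib.crc32(str(distribution_value).encode("utf-8")) % shard_count


def shards_touched(rows, key):
    """Return the set of shards holding `rows` when the table is sharded on `key`."""
    return {shard_for(row[key]) for row in rows}


# Hypothetical rows from an `orders` table, all belonging to one tenant.
orders_for_tenant_42 = [
    {"tenant_id": 42, "order_id": order_id} for order_id in range(1000, 1020)
]

# Sharded on tenant_id: every row for tenant 42 hashes to the same shard,
# so a query filtered on tenant_id = 42 can be routed to a single node.
by_tenant = shards_touched(orders_for_tenant_42, "tenant_id")

# Sharded on order_id: the same tenant's rows are spread across shards,
# so the same query has to fan out to several nodes.
by_order = shards_touched(orders_for_tenant_42, "order_id")

print(len(by_tenant), len(by_order))
```

The first number is always 1; the second is larger, which is the "pain otherwise" case where no single column lines up with how the data is queried.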


Same pain as with good old (native) partitioning, right? :)

As with partitioning, in my experience something like a common key (identifying data sets), a tenant ID, and/or a partial date (yyyy-mm) works pretty well.
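
For the partial-date case, a tiny Python sketch of the idea (the event rows and the `month_partition_key` helper are hypothetical, and this only imitates the routing, not PostgreSQL's own partitioning machinery):

```python
from collections import defaultdict
from datetime import date


def month_partition_key(day):
    """Derive a yyyy-mm partition key from a date."""
    return day.strftime("%Y-%m")


# Hypothetical event rows; only the date matters for routing.
events = [
    {"id": 1, "created": date(2024, 1, 15)},
    {"id": 2, "created": date(2024, 1, 31)},
    {"id": 3, "created": date(2024, 2, 1)},
]

# Group row ids by the partition they would land in.
partitions = defaultdict(list)
for event in events:
    partitions[month_partition_key(event["created"])].append(event["id"])

print(dict(partitions))  # {'2024-01': [1, 2], '2024-02': [3]}
```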


For a multi-tenant use case, yeah, pretty close to thinking about partitioning.

For other use cases, there can be big gains from cross-shard queries that you can't really match with partitioning, but that's very use-case dependent and not a guaranteed result.
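
Roughly what a cross-shard query buys you, as a toy Python sketch: each worker computes a partial aggregate over its own rows, and a coordinator merges the partials. The in-memory `SHARDS` data and function names are invented for illustration, and this is not how Citus plans or executes queries. Threads stand in for separate worker machines, so don't expect a real speedup from this snippet itself:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards"; each list stands in for one worker's rows.
SHARDS = [
    [{"tenant_id": 1, "amount": 10}, {"tenant_id": 5, "amount": 30}],
    [{"tenant_id": 2, "amount": 20}],
    [{"tenant_id": 3, "amount": 5}, {"tenant_id": 7, "amount": 15}],
]


def partial_aggregate(shard_rows):
    """Work done on one worker: a partial (count, sum) over its own rows."""
    amounts = [row["amount"] for row in shard_rows]
    return len(amounts), sum(amounts)


def cross_shard_average(shards):
    """Coordinator side: fan out to every shard, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count if total_count else None


average_amount = cross_shard_average(SHARDS)
print(average_amount)  # 16.0
```

With single-node partitioning, all of that scanning still happens on one machine. Here the scan is split across workers, which is where the gain comes from when the query shape allows it.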



