Sorry for not answering everyone individually, but I see some confusion due to the lack of context about what we do as a company.
First things first, Nhost falls into the category of backend-as-a-service. We provision and operate infrastructure at scale, and we also provide and run the necessary services for features such as user authentication and file storage, for users creating applications and businesses. A project/backend comprises a Postgres database and the aforementioned services; none of it is shared. You get your own GraphQL engine, your own auth service, etc. We also provide the means to interface with the backend through our official SDKs.
Some points I see mentioned below that are worth exploring:
- One RDS instance per tenant is prohibitive from a cost perspective, obviously. RDS is expensive and we have a very generous free tier.
- We run the infrastructure for thousands of projects/backends which we have absolutely no control over what they are used for. Users might be building a simple job board, or the next Facebook (please don't). This means we have no idea what the workloads and access patterns will look like.
- RDS is mature and a great product, AWS is a billion-dollar company, etc - that is all true. But it is also true that we cannot control whether a user's project is missing an index, and RDS does not provide any means to limit CPU/memory usage per database/tenant.
- We had a couple of discussions with folks at AWS and, for the reasons already mentioned, there was no obvious solution to our problem. Let me reiterate: the folks who own the service didn't have a solution to our problem given our constraints.
- Yes, this is a DIY scenario, but this is part of our core business.
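To make the resource-limiting point concrete: this is not Nhost's actual setup, just an illustrative sketch of the kind of per-tenant cap that RDS lacks but that running each tenant's Postgres in its own container gives you. The container name, image tag, password, and limit values are all assumptions:

```shell
# Hypothetical example: cap one tenant's Postgres at 1 CPU and 2 GiB RAM.
# A tenant's runaway query (e.g. from a missing index) then exhausts only
# its own quota instead of starving every tenant on the instance.
docker run -d \
  --name tenant-42-postgres \
  --cpus="1.0" \
  --memory="2g" \
  --memory-swap="2g" \
  -e POSTGRES_PASSWORD=changeme \
  postgres:15
```

On Kubernetes the same idea is expressed as `resources.requests`/`resources.limits` on the tenant's pod spec, enforced by cgroups on the node.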
I hope this clarifies some of the doubts. I expect to publish a more detailed and technical blog post about our experience soon.
By the way, we are hiring. If you think what we're doing is interesting and you have experience operating Postgres at scale, please write me an email at nuno@nhost.io. And don't forget to star us at https://github.com/nhost/nhost.
Indeed RDS was never designed to be "re-sold", and assuming that a single PG instance will handle lots of different users is naive. Turns out if you're aiming to be an infra provider, building your own infra is the way to go. Who would have thought?
If I were launching a BaaS I wouldn't touch AWS. Grab a few Hetzner bare metal servers and set up your infra. You're leaving a massive profit margin to AWS when you don't have to.
Also would like to know this. This post is a bit light on content. It sounds like they just moved to K8s from RDS. In my experience, Postgres works decently but there are sharp edges running it containerized (OOMs in subprocesses might not be caught by the container runtime, and shared memory in Docker is pitifully low at 64 MB by default).
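For anyone who hasn't hit the shared-memory edge mentioned above: Docker mounts `/dev/shm` at 64 MB by default, which is far too small for Postgres's dynamic shared memory (parallel query workers, etc.). A sketch of how you'd check and fix it (the `postgres:15` tag and sizes are illustrative assumptions):

```shell
# Inspect the container's /dev/shm; with no flags, Docker's default
# tmpfs mount there is 64 MB.
docker run --rm postgres:15 df -h /dev/shm

# Raise it explicitly when running Postgres in a container:
docker run -d --shm-size=256m -e POSTGRES_PASSWORD=changeme postgres:15
```

On Kubernetes the equivalent workaround is an `emptyDir` volume with `medium: Memory` mounted at `/dev/shm`.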
From other comments, it looks like they rolled their own solution. Perhaps they had unique requirements, but it seems short-sighted to forego the automation an operator brings.
And what are your cost savings compared to RDS? I had a similar problem where we had to provision five databases for five different teams. RDS is really expensive. Is your solution open source? I would like to try it.
I hope to have a more detailed analysis to share when we have more accurate data. We launched individual instances recently and although I don't have exact numbers, the price difference will be significant. Just imagine how much it would cost to have 1 RDS instance per tenant (we have thousands).
We haven't open-sourced any of this work yet but we hope to do it soon. Join us on discord if you want to follow along (https://nhost.io/discord).
I'm guessing that they're betting that they can put X idle customers on one machine, and so pay X/machine cost for their free tier.
A while ago, I worked for a company that offered a hosted version of their application that required Postgres, etcd, Kubernetes, etc. It was set up so that every customer got their own GCP project, containing a K8s cluster, Cloud Storage, and a Postgres instance. The k8s cluster ("workspace") then contained dedicated nodes (4vCPU x 16G RAM at a minimum, autoscaling up according to their workload including GPU compute), SSDs, a public-facing LoadBalancer, etc. This is good for per-customer isolation, but quite costly at idle, on the order of several hundred dollars a month. Users expect this kind of isolation (and need the SOC2 and similar checkmarks for sure), but they don't expect to be charged when they're not running anything, which was a problem for us.
If I were doing this again, I would do it differently, at least for the MVP. One option is to make the application multi-tenant aware and isolate at the application level instead of at the GCP project level. This might be more difficult to get certified and might not meet everyone's HIPAA-like compliance goals, but it is a good starting point, especially for free trials.
The other option that was very appealing to me is to give each user a VM that just gets de-scheduled when no requests are being made. Instead of k8s managing nodes, nodes would manage k8s. The downside there is that cluster size is limited to whatever the largest node you can buy is, but honestly, 448vCPUs is a ton (AWS's max instance size at the moment), so it's a very workable solution.

When users sign up, create a VM image that runs K8s, Minio, Postgres, etc. and route traffic to it with a shared L7 router/front proxy. If their workloads autoscale up, freeze and migrate the VM to a machine with more resources. If they're not using it for a while, freeze it completely, and reprogram your front proxy to point at a program that waits for an RPC / web request and starts up the VM when one comes in.

Now your idle cost is the cost of your block storage, modulo deduplication, instead of dedicated CPU cores and RAM. You also get a lot of knobs to control your actual compute cost; you aren't reliant on your users provisioning spot instances from their cloud provider, you can just tell cron jobs to run when CPU load is lowest, or set your own rate to incentivize off-peak usage. And, you can pretty much get away with charging nothing for idle instances, limit free trials in aggregate to X CPU cores, etc. I think it would have been good, though complex.
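A minimal sketch of the "wake the VM on first request" idea above, assuming libvirt-managed VMs, with `nc` and `socat` doing the forwarding. Every name, address, and port here is a hypothetical stand-in; a real front proxy would do this in-process rather than via a shell shim:

```shell
#!/bin/sh
# Hypothetical wake-on-request shim: the front proxy hands idle tenants'
# connections to this script (e.g. via socat's EXEC handler or systemd
# socket activation). It resumes the tenant's stopped VM, waits for the
# service port to answer, then splices the connection through.
TENANT_VM="tenant-42"           # assumed libvirt domain name
TENANT_HOST="10.0.42.10"        # assumed VM service address
TENANT_PORT="5432"

# Start the VM if it is not already running (virsh is libvirt's CLI;
# a domain with a managed-save image resumes from it on start).
virsh domstate "$TENANT_VM" | grep -q running || virsh start "$TENANT_VM"

# Wait until the service answers, then forward stdin/stdout to it.
until nc -z "$TENANT_HOST" "$TENANT_PORT"; do sleep 0.5; done
exec socat - "TCP:${TENANT_HOST}:${TENANT_PORT}"
```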
TL;DR: RDS is a highly-available always-on service. But customers might not want HA or always-on. By being able to turn off the database at the right moment, you can save a lot of money on compute, which makes things like good free trials more economically viable. I think OP is on the right track to a successful k8s-based business and wish them great luck!