I've recently been spending a fair amount of time trying to improve query performance on RDS. This includes reviewing and optimizing particularly nasty queries and tuning the PG configuration (min_wal_size, random_page_cost, work_mem, etc.). I'm using a db.t3.xlarge with general purpose SSD (gp2) storage for a web server that sees moderate writes and a lot of reads. I know there's no real way to know other than through testing, but I'm not clear on which instance type best serves our needs; it may very well be that the t3 family isn't fit for our purposes. I'm also unclear on whether we ought to switch to provisioned IOPS SSD. Does anyone have any general pointers here? I know the question is pretty open-ended, but it would be great to hear any general advice from personal experience.
I'd recommend hopping off of t3 ASAP if you're searching for performance gains - t3 performance can be extremely variable by design (burstable CPU credits). M class will even you out.
General purpose (gp2) storage IOPS is governed by your provisioned storage size. You can, again, get much more consistent performance by using provisioned IOPS.
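For a rough sense of how that scaling works, here's a back-of-envelope sketch of gp2's published model (3 IOPS per provisioned GiB, a 100 IOPS floor, a 16,000 IOPS ceiling, and burst to 3,000 IOPS for smaller volumes):

```python
# Rough model of gp2 IOPS scaling: baseline is 3 IOPS per provisioned GiB,
# with a 100 IOPS floor and a 16,000 IOPS ceiling. Volumes whose baseline
# is below 3,000 IOPS can burst to 3,000 using a credit bucket.

def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return max(100, min(16_000, 3 * size_gib))

def gp2_peak_iops(size_gib: int) -> int:
    """Peak IOPS available, counting burst for smaller volumes."""
    return max(3_000, gp2_baseline_iops(size_gib))

for size in (100, 334, 1_000, 5_500):
    print(size, gp2_baseline_iops(size), gp2_peak_iops(size))
```

So a volume around 1 TiB already sustains 3,000 IOPS continuously without relying on burst, which is one reason "just provision more storage" is a common alternative to io1.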
Feel free to email me if you want to chat through things specific to your env - email is in my about.
I would advise trying to fit the working set into memory before spending on provisioned IOPS. Constantly reading a lot of data from network storage should be avoided as much as possible; having more IOPS doesn't necessarily improve read latency.
Provisioned IOPS is much more expensive, though, so make sure you really need it. If you use general purpose storage you can monitor your burst balance. You can always start with general purpose and then move to provisioned if and when you need it, too.
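To build intuition for how long that burst balance actually lasts: the gp2 credit bucket holds about 5.4 million I/O credits, refills at the volume's baseline rate, and drains at your actual I/O rate. A quick sketch:

```python
# Sketch of gp2 burst duration: the credit bucket holds ~5.4M I/O credits,
# refills at the volume's baseline rate, and drains at the actual I/O rate.

BUCKET_CREDITS = 5_400_000

def burst_seconds(baseline_iops: float, actual_iops: float) -> float:
    """Seconds a full bucket can sustain actual_iops above baseline.
    Returns infinity when the workload is at or below baseline."""
    if actual_iops <= baseline_iops:
        return float("inf")
    return BUCKET_CREDITS / (actual_iops - baseline_iops)

# e.g. a 100 GiB volume (300 IOPS baseline) bursting at the 3,000 IOPS cap:
print(burst_seconds(300, 3_000) / 60)  # minutes of sustained burst
```

In that example a full bucket lasts a bit over half an hour at the cap, which is why a sustained heavy workload on a small gp2 volume can look fine and then fall off a cliff.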
Thanks. If I'm reading this https://ibb.co/bNGmrCB correctly, it seems like we have plenty of burst balance. Does this seem to indicate that provisioned IOPS is unlikely to help us here?
Yeah, you're looking good, though I would recommend adding a CloudWatch alarm to make sure it doesn't sneak up on you. IIRC provisioned IOPS can also help with other performance attributes like throughput, so I would look at the differences documented by AWS and then take a look at all of the relevant metrics to be certain.
It's hard to say without metrics; what does your CPU load look like? In general, unless your CPU is often maxing out, changing the CPU is unlikely to help, so you're left with either memory or IO.
Unused memory on Linux will be automatically used to cache IO operations, and you can also tweak PG itself to use more memory during queries (search for "work_mem", though there are others).
If your workload is read-heavy, just giving it more memory so that the majority of your dataset is always in the kernel IO cache will give you an immediate performance boost, without even having to tweak PG's config (though that might help even further). This won't transfer to writes - those still require an actual, uncached IO operation to complete (unless you want to put your data at risk, in which case there are parameters that can be used to override that).
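To make the knobs above concrete, here's an illustrative postgresql.conf fragment for a hypothetical 32 GB instance. These values are examples only, not recommendations - the right numbers depend on your workload (on RDS you'd set them via a parameter group):

```
# Illustrative postgresql.conf values for a 32 GB instance - hypothetical,
# tune against your own workload:
shared_buffers = 8GB             # PG's own cache; ~25% of RAM is a common starting point
effective_cache_size = 24GB      # planner hint: total RAM likely available for caching,
                                 # including the kernel page cache described above
work_mem = 32MB                  # per sort/hash operation; multiplied by concurrent
                                 # operations, so keep it modest on busy servers
```

Note that effective_cache_size doesn't allocate anything; it just tells the planner how much caching to expect, which is why the kernel cache counts toward it.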
For write-heavy workloads, you will need to upgrade IO; there's no way around the "provisioned IOPS" disks.
You've got some spikes that could signify some large or unoptimized queries, but otherwise yes, the CPU doesn't look that hot.
I suggest upgrading to an instance type that gives you 32 GB or more of memory. You'll get a bigger CPU along with it as well, but don't make the CPU your priority; it's not your main bottleneck at the moment.
Makes sense, thank you. Sounds like M class is the way to go, as the other commenter suggested. Also, yes, there are many awful queries that I'm aware of and am working to correct.
Thank you. I think this makes a lot more sense, actually. I can go to a db.r6g.xlarge and double the memory to 32 GB from the t3.xlarge I'm currently on for an additional ~$117/month, vs. getting to 32 GB with the db.m5.2xlarge for $308 more per month. Also, it looks like X2g is the lowest price per GiB of RAM among the memory-optimized classes (for MySQL, MariaDB, and PostgreSQL). Thoughts on the X2g? The db.x2g.large, for example, doubles memory AND network performance (Gbps) vs. the current DB for less than $30 more per month. It does drop vCPUs to 2 from 4, which might not matter given where CPU utilization seems to spike to (~50% at peak): https://ibb.co/WsxN6D3
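Using the monthly deltas quoted above (and assuming the current 16 GB t3.xlarge as the baseline, with each option at 32 GB), the marginal cost per extra GiB of RAM works out roughly as:

```python
# Back-of-envelope $/extra-GiB using the monthly deltas quoted above.
# Assumes the current instance is a 16 GiB t3.xlarge and each option is 32 GiB.
extra_cost_per_month = {
    "db.r6g.xlarge": 117,   # ~$117/month more
    "db.m5.2xlarge": 308,   # $308/month more
    "db.x2g.large": 30,     # "less than $30 more per month"
}
extra_gib = 32 - 16
for name, delta in extra_cost_per_month.items():
    print(f"{name}: ~${delta / extra_gib:.2f} per extra GiB of RAM per month")
```

That puts the r6g at roughly a third of the m5's cost per added GiB, and the x2g well below both, which matches the pricing pattern described.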
General purpose storage IOPS scales with disk size, roughly and up to a point. It's often cheaper and faster to increase the instance storage than move to EBS, provisioned or not.
Of course, if you need to recover quickly in a disaster you'll want a hot standby or replica. That still may be cheaper than PIOPS (especially if you need HA anyway).
> It's often cheaper and faster to increase the instance storage than move to EBS, provisioned or not.
You're saying it may well be cheaper to increase storage in order to get more IOPS than to move to an EBS-optimized instance type?
Regarding HA: not relevant for us at this point (assuming I understood you correctly). We've only got a single primary and one replica, the latter used primarily for analytics.