This is a tidy and thoughtful database architecture. The capabilities and design are broadly within the mainstream spectrum. At this point in database evolution, it is well established that a sufficiently modern storage architecture on modern hardware eliminates most of the performance advantages of in-memory architectures. However, many details of the design in the papers indicate that this database will not be breaking any records for absolute performance on a given quantum of hardware.
The most interesting bit is the use of variable size buffers (VSBs). The value of using VSBs is well known -- it improves cache and storage bandwidth efficiency -- but there are also reasons it is rarely seen in real-world architectures, and those issues are not really addressed here, as far as I could find. Database companies have been researching this concept for decades. If one is unwilling to sacrifice absolute performance, and most database companies are not, the use of VSBs creates myriad devilish details and edge cases.
There are techniques that achieve high cache and storage bandwidth efficiency without VSBs (or their issues) but they are mostly incompatible with B+Tree style architectures like the above.
I think that with modern hardware -- for instance the first byte-addressable NVM now available -- variable-sized pages and buffers should in theory see more widespread use, and read/write granularity should become finer-grained over the next few years. As of now, though, I believe Intel Optane memory still has to fetch 256 bytes at a minimum.
That said, variable-sized pages also enable page compression.
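To make that point concrete, here is a minimal sketch (the size classes and function names are hypothetical, not from Umbra or any real engine): with variable-sized pages, a compressed page can be stored in a buffer close to its compressed length, whereas with fixed-size pages the compressed result would still occupy a full page on disk.

```python
import zlib

PAGE_SIZE = 4096
SIZE_CLASSES = [512, 1024, 2048, 4096]  # hypothetical variable page sizes

def compress_page(raw: bytes) -> tuple[bytes, int]:
    """Compress a page and pick the smallest size class that fits.
    With a single fixed page size, the compressed bytes would still
    occupy PAGE_SIZE; variable sizes let the savings reach storage."""
    packed = zlib.compress(raw)
    for size in SIZE_CLASSES:
        if len(packed) <= size:
            return packed, size
    return raw, PAGE_SIZE  # incompressible: store uncompressed

raw = b"ABCD" * 1024                 # a highly compressible 4 KiB page
packed, size = compress_page(raw)
assert size < PAGE_SIZE              # page shrank to a smaller size class
assert zlib.decompress(packed) == raw
```

A real engine would also have to handle pages that grow back past their size class after an update, which is exactly where the reference-consistency issues discussed elsewhere in this thread come in.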
Can you give us some links to the mentioned issues, and to the techniques that achieve high cache and storage bandwidth efficiency without VSBs?
I can explain it; the methods are straightforward. As with most things in database engine design, much of what is done in industry isn't in the literature.
The alternative to VSBs is for each logical index node to comprise a dynamic set of independent fixed buffers, with each buffer having an independent I/O schedule. This enables excellent cache efficiency because 1) space is incrementally allocated and 2) the cache only contains the parts of a logical node that you actually use. References to the underlying buffers remain valid even if the index node is resized. Designs vary, but 8 to 64 buffers per index node seems to be the anecdotal range. The obvious caveat is that storage structures that presume an index node is completely in buffer, such as ordered trees, don't work well. Since some newer database designs have no ordered trees at all under the hood, this is not necessarily a problem. There are fast access methods that work well in this model.
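A minimal sketch of the idea described above (all names are illustrative, not from any real engine): a logical index node that grows by allocating fixed-size buffers on demand, so a reference to a buffer stays valid even as the node is resized.

```python
# Sketch: a logical index node backed by a dynamic set of fixed-size
# buffers. Space is allocated incrementally, and a (buffer, offset)
# reference remains stable when the node grows, because growth only
# appends new buffers and never moves existing ones.

BUFFER_SIZE = 4096  # fixed physical buffer size

class IndexNode:
    def __init__(self):
        self.buffers = []        # dynamic set of fixed buffers
        self.used_in_last = 0    # bytes used in the last buffer

    def append(self, record: bytes):
        """Append a record, allocating a new fixed buffer if needed.
        Returns a stable (buffer, offset) reference."""
        if not self.buffers or self.used_in_last + len(record) > BUFFER_SIZE:
            self.buffers.append(bytearray(BUFFER_SIZE))
            self.used_in_last = 0
        buf = self.buffers[-1]
        off = self.used_in_last
        buf[off:off + len(record)] = record
        self.used_in_last += len(record)
        return (buf, off)

node = IndexNode()
ref = node.append(b"key1=value1")
for _ in range(2000):            # force the node to grow many times
    node.append(b"x" * 64)
buf, off = ref
assert bytes(buf[off:off + 11]) == b"key1=value1"  # reference still valid
assert len(node.buffers) > 1     # node now spans many fixed buffers
```

Note how nothing in this model requires the whole logical node to be resident at once: a cache could hold only the buffers actually touched, which is the source of the cache-efficiency claim.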
The main issue with VSBs is that it is difficult to keep multiple references to a page consistent, some of which may not even be in memory, since critical metadata is typically stored in the reference itself. A workaround is to allow only a single reference per page, but this restriction has an adverse impact on some important types of architectural optimization. The abstract objective makes sense, but no one who has looked into it has come up with a VSB scheme that avoids these tradeoffs for typical design cases. That said, VSBs are sometimes used in specialized databases where storage utilization efficiency (but not necessarily cache efficiency or performance) is paramount, though designed a bit differently than in Umbra.
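A toy illustration of that consistency problem (all names hypothetical): if each reference caches critical metadata such as the page's size class, then a second reference to the same page silently goes stale the moment the page is reallocated into a different class.

```python
# Toy illustration of the VSB consistency problem: references embed
# metadata about the page (here, its size class) so the page can be
# accessed without consulting a central table. With multiple
# references per page, resizing the page invalidates the metadata
# cached in every other reference.

class Page:
    def __init__(self, size_class: int):
        self.size_class = size_class
        self.data = bytearray(4096 << size_class)

class Ref:
    """A reference that embeds critical metadata (the size class)."""
    def __init__(self, page: Page):
        self.page = page
        self.size_class = page.size_class  # cached at creation time

def grow(page: Page):
    """Reallocate the page into the next larger size class."""
    page.size_class += 1
    page.data = bytearray(4096 << page.size_class)

page = Page(size_class=0)
ref_a = Ref(page)
ref_b = Ref(page)        # a second reference to the same page

grow(page)               # page resized through one code path

assert ref_b.size_class != page.size_class  # ref_b's metadata is stale
```

The single-reference workaround mentioned above avoids the staleness by construction, at the cost of the optimizations that depend on sharing references -- and if `ref_b` lived on disk rather than in memory, even finding it to repair would be expensive.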
The reason to use larger page sizes, in addition to being more computationally efficient, is that they give better performance with cheaper solid-state storage -- storage costs matter a lot. The sweet spot for price-performance is inexpensive read-optimized flash, which, if your storage engine is optimized for it, works far better for mixed workloads than you might expect. Excellent database kernels won't see much boost from byte-addressable NVM, and people running poor database architectures don't care enough about performance to pay for expensive storage hardware, so it is a bit of a No Man's Land.