This is a tidy and thoughtful database architecture. The capabilities and design are broadly within the mainstream spectrum. At this point in database evolution, it is well established that a sufficiently modern storage architecture on modern hardware eliminates most of the performance advantages of in-memory architectures. However, many details of the design in the papers indicate that this database will not be breaking any records for absolute performance on a given quantum of hardware.
The most interesting bit is the use of variable size buffers (VSBs). The value of using VSBs is well known -- it improves cache and storage bandwidth efficiency -- but there are also reasons it is rarely seen in real-world architectures, and those issues are not really addressed here, as far as I could find. Database companies have been researching this concept for decades. If one is unwilling to sacrifice absolute performance, and most database companies are not, the use of VSBs creates myriad devilish details and edge cases.
There are techniques that achieve high cache and storage bandwidth efficiency without VSBs (or their issues) but they are mostly incompatible with B+Tree style architectures like the above.
I think that with modern hardware -- for instance the first byte-addressable NVM now available -- variable-sized pages and buffers should in theory see more widespread use, and read/write granularity should become finer-grained over the next few years. As of now, though, I believe Intel Optane memory still has to fetch 256 bytes at a minimum.
That said, variable-sized pages also enable page compression.
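To make that point concrete, here is a minimal sketch (the size classes and function names are hypothetical, not from Umbra or any real engine): with variable-sized pages, a compressed page can be stored in a buffer close to its compressed length, whereas with fixed-size pages the compressed result would still occupy a full page on disk.

```python
import zlib

PAGE_SIZE = 4096
SIZE_CLASSES = [512, 1024, 2048, 4096]  # hypothetical variable page sizes

def compress_page(raw: bytes) -> tuple[bytes, int]:
    """Compress a page and pick the smallest size class that fits.
    With a single fixed page size, the compressed bytes would still
    occupy PAGE_SIZE; variable sizes let the savings reach storage."""
    packed = zlib.compress(raw)
    for size in SIZE_CLASSES:
        if len(packed) <= size:
            return packed, size
    return raw, PAGE_SIZE  # incompressible: store uncompressed

raw = b"ABCD" * 1024                 # a highly compressible 4 KiB page
packed, size = compress_page(raw)
assert size < PAGE_SIZE              # page shrank to a smaller size class
assert zlib.decompress(packed) == raw
```

A real engine would also have to handle pages that grow back past their size class after an update, which is exactly where the reference-consistency issues discussed elsewhere in this thread come in.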
Can you give us some links to the mentioned issues, and to the techniques that achieve high cache and storage bandwidth efficiency without VSBs?
I can explain it; the methods are straightforward. As with most things in database engine design, much of what is done in industry isn't in the literature.
The alternative to VSBs is for each logical index node to comprise a dynamic set of independent fixed buffers, with each buffer having an independent I/O schedule. This enables excellent cache efficiency because 1) space is incrementally allocated and 2) the cache only contains the parts of a logical node that you actually use. References to the underlying buffers remain valid even if the index node is resized. Designs vary, but 8 to 64 buffers per index node seems to be the anecdotal range. The obvious caveat is that storage structures that presume an index node is completely in buffer, such as ordered trees, don't work well. Since some newer database designs have no ordered trees at all under the hood, this is not necessarily a problem. There are fast access methods that work well in this model.
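A minimal sketch of the idea described above (all names are illustrative, not from any real engine): a logical index node that grows by allocating fixed-size buffers on demand, so a reference to a buffer stays valid even as the node is resized.

```python
# Sketch: a logical index node backed by a dynamic set of fixed-size
# buffers. Space is allocated incrementally, and a (buffer, offset)
# reference remains stable when the node grows, because growth only
# appends new buffers and never moves existing ones.

BUFFER_SIZE = 4096  # fixed physical buffer size

class IndexNode:
    def __init__(self):
        self.buffers = []        # dynamic set of fixed buffers
        self.used_in_last = 0    # bytes used in the last buffer

    def append(self, record: bytes):
        """Append a record, allocating a new fixed buffer if needed.
        Returns a stable (buffer, offset) reference."""
        if not self.buffers or self.used_in_last + len(record) > BUFFER_SIZE:
            self.buffers.append(bytearray(BUFFER_SIZE))
            self.used_in_last = 0
        buf = self.buffers[-1]
        off = self.used_in_last
        buf[off:off + len(record)] = record
        self.used_in_last += len(record)
        return (buf, off)

node = IndexNode()
ref = node.append(b"key1=value1")
for _ in range(2000):            # force the node to grow many times
    node.append(b"x" * 64)
buf, off = ref
assert bytes(buf[off:off + 11]) == b"key1=value1"  # reference still valid
assert len(node.buffers) > 1     # node now spans many fixed buffers
```

Note how nothing in this model requires the whole logical node to be resident at once: a cache could hold only the buffers actually touched, which is the source of the cache-efficiency claim.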
The main issue with VSBs is that it is difficult to keep multiple references to a page consistent, some of which may not even be in memory, since critical metadata is typically stored in the reference itself. A workaround is to allow only a single reference per page, but this restriction has an adverse impact on some important types of architectural optimization. The abstract objective makes sense, but no one who has looked into it has come up with a VSB scheme that avoids these tradeoffs for typical design cases. That said, VSBs are sometimes used in specialized databases where storage utilization efficiency (but not necessarily cache efficiency or performance) is paramount, though designed a bit differently than in Umbra.
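A toy illustration of that consistency problem (all names hypothetical): if each reference caches critical metadata such as the page's size class, then a second reference to the same page silently goes stale the moment the page is reallocated into a different class.

```python
# Toy illustration of the VSB consistency problem: references embed
# metadata about the page (here, its size class) so the page can be
# accessed without consulting a central table. With multiple
# references per page, resizing the page invalidates the metadata
# cached in every other reference.

class Page:
    def __init__(self, size_class: int):
        self.size_class = size_class
        self.data = bytearray(4096 << size_class)

class Ref:
    """A reference that embeds critical metadata (the size class)."""
    def __init__(self, page: Page):
        self.page = page
        self.size_class = page.size_class  # cached at creation time

def grow(page: Page):
    """Reallocate the page into the next larger size class."""
    page.size_class += 1
    page.data = bytearray(4096 << page.size_class)

page = Page(size_class=0)
ref_a = Ref(page)
ref_b = Ref(page)        # a second reference to the same page

grow(page)               # page resized through one code path

assert ref_b.size_class != page.size_class  # ref_b's metadata is stale
```

The single-reference workaround mentioned above avoids the staleness by construction, at the cost of the optimizations that depend on sharing references -- and if `ref_b` lived on disk rather than in memory, even finding it to repair would be expensive.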
The reason to use larger page sizes, in addition to being more computationally efficient, is that they give better performance with cheaper solid-state storage -- storage costs matter a lot. The sweet spot for price-performance is inexpensive read-optimized flash, which, if your storage engine is optimized for it, works far better for mixed workloads than you might expect. Excellent database kernels won't see much boost from byte-addressable NVM, and people running poor database architectures don't care enough about performance to pay for expensive storage hardware, so it is a bit of a No Man's Land.