#Day8 - Storage Engines (Fractal Tree)

A series of posts to understand databases better

Feb 16, 2023

Fractal Trees are a type of index structure used in databases to store and manage data on disk. It gained attention when it was used as storage algorithms in databases such as RocksDB and TokuDB (acquired by Percona). These databases use Fractal Trees for high-performance, scalable indexing and storage of large amounts of data. The Fractal Tree algorithm is designed to provide fast and efficient indexing, improved compression, and support for range queries and updates.

Understanding the problem

Let's understand three concepts to understand this topic better. Write, Space and Read Amplification.

Read amplification is the number of I/O’s required to read a particular query.

Space amplification refers to the increase in the amount of disk space required to store data in a database, compared to the original size of the data. It can occur as a result of various factors, such as the fragmentation, overhead required for data structures, duplicated data, or storage algorithms that result in increased storage usage. For example, B trees typically achieve only 75% space utilization due to fragmentation inside the B-tree blocks. Thus B trees suffer a space amplification of 4/3.

Write amplification is the amount of data written to storage compared to the amount of data that the application wrote. LSM trees and Fractal-Tree indexes both provide significant advantages over traditional B-trees. For example, if a database row contains 100 bytes, and a B tree such as InnoDB employs 16KiB pages, then the B tree may perform 16KiB of I/O to write a single 100-byte row, for a write amplification of 160, compared to a write amplification of 30 to 70 for the other data structures.

Write amplification is a problem on both rotating disks and solid-state disk (SSD), but for different reasons. For SSD, write amplification is a problem because flash-based SSD’s can be written to only a finite number of times. A B-tree performing random updates can easily use up a flash. Lower write amplification can help to reduce cost because you can buy cheaper flash (with fewer write cycles), and you may be able to provision a higher fraction of your flash storage to store useful data since the write blocks are larger. Larger write blocks help SSD.

To summarise the problem,

B-trees are good at lookup, but bad at insert.
LSM Trees are good at insert, but bad at lookup.
Is there a data structure that is about as good as a B-Tree for lookup, but has insertion performance closer to LSM Trees or append-only files?

The answer is, Fractal Trees.

No alt text provided for this image — http://www-db.deis.unibo.it/courses/TBD/Lezioni/mysqluc-2010-fractal-trees.pdf

How does it work?

Fractal Trees combines the advantages of B-trees and LSM-trees. The main idea behind Fractal Trees is to maintain an optimal balance between fast lookups (as in B-trees) and fast write performance (as in LSM-trees).

In a fractal tree index, a buffer is used to temporarily store updates and deletions before they are eventually merged into the main tree structure. The buffer helps improve write performance by reducing the number of disk I/O operations required to modify the index. The contents of the buffer are periodically merged with the main tree in a process known as "flushing". By using a buffer, fractal tree indexes can effectively balance the trade-off between write performance and index size, making them well-suited for use in high-performance database systems.

It is very similar to the B tree, except it has extra buffers, which happen to be empty. When a data record is inserted into the tree, instead of traversing the entire tree the way a B tree would, we simply insert the data record into the buffer at the root of the tree. Eventually the root buffer will fill up with new data records. At that point the Fractal Tree index copies the inserted records down a level of the tree. Eventually the newly inserted records will reach the leaves, at which point they are simply stored in a leaf node as a B-tree would store them. The data records descending through the buffers of the tree can be thought of as messages that say “insert this record”. Fractal Tree indexes can use other kinds of messages, such as messages that delete a record, or messages that update a record. (You can refer to the original papers in the reference section to dive in more details)

What are the disadvantages of Fractal Trees?

Fractal trees, like any other data structure, have certain limitations and drawbacks. Some of the disadvantages of fractal trees include:

Complexity: Fractal trees are a complex data structure and may require more effort to implement and maintain compared to other data structures.
Performance trade-offs: Fractal trees may offer improved performance in certain use cases, but may not be the best choice for all types of workloads. For example, they may not be well suited for large sequential scans or write-intensive workloads.
Fractal index trees do not achieve high write throughput by taking advantage of large sequential writes, and do not employ in-memory indexes.
Limited adoption: Fractal trees are a relatively new data structure, and as such, may not have as much support and community resources available compared to more established structures such as B-trees.

Which databases use Fractal Trees?

Fractal Trees are used as storage algorithms in databases such as RocksDB, TokuDB, and PerconaFT. These databases use Fractal Trees for high-performance, scalable indexing and storage of large amounts of data. The Fractal Tree algorithm is designed to provide fast and efficient indexing, improved compression, and support for range queries and updates.

References

http://www-db.deis.unibo.it/courses/TBD/Lezioni/mysqluc-2010-fractal-trees.pdf
http://www.pandademo.com/wp-content/uploads/2017/12/A-Comparison-of-Fractal-Trees-to-Log-Structured-Merge-LSM-Trees.pdf
https://www.slideshare.net/tmcallaghan/20131112-plukfractaltreestheorytopractice

Database Internals Newsletter

#Day7 - Storage Engines (LSM-Tree)

LSM (Log-Structured Merge) tree is a data storage algorithm used for storing and managing large amounts of data in a log-structured way. It is a variant of the B-tree, which is a type of self-balancing tree data structure. The LSM tree is designed to handle high write loads and high levels of data fragmentation. It uses a log-structured approach for stor…

2 years ago · Farhan

#database #sql #algorithms #systemdesign #arriqaaq #datastructures #immudb #mysql #postgres #mongodb #berkeleydb #tile38

Arriqaaq Newsletter

Discussion about this post