Extent-based allocation is a disk storage technique used by some database management systems (DBMSs) and file systems to store and retrieve data. Its main goal is to reduce disk I/O by allocating disk space in large contiguous runs instead of one small block at a time. An extent is a sequence of contiguous fixed-size blocks (or pages) on disk, and when a file or table is created, it is allocated space in one or more extents.
In this approach, when new data is added and the existing space is full, the database looks for a large run of free space, called an "extent", and reserves all of it. The database divides the extent into smaller blocks and stores the new data in one of these blocks; later inserts fill the remaining blocks. Because related data ends up in neighbouring blocks, the disk head on a spinning disk does not need to move as often, and the system can read the data with fewer, larger requests.
Extents also reduce fragmentation. Because each extent is one contiguous run of blocks, the data for a table or file is spread over a few large pieces instead of many small scattered blocks, so scanning it needs fewer seeks.
Within an extent, databases typically pack multiple rows into each block, and some also compress the block contents. Both reduce the number of blocks needed to store a given amount of data, which in turn reduces disk I/O.
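The allocation step described above can be sketched as follows. This is a minimal illustration, not any particular database's implementation: the names `Disk`, `Segment` and `BLOCKS_PER_EXTENT` are hypothetical, and the free-space map is reduced to a single counter.

```python
from dataclasses import dataclass, field

BLOCKS_PER_EXTENT = 8  # assumed extent length, in blocks


class Disk:
    """Hands out whole extents; a stand-in for a real free-space map."""

    def __init__(self) -> None:
        self.next_free_block = 0

    def reserve_extent(self) -> int:
        start = self.next_free_block
        self.next_free_block += BLOCKS_PER_EXTENT
        return start


@dataclass
class Segment:
    """A table or file that grows one extent at a time."""

    disk: Disk
    extent_starts: list = field(default_factory=list)
    used_in_last: int = BLOCKS_PER_EXTENT  # forces a reservation on first use

    def allocate_block(self) -> int:
        # Reserve a new extent only when the current one is exhausted.
        if self.used_in_last == BLOCKS_PER_EXTENT:
            self.extent_starts.append(self.disk.reserve_extent())
            self.used_in_last = 0
        block = self.extent_starts[-1] + self.used_in_last
        self.used_in_last += 1
        return block


disk = Disk()
table_a, table_b = Segment(disk), Segment(disk)
blocks_a, blocks_b = [], []
for _ in range(10):  # interleaved inserts into two tables
    blocks_a.append(table_a.allocate_block())
    blocks_b.append(table_b.allocate_block())

print(blocks_a)  # [0, 1, 2, 3, 4, 5, 6, 7, 16, 17]
print(blocks_b)  # [8, 9, 10, 11, 12, 13, 14, 15, 24, 25]
```

Even though the two tables grow in lockstep, each one's blocks stay in runs of eight. With block-at-a-time allocation the same workload would interleave the tables block by block.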
Which databases use this approach?
Extent-based allocation is used in databases such as Oracle, IBM Informix and Microsoft SQL Server.
In Oracle, extents are used by the Automatic Storage Management (ASM) feature, which is a volume manager and file system for Oracle Database files. ASM uses extents to manage space allocation for the data files, temp files and control files of the database. Inside the database itself, Oracle also allocates space to segments such as tables and indexes in extents of contiguous data blocks.
In Microsoft SQL Server, the Storage Engine uses extents to manage space allocation for data and index pages. An extent is eight physically contiguous pages of 8 KB each, so one extent covers 64 KB on disk.
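The page and extent sizes above give simple arithmetic for locating a page. The sketch below assumes pages in a data file are numbered from 0 and laid out back to back; the function names are illustrative and not part of any SQL Server API.

```python
PAGE_SIZE = 8 * 1024                        # 8 KB per page
PAGES_PER_EXTENT = 8                        # eight pages per extent
EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT  # 64 KB per extent


def extent_of(page_id: int) -> int:
    """Index of the extent that contains the given page."""
    return page_id // PAGES_PER_EXTENT


def first_page_of(extent_id: int) -> int:
    """Page number of the first page in the given extent."""
    return extent_id * PAGES_PER_EXTENT


def byte_offset(page_id: int) -> int:
    """Byte offset of the page from the start of the file."""
    return page_id * PAGE_SIZE


print(EXTENT_SIZE)                   # 65536
print(extent_of(19))                 # 2
print(first_page_of(extent_of(19)))  # 16
print(byte_offset(19))               # 155648
```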
In an IBM Informix database, an extent consists of a collection of contiguous pages that store data for a given table. When you create a table, the database server allocates a fixed amount of space to contain the data to be stored in that table. When this space fills, the database server must allocate space for additional storage. The physical unit of storage that the database server uses to allocate both the initial and subsequent storage space is called an extent.
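As a rough worked example of initial and subsequent extents, the sketch below counts how many extents a table occupies for a given amount of data. It assumes one fixed first-extent size and one fixed size for every later extent; a real server may adjust extent sizes as a table grows, which this ignores. The function and parameter names are hypothetical.

```python
def extents_needed(data_kb: int, first_kb: int, next_kb: int) -> int:
    """Count extents for data_kb of data: one first extent, then next extents."""
    if data_kb <= first_kb:
        return 1  # the first extent is allocated when the table is created
    remaining = data_kb - first_kb
    # Ceiling division: a partially used extent is still a whole extent.
    return 1 + -(-remaining // next_kb)


print(extents_needed(10, first_kb=16, next_kb=16))   # 1
print(extents_needed(100, first_kb=16, next_kb=16))  # 7
print(extents_needed(100, first_kb=64, next_kb=32))  # 3
```

Choosing a first extent close to the expected table size keeps the extent count low, which is the usual reason for sizing it explicitly.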
Use of extents in file systems
Apple File System (APFS) is a file system developed by Apple for their operating systems. APFS uses extents to store files on disk and to minimize disk I/O. Extents in APFS are the basic unit of space allocation and are used to track the allocation of disk space within the file system.
In APFS, an extent is a sequence of contiguous blocks on disk. When a file is created, it is allocated space in one or more extents. When a file is deleted, the extents that were allocated to it are freed and made available for use by other files. This helps to limit fragmentation, and it lets the file system describe a file's data with a few extent records rather than one entry per block.
APFS also supports sparse files (referred to here as "sparse extents"), an optimization of traditional extent-based storage. With sparse files, the file system allocates disk space only when a region of the file is actually written, not when the file is created. This saves disk space for files that contain large empty regions.
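To make the idea of a per-file extent list concrete, here is a small sketch of translating a logical block within a file to a physical block on disk. It is a generic illustration, not APFS's on-disk format; `FileExtent` and `physical_block` are hypothetical names.

```python
from bisect import bisect_right
from typing import NamedTuple


class FileExtent(NamedTuple):
    logical_start: int   # first block of this run, counted within the file
    physical_start: int  # first block of this run on disk
    length: int          # number of contiguous blocks


def physical_block(extents: list, logical_block: int) -> int:
    """Map a file-relative block to a disk block (extents sorted by logical_start)."""
    starts = [e.logical_start for e in extents]
    i = bisect_right(starts, logical_block) - 1
    if i < 0:
        raise KeyError(logical_block)
    e = extents[i]
    if logical_block >= e.logical_start + e.length:
        raise KeyError(logical_block)  # past the end of the last matching run
    return e.physical_start + (logical_block - e.logical_start)


# A 14-block file stored as two runs on disk.
extents = [FileExtent(0, 1000, 4), FileExtent(4, 5000, 10)]
print(physical_block(extents, 2))  # 1002
print(physical_block(extents, 6))  # 5002
```

Two records are enough to describe all 14 blocks, and the lookup is a binary search over the run starts.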
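The allocate-on-write behaviour can be modelled in memory as below. This is a toy model under stated assumptions (a 4 KB block size and a dictionary standing in for allocated extents); it does not call any APFS interface, and `SparseFile` is a hypothetical name.

```python
BLOCK_SIZE = 4096  # assumed block size, in bytes


class SparseFile:
    """Toy model: blocks take up space only once they have been written."""

    def __init__(self, logical_blocks: int) -> None:
        self.logical_blocks = logical_blocks
        self.blocks = {}  # logical block number -> data, written blocks only

    def write_block(self, n: int, data: bytes) -> None:
        if not 0 <= n < self.logical_blocks:
            raise IndexError(n)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.blocks[n] = bytes(data)  # space is allocated here, not at creation

    def read_block(self, n: int) -> bytes:
        if not 0 <= n < self.logical_blocks:
            raise IndexError(n)
        # Holes (never-written blocks) read back as zeros.
        return self.blocks.get(n, bytes(BLOCK_SIZE))

    def logical_bytes(self) -> int:
        return self.logical_blocks * BLOCK_SIZE

    def allocated_bytes(self) -> int:
        return len(self.blocks) * BLOCK_SIZE


f = SparseFile(logical_blocks=1_000_000)
f.write_block(999_999, b"x" * BLOCK_SIZE)
print(f.logical_bytes())    # 4096000000
print(f.allocated_bytes())  # 4096
```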
Extents vs Page-oriented storage
In contrast, page-oriented storage stores data in fixed-size pages, where each page lives at a specific offset on disk and is read into memory as a unit. When a file is created, it is allocated space one page at a time, in one or more pages.
Both approaches are used in databases and file systems, and they are often combined (SQL Server, for example, groups its pages into extents), but they suit different use cases. Extents are more efficient for large files that require a large amount of contiguous disk space, whereas page-at-a-time allocation is more suitable for small files that fit in a single page.
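One way to see the difference is in the bookkeeping. The sketch below takes a per-block list of disk locations, as page-at-a-time allocation would record it, and collapses it into (start, length) extents. The function name `coalesce` is illustrative.

```python
def coalesce(blocks: list) -> list:
    """Collapse an ordered list of block numbers into (start, length) runs."""
    extents = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # b directly follows the previous run, so extend that run.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((b, 1))
    return extents


blocks = [100, 101, 102, 103, 200, 201, 500]
print(coalesce(blocks))  # [(100, 4), (200, 2), (500, 1)]
```

Seven block pointers become three extent records here; for a large, mostly contiguous file the saving is far greater, while a single-page file gains nothing.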
Dropbox's use of extents in Magic Pocket storage
Quoting from their article, “Magic Pocket organizes blocks into extents—containers of blocks—typically 1-2 GB in size. When an extent is open, it is able to accept writes; new blocks can be appended until the extent reaches its maximum size. Once full, the extent is closed, making it immutable and read-only.
These extents were already ideal for SMR disk sequential-only workloads. Because Magic Pocket's storage engine was setup to guarantee sequential writes and map extents onto a fixed set of SMR zones, we didn’t have to change much to support SMR-style sequential writes.
Our storage engine’s existing implementation of open extents stored block metadata—such as block length and hashes—on SSD cache disks, while raw block data was stored on SMR disks. To get rid of the SSD cache, we would have to store both block metadata and raw block data on the same SMR disk.
We introduced a new extent format which stored block metadata and raw block data inline. On startup, the storage engine would parse the data on extents into metadata and raw data. It would then build an in-memory index mapping each block to its offset in the extent. On each new block write, blocks were inlined with their metadata, serialized, and committed to the SMR disk directly.”
https://dropbox.tech/infrastructure/increasing-magic-pocket-write-throughput-by-removing-our-ssd-cache-disks
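The inline format described in the quote can be illustrated with a small sketch. Everything below is an assumption made for illustration, not Dropbox's actual code or layout: the record format (a 4-byte length and a SHA-256 digest followed by the raw data), the class `OpenExtent`, and the function `rebuild_index` are all hypothetical, and a `bytearray` stands in for the SMR disk.

```python
import hashlib
import struct

# Hypothetical record header: block length, then the block's SHA-256 digest.
HEADER = struct.Struct(">I32s")


class OpenExtent:
    """Append-only container of blocks; closed (read-only) once full."""

    def __init__(self, max_size: int) -> None:
        self.buf = bytearray()  # stand-in for sequential writes to disk
        self.max_size = max_size
        self.closed = False
        self.index = {}         # digest -> offset of the record in the extent

    def append(self, data: bytes) -> bytes:
        if self.closed:
            raise RuntimeError("extent is closed")
        if len(self.buf) + HEADER.size + len(data) > self.max_size:
            self.closed = True  # no room left: the extent becomes immutable
            raise RuntimeError("extent is full")
        digest = hashlib.sha256(data).digest()
        self.index[digest] = len(self.buf)
        # Metadata and raw data are written together, strictly at the end.
        self.buf += HEADER.pack(len(data), digest) + data
        return digest

    def read(self, digest: bytes) -> bytes:
        offset = self.index[digest]
        length, _ = HEADER.unpack_from(self.buf, offset)
        start = offset + HEADER.size
        return bytes(self.buf[start:start + length])


def rebuild_index(raw: bytes) -> dict:
    """On startup, scan the extent and map each block to its offset."""
    index, offset = {}, 0
    while offset < len(raw):
        length, digest = HEADER.unpack_from(raw, offset)
        index[digest] = offset
        offset += HEADER.size + length
    return index


ext = OpenExtent(max_size=1024)
k1 = ext.append(b"hello")
k2 = ext.append(b"world!")
assert rebuild_index(bytes(ext.buf)) == ext.index
print(ext.read(k2))  # b'world!'
```

Because every record carries its own metadata, the in-memory index can always be rebuilt from the extent alone, which is what makes a separate metadata cache unnecessary in this sketch.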