Analytical View of Garbage Collection in Solid State Media File Systems
Essay Preview: Analytical View of Garbage Collection in Solid State Media File Systems
Report this essay
Analytical View of Garbage Collection in Solid State Media File Systems
Nilay Khandelwal and Veni Taneja
(Thesis written under the course Operating Systems)
Department of Computer Science & Engineering
Jaypee Institute of Information Technology, India
[email protected], [email protected]
1.0 Overview
Solid State medias such as flash memory are non-volatile computer memory that can be electrically erased and reprogrammed. These are specific types of EEPROMs that are erased and programmed in large blocks.
Flash memory is non-volatile, which means that it does not need power to maintain the information stored in the chip. In addition, flash memory offers fast read access times and better kinetic shock resistance than hard disks (Table 1).
Though flash memory has many advantages, its special hardware characteristics impose design challenges on storage systems.
Media
Access Time
Read (512B)
Write (512B)
Erase
2.56µs
2.56µs
NOR Flash
14.4µs
3.53ms
1.2s (128kB)
NAND Flash
135.9µs
226µs
2-3ms (16kB)
12.4ms
12.4ms
Table 1. Characteristics of different storage media. (NOR Flash: Intel 28F128JF3A-150, NAND Flash: K9F5608U0M)
1.1 Architecture
1.1.1 Partitions
Flash memory is divided into one or many sections of memory called partitions. A multi-partition architecture allows system processor to multi-task the I/O operations with the flash memory. While the processor can read from one partition, it can write/erase in another.
Figure 1. Partitions, Blocks and Pages
1.1.2 Blocks
In addition to partitions, flash memory is further divided into sections of memory called blocks (Figure 1). Flash memory devices in which all blocks are of the same size are symmetrically-blocked, while devices which are asymmetrically-blocked generally have several blocks that are significantly smaller than main array of flash blocks. Small blocks are typically used for storing small data or boot code. Block sizes vary 64kB to 256kB.
1.1.3 Pages
Each block in flash memory comprises of fixed number of pages (Figure 1). A page is typically of size 512B to 2kB. While erase operations can be done only on blocks, I/O operations can be done on every page.
1.2 Programming Data
Flash devices allow programming values from logical “1” to “0”, but not from “0” to “1” (Figure 2). To program values back to “1”s requires erasing a full block. In most cases when data is data edited, it can either be re-written to the same block by caching the original, erasing the block and re-writing over it, or by writing the edited file to a new location in flash and the old one invalidated. Eventually, invalid data needs to be reclaimed which is usually done as a background process.
Figure 2. Flash Programming Limitations
2.0 Flash File System Functions
While flash file systems have many functions in common with file systems for other media, there are many that are unique to file systems for flash devices.
2.1 Wear Leveling
Each block in a flash memory device has a finite number of erase-write cycles (~ 10,000 to 100,000). To increase the life of the flash device, writes and erases should be spread as evenly as possible over all the blocks of the device. This is called wear leveling. Care needs to be taken in the software to balance performance with evenly spread wear leveling of blocks.
2.2 Garbage Collection
As mentioned previously, edits to files are usually not done “in place”, rather data is written to a new location and the old data is invalidated. Since blocks should be erased in advance before updating, updates in place are not efficient.
The invalid data needs to be cleaned up at regular intervals and this process is called garbage collection or reclaim. When a block is reclaimed the valid data in the block is migrated to a clean (erased) block of data called the spare block. When the reclaim process is completed the old block is erased and it becomes the new spare block.
There have been many researches on garbage collection algorithms (or cleaning policies) for log-structured file systems used on disk-based storages. Garbage collection algorithms should deal with some issues such as how many blocks to erase, which blocks to erase, and where to migrate valid data from erased block. The primary concern of garbage collection algorithms has been to reduce the cleaning cost. But, the number of victim blocks is also a problem for garbage collection policy of flash memory file system. This is because the costs of erase operations are much higher than read/write operations and thus garbage collection could disturb normal I/O operations to the device.
3.0 Garbage Collection in Detail
Data on flash memory cannot be over-written unless the blocks are erased in advanced. Also, erase operations can occur only in larger units than write operations and hence it takes an order of magnitude longer than the write operations. The erase operation is hence slow that usually decreases the system performance and consumes more power.
Therefore, if every update operation is performed in place, then system performance will be poor since updating even one byte requires one erase and several write operations. In order to avoid having to erase during every update, a logging approach can be used since it is quite effective in several ways.
First, logging solves the inability to update in place, since an update results in a new write at the end of the log and invalidation of the old. The natural separation of asynchronous erases from writes allows write operations