MBR (Master Boot Record) is a special sector on a partitioned disk which holds information on the “logical” partitioning of the physical disk: where each partition sits and what the underlying file system is, which in turn determines how and where files and other such entries are stored.
MBR is a rather old concept, first introduced in 1983 for IBM PC compatible machines. Nowadays MBR is being replaced by GPT (GUID Partition Table). The main reason behind this is the disk size limit which comes with MBR (2 TiB).
There are multiple variants of the MBR layout. Each one brings its own minor or major changes to the original design. The one rule is that everything must fit inside the MBR sector size of 512 bytes.

* Classical generic MBR
* Modern standard MBR
* AAP MBR
* NEWLDR MBR
* AST/NEC MS-DOS and SpeedStor MBR
* Ontrack Disk Manager MBR
When first loading a sector from disk, it’s best to check the 2 bytes (boot signature) at address offset `0x1FE`. Read as a little-endian 16-bit value, the signature is expected to be `0xAA55`, but it has often been written in documentation and official documents as `0x55AA`. My honest advice to you is to check for both cases. But remember that the official implementation has `0x55` at offset
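The signature check can be sketched as below. This is a minimal example, assuming the usual on-disk layout of byte `0x55` at `0x1FE` followed by `0xAA` at `0x1FF`; the function name `has_boot_signature` and the `accept_swapped` flag are made up for illustration.

```python
BOOT_SIGNATURE_OFFSET = 0x1FE  # last two bytes of a 512-byte sector


def has_boot_signature(sector: bytes, accept_swapped: bool = True) -> bool:
    """Return True if the sector carries the boot signature.

    accept_swapped is a hypothetical flag: when set, the byte-swapped
    form (0xAA, 0x55) is tolerated as well, as the text advises.
    """
    if len(sector) < 512:
        return False
    sig = sector[BOOT_SIGNATURE_OFFSET:BOOT_SIGNATURE_OFFSET + 2]
    if sig == b"\x55\xAA":  # assumed standard on-disk byte order
        return True
    return accept_swapped and sig == b"\xAA\x55"
```

Keeping the swapped form behind a flag makes it easy to be strict in one place and lenient in another.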
The diagrams below showcase the structure of a generic MBR, including a detailed view of the contents of a partition entry.
Boot Sector + Partition
You can safely ignore the “Bootstrap Code area”. The important fields for the MBR are the partition entries (
0x01EE) and as mentioned above, the signature.
When reading a partition entry from the MBR, make sure the status field (`0x00`) has the value `0x80`, which marks the partition as active (bootable). Next check the partition type, which in this case should be FAT32 with LBA support. The linked table has all the common and less common partition types.
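As a rough sketch of these checks, the snippet below unpacks the four 16-byte partition entries. It assumes the classical layout (table starting at `0x1BE`, type `0x0C` meaning FAT32 with LBA); the names `PartitionEntry`, `parse_partition_entries` and `is_active_fat32_lba` are hypothetical.

```python
import struct
from typing import List, NamedTuple

PARTITION_TABLE_OFFSET = 0x1BE  # assumed: classical generic MBR layout
PARTITION_ENTRY_SIZE = 16
TYPE_FAT32_LBA = 0x0C           # assumed: partition type for FAT32 with LBA


class PartitionEntry(NamedTuple):
    status: int
    part_type: int
    lba_start: int
    sector_count: int


def parse_partition_entries(mbr: bytes) -> List[PartitionEntry]:
    """Unpack the four primary partition entries of a 512-byte MBR."""
    entries = []
    for i in range(4):
        off = PARTITION_TABLE_OFFSET + i * PARTITION_ENTRY_SIZE
        # status (1), CHS start (3, skipped), type (1), CHS end (3, skipped),
        # first LBA (4), number of sectors (4); all little-endian
        status, part_type, lba_start, count = struct.unpack_from(
            "<B3xB3xII", mbr, off)
        entries.append(PartitionEntry(status, part_type, lba_start, count))
    return entries


def is_active_fat32_lba(entry: PartitionEntry) -> bool:
    """Apply the checks from the text: status 0x80 and a FAT32/LBA type."""
    return entry.status == 0x80 and entry.part_type == TYPE_FAT32_LBA
```

The CHS fields are skipped here on purpose, since only the LBA start is needed for the rest of the walk.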
LBA (Logical Block Addressing) is a common scheme used for specifying the address/location of blocks of data as they are stored on disk. Blocks are located by an index, counting up from 0.
This fits in with MBR: the MBR partition entry must support CHS addressing, and LBA abstracts away the cylinder “layer”.
There isn’t much to be said about LBA, as it is only a scheme for simplifying access to disk sectors. But it’s worth acknowledging its existence because different file systems have versions which support it and versions which do not. This can be determined from the partition type.
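To make the indexing concrete, here is a small sketch of reading sectors by LBA from a disk image. It assumes 512-byte sectors and any seekable binary file object; `lba_to_byte_offset` and `read_sectors` are illustrative names, not part of any library.

```python
import io

SECTOR_SIZE = 512  # assumed sector size, matching the MBR sector size


def lba_to_byte_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Blocks are indexed from 0, so the byte offset is a plain multiply."""
    return lba * sector_size


def read_sectors(disk, lba: int, count: int = 1,
                 sector_size: int = SECTOR_SIZE) -> bytes:
    """Read `count` sectors starting at `lba` from a seekable binary file."""
    disk.seek(lba_to_byte_offset(lba, sector_size))
    return disk.read(count * sector_size)


# Usage with an in-memory image standing in for a real disk:
image = io.BytesIO(bytes(512) + b"\x01" * 512)
second_sector = read_sectors(image, lba=1)
```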
FAT (File Allocation Table) is one of the most basic file system architectures. FAT is robust, simple and can provide good performance even in lightweight implementations, but it’s no competitor to modern file systems, which offer scalability and reliability.
Originally designed in 1977 and intended for floppy disks, FAT was soon after adopted by DOS and Windows 9x systems. Due to its popularity in desktop environments and the evolution of storage sizes, FAT now has three major versions (not counting the original 8-bit design; the number is the bit width of a table element):

* FAT12
* FAT16
* FAT32
FAT has since been replaced as the default file system on Windows machines, but it remains popular in the portable media devices market.
By far the most cumbersome design when it comes to FAT, from my experience. This was mainly caused by the conflicts between this working implementation of a FAT file system and the rest of the documentation available online.
The diagram below is based on 1 and 2. It’s hard to create a simple representation when taking into consideration all the backwards compatibility introduced by Microsoft over the years.
Out of all these values, only a few are of interest when handling a FAT32 file system.
2 bytes- Bytes per sector (value is
1 byte- Sectors per cluster (a power of two)
2 bytes- Number of reserved sectors (common value is
1 byte- Number of FATs (value is
4 bytes- Sectors per FAT (varies based on disk size)
4 bytes- Root directory cluster (common value is
After checking these variables to make sure they fit your expectations you can extract some useful variables (starting LBAs for each of the regions).
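One way to pull out these fields and derive the region LBAs is sketched below. The byte offsets (`0x0B`, `0x0D`, `0x0E`, `0x10`, `0x24`, `0x2C`) are the usual FAT32 boot record positions and are stated here as an assumption, since they are not listed above; `Fat32Layout`, `parse_fat32_boot_record` and `partition_lba` are hypothetical names.

```python
import struct
from typing import NamedTuple


class Fat32Layout(NamedTuple):
    bytes_per_sector: int
    sectors_per_cluster: int
    reserved_sectors: int
    fat_count: int
    sectors_per_fat: int
    root_cluster: int
    fat_start_lba: int   # first sector of the first FAT
    data_start_lba: int  # first sector of the data region (cluster 2)


def parse_fat32_boot_record(boot: bytes, partition_lba: int) -> Fat32Layout:
    """Extract the fields of interest and compute region start LBAs.

    partition_lba is the partition's starting LBA from the MBR entry.
    """
    bps, spc, reserved, fats = struct.unpack_from("<HBHB", boot, 0x0B)
    (sectors_per_fat,) = struct.unpack_from("<I", boot, 0x24)
    (root_cluster,) = struct.unpack_from("<I", boot, 0x2C)

    # Basic sanity checks before trusting the values
    if spc == 0 or spc & (spc - 1) != 0:
        raise ValueError("sectors per cluster must be a power of two")
    if bps == 0 or fats == 0 or sectors_per_fat == 0:
        raise ValueError("boot record fields look invalid")

    fat_start = partition_lba + reserved
    data_start = fat_start + fats * sectors_per_fat
    return Fat32Layout(bps, spc, reserved, fats, sectors_per_fat,
                       root_cluster, fat_start, data_start)
```

The data region is assumed to start right after the last FAT, which holds for FAT32 because the root directory lives inside the data region.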
A file occupies at least one cluster, depending on its size. A file taking up more than one cluster is represented by a chain of clusters, which may be fragmented across the Data Region. There are special entries better explained here.
It’s important to note the difference between a cluster and sector. Normally a cluster can be found across multiple sectors (usually at least 8 sectors, giving a cluster size of
A cluster (ID) can be transformed into an LBA value by applying the following formula:
lba = cluster_start_lba + (cluster_id - 2) * sectors_per_cluster. The
- 2 is applied because there are no cluster
1 and cluster’s IDs begin from
cluster_start_lba will be obtained from the Directory Table, starting from the root directory table.
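The formula translates directly into a small helper. In this sketch `cluster_start_lba` is taken to be the first sector of the data region, which is an assumption on my part, and the guard against IDs below 2 reflects the fact that FAT data clusters are numbered from 2.

```python
def cluster_to_lba(cluster_id: int, cluster_start_lba: int,
                   sectors_per_cluster: int) -> int:
    """Convert a cluster ID into the LBA of the cluster's first sector.

    cluster_start_lba is assumed to be the LBA where the data region
    begins; data clusters are numbered from 2, hence the subtraction.
    """
    if cluster_id < 2:
        raise ValueError("data clusters are numbered from 2")
    return cluster_start_lba + (cluster_id - 2) * sectors_per_cluster
```

For example, with a data region starting at LBA 4080 and 8 sectors per cluster, cluster 2 maps to LBA 4080 and cluster 5 maps to LBA 4104.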
The entries are 32 bytes long each. The first 11 bytes are the file name and extension (8 byte name + 3 byte extension). These are padded by spaces if the name is not long enough or shortened (
~1 is the shortened entry number in order to support multiple copies with the same short name and
SOM is simply the short version of
It’s important to note (as mentioned above) that the only useful things a directory entry provides are the first cluster number and some information about the file (as seen in the diagram).
The first directory table you’ll have access to is the root directory table, whose starting cluster is stored in the Boot Record. When reading the directory table entries, always check the first byte of the name. Based on the value you can tell a couple of things:
0x00- End of directory table
After validating the entry, you can now extract the first cluster value by combining the
cluster_higher into a
4 byte value. With the cluster now you can perform a lookup into the FAT clusters.
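A directory table scan along these lines might look like the sketch below. The field offsets (attributes at byte 11, high cluster word at 20, low cluster word at 26, size at 28) and the `0xE5` deleted marker are the usual FAT32 short-entry layout, stated here as assumptions; `cluster_lower`, `DirEntry` and `parse_directory` are names I introduce for illustration.

```python
import struct
from typing import List, NamedTuple

ENTRY_SIZE = 32


class DirEntry(NamedTuple):
    name: str
    attributes: int
    first_cluster: int
    size: int


def parse_directory(table: bytes) -> List[DirEntry]:
    """Parse short (8.3) entries from a raw directory table."""
    entries = []
    for off in range(0, len(table) - ENTRY_SIZE + 1, ENTRY_SIZE):
        raw = table[off:off + ENTRY_SIZE]
        if raw[0] == 0x00:   # end of directory table
            break
        if raw[0] == 0xE5:   # assumed marker for a deleted entry
            continue
        attributes = raw[11]
        if (attributes & 0x3F) == 0x0F:  # long file name entry, skipped
            continue
        (cluster_higher,) = struct.unpack_from("<H", raw, 20)
        (cluster_lower,) = struct.unpack_from("<H", raw, 26)
        (size,) = struct.unpack_from("<I", raw, 28)
        base = raw[0:8].decode("ascii", errors="replace").rstrip(" ")
        ext = raw[8:11].decode("ascii", errors="replace").rstrip(" ")
        name = f"{base}.{ext}" if ext else base
        # Combine the two 16-bit halves into one 4-byte cluster number
        first_cluster = (cluster_higher << 16) | cluster_lower
        entries.append(DirEntry(name, attributes, first_cluster, size))
    return entries
```

Long file name entries are skipped rather than decoded to keep the sketch short.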
By this point you should have access to the LBA of each region (boot record, reserved sectors, FATs, data) and the cluster value of the file we’re looking for.
The FAT32 table is a big array of 32-bit entries, where each entry’s position in the array corresponds to a cluster number and its value indicates the next cluster of the file. You can think of this as a “singly linked list” (but not really), or more like an array where each value is another index into the same array.
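At this point all you have to do is load the FAT from its starting LBA and then use your cluster number as an index into it. From there you keep walking the clusters until you reach a value of `0xFFFFFFFF` (I’ve seen `0x00000000` used in some cases, best check for both). Strictly speaking, only the low 28 bits of a FAT32 entry are significant, and any masked value of `0x0FFFFFF8` or above marks the end of a chain.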
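The walk can be sketched as follows, assuming the whole FAT has already been read into memory as `bytes`. The 28-bit mask and the `0x0FFFFFF8` end-of-chain threshold are how FAT32 is usually described and are treated as assumptions here; the function name `follow_cluster_chain` is made up.

```python
import struct
from typing import List

FAT32_ENTRY_MASK = 0x0FFFFFFF  # assumed: top 4 bits of an entry are reserved
FAT32_EOC_MIN = 0x0FFFFFF8     # assumed: values at or above this end a chain


def follow_cluster_chain(fat: bytes, first_cluster: int) -> List[int]:
    """Return the list of cluster IDs that make up one file."""
    entry_count = len(fat) // 4
    chain: List[int] = []
    cluster = first_cluster
    while 2 <= cluster < entry_count:
        if len(chain) >= entry_count:
            raise ValueError("cluster chain loops back on itself")
        chain.append(cluster)
        (value,) = struct.unpack_from("<I", fat, cluster * 4)
        nxt = value & FAT32_ENTRY_MASK
        # Stop on an end-of-chain marker, or on 0 as the text advises
        if nxt >= FAT32_EOC_MIN or nxt == 0:
            break
        cluster = nxt
    return chain
```

Because of the mask, an entry of `0xFFFFFFFF` and an entry of `0x0FFFFFFF` are both treated as the end of the chain. Each cluster ID in the result can then be converted to an LBA to read the file's data in order.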