Analysis of Ext2fs
Louis-Dominique Dubeau
Blocks are the basic building blocks of a file system.
The file system manager issues requests to the disk,
and all data in those requests is expressed as a whole number of disk blocks.
Some of the file system's blocks are reserved for the exclusive use of the superuser.
The number of reserved blocks is stored in the s_r_blocks_count member
of the superblock structure. See section Superblock.
When the total number of free blocks drops to the number of reserved blocks,
an ordinary user can no longer allocate disk space.
Only the superuser can.
Keeping blocks in reserve guarantees that a minimum amount of space
is always available for booting the system.
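The reservation rule can be sketched as a small check. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, and the two counters stand for s_free_blocks_count and s_r_blocks_count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: may the caller allocate one more block?
 * free_blocks and reserved_blocks mirror s_free_blocks_count and
 * s_r_blocks_count from the superblock. */
bool may_allocate_block(uint32_t free_blocks, uint32_t reserved_blocks,
                        bool is_superuser)
{
    if (free_blocks == 0)
        return false;            /* nothing left for anyone */
    if (is_superuser)
        return true;             /* the reserve is open to the superuser */
    /* Ordinary users are refused once only the reserve remains. */
    return free_blocks > reserved_blocks;
}
```

With 50 reserved blocks, an ordinary user is refused as soon as the free count reaches 50, while the superuser can keep allocating until no block is left.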
There are two kinds of blocks: logical and physical.
The size of these blocks can vary.
The size of a logical block is the size of a physical block multiplied by a power of two.
Logical block addresses run from zero up to the total number of blocks minus one.
Block zero is the boot block, which is accessed only by special operations.
The problem with blocks is well known: a file rarely occupies a whole number of blocks,
so its last block is only partly filled
and the rest of that block is wasted.
To address this problem, file systems use fragments.
The size of a fragment is the size of a physical block multiplied by a power of two.
A file is a sequence of blocks, which in turn consist of fragments.
The last block of a file may consist of a limited number of fragments rather than a full block.
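To make the saving concrete, here is a minimal sketch that computes the unused bytes in the tail of a file for a given allocation unit. The helper names are hypothetical and the sizes in the note that follows are arbitrary examples.

```c
#include <stdint.h>

/* Round n up to the next multiple of unit (unit must be non-zero). */
uint32_t round_up(uint32_t n, uint32_t unit)
{
    return (n + unit - 1) / unit * unit;
}

/* Bytes left unused when a file of file_size bytes is stored in
 * allocation units of unit bytes (a block or a fragment). */
uint32_t slack_bytes(uint32_t file_size, uint32_t unit)
{
    return round_up(file_size, unit) - file_size;
}
```

For a 5000-byte file, allocating in 4096-byte blocks leaves 3192 bytes unused, whereas finishing the file with 1024-byte fragments leaves only 120.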
The blocks on disk are divided into groups.
Each group contains critical file system information.
Using groups makes it possible to use the disk efficiently.
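Each group contains, in this order:
- the superblock
- the group descriptors
- the block bitmap of the group
- the inode bitmap of the group
- the inode table of the group
- the data blocks of the group
The superblock and group descriptors of each group must carry the same
values on disk.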
Superblock structure [include/linux/ext2_fs.h]:
struct ext2_super_block {
    unsigned long  s_inodes_count;
    unsigned long  s_blocks_count;
    unsigned long  s_r_blocks_count;
    unsigned long  s_free_blocks_count;
    unsigned long  s_free_inodes_count;
    unsigned long  s_first_data_block;
    unsigned long  s_log_block_size;
    long           s_log_frag_size;
    unsigned long  s_blocks_per_group;
    unsigned long  s_frags_per_group;
    unsigned long  s_inodes_per_group;
    unsigned long  s_mtime;
    unsigned long  s_wtime;
    unsigned short s_mnt_count;
    short          s_max_mnt_count;
    unsigned short s_magic;
    unsigned short s_state;
    unsigned short s_errors;
    unsigned short s_pad;
    unsigned long  s_lastcheck;
    unsigned long  s_checkinterval;
    unsigned long  s_reserved[238];
};
s_inodes_count
- the total number of inodes on the fs.
s_blocks_count
- the total number of blocks on the fs.
s_r_blocks_count
- the total number of blocks reserved for the exclusive use of the
superuser.
s_free_blocks_count
- the total number of free blocks on the fs.
s_free_inodes_count
- the total number of free inodes on the fs.
s_first_data_block
- the position on the fs of the first data block. Usually, this is block
number 1 for a fs with 1024-byte blocks and block number 0 for other
fs.
s_log_block_size
- used to compute the logical block size in bytes. The logical block size
is in fact
1024 << s_log_block_size .
s_log_frag_size
- used to compute the logical fragment size. The logical fragment size is
in fact
1024 << s_log_frag_size if s_log_frag_size is positive
and 1024 >> -s_log_frag_size if s_log_frag_size is negative.
s_blocks_per_group
- the total number of blocks contained in a group.
s_frags_per_group
- the total number of fragments contained in a group.
s_inodes_per_group
- the total number of inodes contained in a group.
s_mtime
- the time at which the last mount of the fs was performed.
s_wtime
- the time at which the last write of the superblock on the fs was performed.
s_mnt_count
- the number of times the fs has been mounted in read-write mode without having
been checked.
s_max_mnt_count
- the maximum number of times the fs may be mounted in read-write mode before a
check must be done.
s_magic
- a magic number that permits the identification of the file system. It is
0xEF53 for a normal ext2fs and 0xEF51 for versions of
ext2fs prior to 0.2b.
s_state
- the state of the file system. It contains an or'ed value of EXT2_VALID_FS
(0x0001) which means: unmounted cleanly; and EXT2_ERROR_FS (0x0002) which
means: errors detected by the kernel code.
s_errors
- indicates what operation to perform when an error occurs. See section Error Handling.
s_pad
- unused.
s_lastcheck
- the time of the last check performed on the fs.
s_checkinterval
- the maximum possible time between checks on the fs.
s_reserved
- unused.
Times are measured in seconds since 00:00:00 GMT, January 1, 1970.
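The two size computations described for s_log_block_size and s_log_frag_size can be sketched as follows. The function names are hypothetical, and fixed-width types stand in for the 32-bit long fields of the structure above.

```c
#include <stdint.h>

/* Logical block size in bytes: 1024 << s_log_block_size. */
uint32_t ext2_block_size(uint32_t s_log_block_size)
{
    return (uint32_t)1024 << s_log_block_size;
}

/* Fragment size in bytes. s_log_frag_size is signed: a negative value
 * shifts right, giving fragments smaller than 1024 bytes. */
uint32_t ext2_frag_size(int32_t s_log_frag_size)
{
    if (s_log_frag_size >= 0)
        return (uint32_t)1024 << s_log_frag_size;
    return (uint32_t)1024 >> -s_log_frag_size;
}
```

A value of 2 in s_log_block_size therefore gives 4096-byte blocks, and a value of -1 in s_log_frag_size gives 512-byte fragments.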
Once the superblock is read into memory, the ext2fs kernel code computes
some additional information and keeps it in another structure. This structure
has the following layout:
struct ext2_sb_info {
    unsigned long  s_frag_size;
    unsigned long  s_frags_per_block;
    unsigned long  s_inodes_per_block;
    unsigned long  s_frags_per_group;
    unsigned long  s_blocks_per_group;
    unsigned long  s_inodes_per_group;
    unsigned long  s_itb_per_group;
    unsigned long  s_desc_per_block;
    unsigned long  s_groups_count;
    struct buffer_head * s_sbh;
    struct ext2_super_block * s_es;
    struct buffer_head * s_group_desc[EXT2_MAX_GROUP_DESC];
    unsigned short s_loaded_inode_bitmaps;
    unsigned short s_loaded_block_bitmaps;
    unsigned long  s_inode_bitmap_number[EXT2_MAX_GROUP_LOADED];
    struct buffer_head * s_inode_bitmap[EXT2_MAX_GROUP_LOADED];
    unsigned long  s_block_bitmap_number[EXT2_MAX_GROUP_LOADED];
    struct buffer_head * s_block_bitmap[EXT2_MAX_GROUP_LOADED];
    int            s_rename_lock;
    struct wait_queue * s_rename_wait;
    unsigned long  s_mount_opt;
    unsigned short s_mount_state;
};
s_frag_size
- fragment size in bytes.
s_frags_per_block
- number of fragments in a block.
s_inodes_per_block
- number of inodes in a block of the inode table.
s_frags_per_group
- number of fragments in a group.
s_blocks_per_group
- number of blocks in a group.
s_inodes_per_group
- number of inodes in a group.
s_itb_per_group
- number of inode table blocks per group.
s_desc_per_block
- number of group descriptors per block.
s_groups_count
- number of groups.
s_sbh
- the buffer containing the disk superblock in memory.
s_es
- pointer to the superblock in the buffer.
s_group_desc
- pointers to the buffers containing the group descriptors.
s_loaded_inode_bitmaps
- number of inodes bitmap cache entries used.
s_loaded_block_bitmaps
- number of blocks bitmap cache entries used.
s_inode_bitmap_number
- indicates to which group the inodes bitmap in the buffers belong.
s_inode_bitmap
- inode bitmap cache.
s_block_bitmap_number
- indicates to which group the blocks bitmap in the buffers belong.
s_block_bitmap
- block bitmap cache.
s_rename_lock
- lock used to avoid two simultaneous rename operations on a fs.
s_rename_wait
- wait queue used to wait for the completion of a rename operation in progress.
s_mount_opt
- the mounting options specified by the administrator.
s_mount_state
- the state of the file system (s_state) as it was read when the fs was mounted.
Most of those values are computed from the superblock on disk.
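As an illustration of how such values follow from the on-disk superblock, here is a minimal sketch. The structure and function names are hypothetical, and the 128-byte inode and 32-byte group descriptor sizes are assumptions obtained by adding up the fields of the on-disk structures in this document with 32-bit longs.

```c
#include <stdint.h>

/* Assumed on-disk sizes (sum of the structure fields with 32-bit longs). */
enum { INODE_SIZE = 128, GROUP_DESC_SIZE = 32 };

/* Hypothetical subset of the values kept in memory. */
struct derived_info {
    uint32_t frags_per_block;
    uint32_t inodes_per_block;
    uint32_t itb_per_group;
    uint32_t desc_per_block;
    uint32_t groups_count;
};

struct derived_info derive_info(uint32_t block_size, uint32_t frag_size,
                                uint32_t blocks_count,
                                uint32_t first_data_block,
                                uint32_t blocks_per_group,
                                uint32_t inodes_per_group)
{
    struct derived_info d;

    d.frags_per_block  = block_size / frag_size;
    d.inodes_per_block = block_size / INODE_SIZE;
    d.itb_per_group    = inodes_per_group / d.inodes_per_block;
    d.desc_per_block   = block_size / GROUP_DESC_SIZE;
    /* Round up so that a short final group is still counted. */
    d.groups_count = (blocks_count - first_data_block + blocks_per_group - 1)
                     / blocks_per_group;
    return d;
}
```

For a 20480-block file system with 1024-byte blocks, 8192 blocks per group and 2048 inodes per group, this gives 3 groups, 8 inodes per block and 256 inode table blocks per group.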
The Linux ext2fs manager caches access to the inode and block
bitmaps. This cache is a list of buffers ordered from the most recently
used to the least recently used buffer. Other managers should use the same
kind of bitmap caching, or a similar method of improving disk access
time.
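The ordering described above can be sketched with a small move-to-front list. This is not the kernel's code: the names are hypothetical, CACHE_SLOTS merely stands in for EXT2_MAX_GROUP_LOADED, and a real cache would also hold the buffer for each entry and read the bitmap from disk on a miss.

```c
#include <stdint.h>

#define CACHE_SLOTS 8   /* illustrative size, stands in for EXT2_MAX_GROUP_LOADED */

struct bitmap_cache {
    uint32_t group[CACHE_SLOTS];  /* group[0] is the most recently used */
    int used;                     /* number of valid entries */
};

/* Record an access to the bitmap of group grp.
 * Returns 1 on a cache hit and 0 on a miss. In both cases grp ends up
 * at the front; on a miss with a full cache the last entry is dropped. */
int cache_touch(struct bitmap_cache *c, uint32_t grp)
{
    int i, pos, hit = 0;

    for (i = 0; i < c->used; i++) {
        if (c->group[i] == grp) {
            hit = 1;
            break;
        }
    }

    if (hit)
        pos = i;                  /* slot currently holding grp */
    else if (c->used < CACHE_SLOTS)
        pos = c->used++;          /* take a free slot */
    else
        pos = CACHE_SLOTS - 1;    /* evict the least recently used entry */

    /* Shift the entries in front of pos back by one, then put grp first. */
    for (i = pos; i > 0; i--)
        c->group[i] = c->group[i - 1];
    c->group[0] = grp;
    return hit;
}
```

Because every access moves its group to the front, the entry at the end of the list is always the one that has gone unused for the longest time.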
On disk, the group descriptors immediately follow the superblock and
each descriptor has the following layout:
struct ext2_group_desc
{
    unsigned long  bg_block_bitmap;
    unsigned long  bg_inode_bitmap;
    unsigned long  bg_inode_table;
    unsigned short bg_free_blocks_count;
    unsigned short bg_free_inodes_count;
    unsigned short bg_used_dirs_count;
    unsigned short bg_pad;
    unsigned long  bg_reserved[3];
};
bg_block_bitmap
- points to the blocks bitmap block for the group.
bg_inode_bitmap
- points to the inodes bitmap block for the group.
bg_inode_table
- points to the first block of the inode table.
bg_free_blocks_count
- number of free blocks in the group.
bg_free_inodes_count
- number of free inodes in the group.
bg_used_dirs_count
- number of inodes allocated to directories in the group.
bg_pad
- padding.
The information in a group descriptor pertains only to the group it is
actually describing.
The ext2 file system uses bitmaps to keep track of allocated blocks
and inodes.
The blocks bitmap of each group refers to blocks ranging from the first
block in the group to the last block in the group. To access the bit of
a given block, we first have to find the group the block belongs
to and then find the bit of this block in the blocks bitmap
contained in the group. It is very important to note that the blocks
bitmap in fact refers to the smallest allocation unit supported by the
file system: fragments. Since the block size is always a multiple of the
fragment size, when the file system manager allocates a block, it
actually allocates a whole number of fragments. This use of the
blocks bitmap permits the file system manager to allocate and
deallocate space on a fragment basis.
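The two-step lookup can be sketched as below. The names are hypothetical, and the sketch makes two assumptions that the text does not state: the fragment size equals the block size (one bit per block), and bits are numbered from the lowest bit of each byte.

```c
#include <stdint.h>

struct bit_location {
    uint32_t group;   /* group the block belongs to */
    uint32_t bit;     /* bit index inside that group's blocks bitmap */
};

/* Step 1: find the group and the bit for a block number. */
struct bit_location locate_block_bit(uint32_t block,
                                     uint32_t first_data_block,
                                     uint32_t blocks_per_group)
{
    struct bit_location loc;
    uint32_t rel = block - first_data_block;  /* groups start at the first data block */

    loc.group = rel / blocks_per_group;
    loc.bit   = rel % blocks_per_group;
    return loc;
}

/* Step 2: test a bit in a bitmap held in memory
 * (assumed order: lowest bit of each byte first). */
int bitmap_test(const unsigned char *bitmap, uint32_t bit)
{
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}
```

With s_first_data_block equal to 1 and 8192 blocks per group, block 8193 is bit 0 of group 1.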
The inode bitmap of each group refers to inodes ranging from the first
inode of the group to the last inode of the group. To access the bit of
a given inode, we first have to find the group the inode belongs
to and then find the bit of this inode in the inode bitmap contained
in the group. To obtain the inode information from the inode table, the
process is the same, except that the final search is in the inode table
of the group instead of the inode bitmap.
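A possible sketch of this lookup follows. The names are hypothetical, and it assumes that inode numbers start at 1 (the lowest special inode number listed in this document), which is why 1 is subtracted before dividing.

```c
#include <stdint.h>

struct inode_location {
    uint32_t group;        /* group holding the inode */
    uint32_t index;        /* bit in the inode bitmap, slot in the inode table */
    uint32_t table_block;  /* block offset to add to bg_inode_table */
    uint32_t byte_offset;  /* position of the inode inside that block */
};

struct inode_location locate_inode(uint32_t ino, uint32_t inodes_per_group,
                                   uint32_t block_size, uint32_t inode_size)
{
    struct inode_location loc;
    uint32_t inodes_per_block = block_size / inode_size;

    loc.group       = (ino - 1) / inodes_per_group;  /* assumed: numbering starts at 1 */
    loc.index       = (ino - 1) % inodes_per_group;
    loc.table_block = loc.index / inodes_per_block;
    loc.byte_offset = (loc.index % inodes_per_block) * inode_size;
    return loc;
}
```

With 1024-byte blocks and 128-byte inodes, inode 11 is slot 10 of group 0, which lies in the second block of the inode table at byte offset 256.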
An inode uniquely describes a file. Here's what an inode looks like on
disk:
struct ext2_inode {
    unsigned short i_mode;
    unsigned short i_uid;
    unsigned long  i_size;
    unsigned long  i_atime;
    unsigned long  i_ctime;
    unsigned long  i_mtime;
    unsigned long  i_dtime;
    unsigned short i_gid;
    unsigned short i_links_count;
    unsigned long  i_blocks;
    unsigned long  i_flags;
    unsigned long  i_reserved1;
    unsigned long  i_block[EXT2_N_BLOCKS];
    unsigned long  i_version;
    unsigned long  i_file_acl;
    unsigned long  i_dir_acl;
    unsigned long  i_faddr;
    unsigned char  i_frag;
    unsigned char  i_fsize;
    unsigned short i_pad1;
    unsigned long  i_reserved2[2];
};
i_mode
- type of file (character, block, link, etc.) and access rights on the
file.
i_uid
- uid of the owner of the file.
i_size
- logical size in bytes.
i_atime
- last time the file was accessed.
i_ctime
- last time the inode information of the file was changed.
i_mtime
- last time the file content was modified.
i_dtime
- when this file was deleted.
i_gid
- gid of the file.
i_links_count
- number of links pointing to this file.
i_blocks
- number of blocks allocated to this file, counted in 512-byte units.
i_flags
- flags (see below).
i_reserved1
- reserved.
i_block
- pointers to blocks (see below).
i_version
- version of the file (used by NFS).
i_file_acl
- access control list of the file (not used yet).
i_dir_acl
- access control list of the directory (not used yet).
i_faddr
- block where the fragment of the file resides.
i_frag
- number of the fragment in the block.
i_fsize
- size of the fragment.
i_pad1
- padding.
i_reserved2
- reserved.
As you can see, the inode contains EXT2_N_BLOCKS (15 in ext2fs
0.5) pointers to blocks. Of these pointers, the first
EXT2_NDIR_BLOCKS (12) are direct pointers to data. The next entry
points to a block of pointers to data (indirect). The entry after that
points to a block of pointers to blocks of pointers to data (double
indirection). The last entry points to a block of pointers to
blocks of pointers to blocks of pointers to data (triple indirection).
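The pointer scheme can be sketched as a function that tells how many levels of indirection are needed to reach a given logical block of a file. The function name is hypothetical, and the sketch assumes that each pointer stored in an indirect block takes 4 bytes.

```c
#include <stdint.h>

#define NDIR_BLOCKS 12   /* EXT2_NDIR_BLOCKS as given in the text */

/* Levels of indirection needed to reach logical block n of a file:
 * 0 = direct, 1 = indirect, 2 = double, 3 = triple,
 * -1 = beyond what the 15 pointers can address. */
int indirection_level(uint64_t n, uint32_t block_size)
{
    uint64_t p = block_size / 4;   /* assumed 4-byte pointers per block */

    if (n < NDIR_BLOCKS)
        return 0;
    n -= NDIR_BLOCKS;
    if (n < p)
        return 1;
    n -= p;
    if (n < p * p)
        return 2;
    n -= p * p;
    if (n < p * p * p)
        return 3;
    return -1;
}
```

With 1024-byte blocks, each indirect block holds 256 pointers, so logical blocks 0 to 11 are direct, 12 to 267 need one level, and block 268 is the first to need double indirection.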
The inode flags may take one or more of the following or'ed values:
EXT2_SECRM_FL 0x0001
- secure deletion. This usually means that when this flag is set and we
delete the file, random data is written in the blocks previously allocated
to the file.
EXT2_UNRM_FL 0x0002
- undelete. When this flag is set and the file is being deleted, the file
system code must store enough information to ensure the undeletion of
the file (to a certain extent).
EXT2_COMPR_FL 0x0004
- compress file. The content of the file is compressed, the file system
code must use compression/decompression algorithms when accessing the
data of this file.
EXT2_SYNC_FL 0x0008
- synchronous updates. The disk representation of this file must be kept
in sync with its in-core representation. Asynchronous I/O on this kind
of file is not possible. The synchronous updates only apply to the inode
itself and to the indirect blocks. Data blocks are always written
asynchronously to the disk.
Some inodes have a special meaning:
EXT2_BAD_INO 1
- a file containing the list of bad blocks on the file system.
EXT2_ROOT_INO 2
- the root directory of the file system.
EXT2_ACL_IDX_INO 3
- ACL inode.
EXT2_ACL_DATA_INO 4
- ACL inode.
EXT2_BOOT_LOADER_INO 5
- the file containing the boot loader. (Not used yet it seems.)
EXT2_UNDEL_DIR_INO 6
- the undelete directory of the system.
EXT2_FIRST_INO 11
- this is the first inode that does not have a special meaning.
Directories are special files that are used to create access paths to
the files on disk. It is very important to understand that an inode may
have many access paths. Since directories are an essential part of the
file system, they have a specific structure. A directory file is a list
of entries of the following format:
struct ext2_dir_entry {
    unsigned long  inode;
    unsigned short rec_len;
    unsigned short name_len;
    char           name[EXT2_NAME_LEN];
};
inode
- points to the inode of the file.
rec_len
- length of the entry record.
name_len
- length of the file name.
name
- name of the file. This name may have a maximum length of
EXT2_NAME_LEN bytes (255 bytes as of version 0.5).
There is one such entry in the directory file for each file in the
directory. Since ext2fs is a Unix file system, the first two entries in
the directory are `.' and `..', which point to the
current directory and the parent directory respectively.
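Walking such a list can be sketched as follows. The function names are hypothetical, and the sketch rests on several assumptions not spelled out above: fields are stored little-endian, the name occupies only name_len bytes after the 8-byte header, rec_len gives the distance to the next entry, and an inode field of 0 marks an unused entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers byte by byte, independent of the host. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search one directory block for name.
 * Returns the inode number of the entry, or 0 when it is not found. */
uint32_t dir_lookup(const unsigned char *block, size_t block_size,
                    const char *name)
{
    size_t want = strlen(name);
    size_t pos = 0;

    while (pos + 8 <= block_size) {
        uint32_t inode    = read_le32(block + pos);
        uint16_t rec_len  = read_le16(block + pos + 4);
        uint16_t name_len = read_le16(block + pos + 6);

        /* Stop on a malformed entry instead of running off the block. */
        if (rec_len < 8 || pos + rec_len > block_size ||
            (size_t)8 + name_len > rec_len)
            return 0;

        if (inode != 0 && name_len == want &&
            memcmp(block + pos + 8, name, want) == 0)
            return inode;

        pos += rec_len;   /* rec_len leads to the next entry */
    }
    return 0;
}
```

A full lookup would repeat this for every block of the directory file. The bounds checks matter because rec_len comes straight from disk and may be corrupt.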
Here are the allocation algorithms that ext2 file system managers
must use. We are adamant on this point. Nowadays, many users
run more than one operating system on the same computer. If more than
one operating system uses the same ext2 partition, they all have to use the
same allocation algorithms. Otherwise, one file system manager
will undo the work of the other. It is useless to have a manager
that uses highly efficient allocation algorithms if the other one
does not bother with allocation and uses quick and dirty
algorithms.
Here are the rules used to allocate new inodes:
- the inode for a new file is allocated in the same group as the
inode of its parent directory.
- inodes are allocated equally between groups.
Here are the rules used to allocate new blocks:
- a new block is allocated in the same group as its inode.
- allocate consecutive sequences of blocks.
Of course, it may be sometimes impossible to abide by those rules. In
this case, the manager may allocate the block or inode anywhere.
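The first inode rule and its fallback can be sketched as below. The function name is hypothetical, and the fallback order (the nearest following group that has a free inode) is only an assumption of this sketch, since the text allows any group in that case.

```c
#include <stdint.h>

/* Choose a group for the inode of a new file.
 * free_inodes[g] mirrors bg_free_inodes_count of group g.
 * Returns the group index, or -1 when no group has a free inode. */
int pick_inode_group(const uint16_t *free_inodes, int groups_count,
                     int parent_group)
{
    int i;

    /* Rule: stay in the group of the parent directory when possible. */
    if (free_inodes[parent_group] > 0)
        return parent_group;

    /* Fallback (assumed order): next groups in turn, wrapping around. */
    for (i = 1; i < groups_count; i++) {
        int g = (parent_group + i) % groups_count;
        if (free_inodes[g] > 0)
            return g;
    }
    return -1;
}
```

The same shape would apply to blocks, with bg_free_blocks_count and the group of the file's inode as the starting point.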
This chapter describes how a standard ext2 file system must handle
errors. The superblock contains two parameters controlling the way
errors are handled. See section Superblock.
The first of these is the s_mount_opt member of the superblock
structure in memory. Its value is computed from the options specified
when the fs is mounted. Its error handling related values are:
EXT2_MOUNT_ERRORS_CONT
- continue even if an error occurs.
EXT2_MOUNT_ERRORS_RO
- remount the file system read only.
EXT2_MOUNT_ERRORS_PANIC
- the kernel panics on error.
The second of these is the s_errors member of the superblock
structure on disk. It may take one of the following values:
EXT2_ERRORS_CONTINUE
- continue even if an error occurs.
EXT2_ERRORS_RO
- remount the file system read only.
EXT2_ERRORS_PANIC
- in which case the kernel simply panics.
EXT2_ERRORS_DEFAULT
- use the default behavior (as of 0.5a,
EXT2_ERRORS_CONTINUE).
s_mount_opt takes precedence over s_errors.
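The precedence rule can be sketched as a small decision function. The enumerations below are illustrative stand-ins, not the kernel's constants or their numeric values, and MNT_UNSET is an assumption that represents a mount performed without any error option.

```c
/* What the manager does when an error is detected. */
enum error_action { ACT_CONTINUE, ACT_REMOUNT_RO, ACT_PANIC };

/* Illustrative stand-ins for the s_errors and s_mount_opt settings. */
enum disk_errors  { DISK_DEFAULT, DISK_CONTINUE, DISK_RO, DISK_PANIC };
enum mount_errors { MNT_UNSET, MNT_CONT, MNT_RO, MNT_PANIC };

enum error_action resolve_error_action(enum mount_errors mnt,
                                       enum disk_errors disk)
{
    /* The mount option wins whenever one was given. */
    switch (mnt) {
    case MNT_CONT:  return ACT_CONTINUE;
    case MNT_RO:    return ACT_REMOUNT_RO;
    case MNT_PANIC: return ACT_PANIC;
    case MNT_UNSET: break;
    }

    /* Otherwise fall back to the value stored on disk. */
    switch (disk) {
    case DISK_RO:       return ACT_REMOUNT_RO;
    case DISK_PANIC:    return ACT_PANIC;
    case DISK_CONTINUE:
    case DISK_DEFAULT:  break;   /* the default behaves as continue (as of 0.5a) */
    }
    return ACT_CONTINUE;
}
```

For example, a file system whose superblock asks for a panic but which is mounted with the read-only option is remounted read only when an error occurs.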
Here are a couple of formulae commonly used in ext2fs managers.
The block number corresponding to an offset relative to the start of a file:
block = offset / s_blocksize
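Since the block size is 1024 << s_log_block_size, the division can also be written as a shift. A minimal sketch with hypothetical function names; the second function, which gives the position inside the block, is an addition for illustration.

```c
#include <stdint.h>

/* Logical block of the file that contains byte offset. */
uint32_t offset_to_block(uint32_t offset, uint32_t s_log_block_size)
{
    /* Same as offset / block_size, because block_size is 2^(10 + log). */
    return offset >> (10 + s_log_block_size);
}

/* Position of that byte inside its block. */
uint32_t offset_in_block(uint32_t offset, uint32_t s_log_block_size)
{
    uint32_t block_size = (uint32_t)1024 << s_log_block_size;

    return offset & (block_size - 1);   /* same as offset % block_size */
}
```

Byte 5000 of a file lies in block 4 with 1024-byte blocks and in block 1 with 4096-byte blocks, at position 904 within the block in both cases.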