Analysis of Ext2fs

Louis-Dominique Dubeau






Blocks and Fragments

Blocks are the basic building blocks of a file system. The file system manager issues requests to the disk, and every request is expressed as a whole number of disk blocks.

Some blocks of the file system are reserved for the exclusive use of the superuser. Their number is stored in the s_r_blocks_count member of the superblock structure (see section Superblock). When the total number of free blocks becomes equal to the number of reserved blocks, ordinary users can no longer allocate space on the disk; only the superuser can. Keeping these blocks in reserve guarantees that there is always a minimum amount of space left for the system to boot.
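
The reservation rule can be sketched as a small check. This is only an illustration under stated assumptions, not the kernel's code: the function name may_allocate_block and the is_superuser flag are hypothetical, while the two counters mirror s_free_blocks_count and s_r_blocks_count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: may the caller take one more block?
 * free_blocks     mirrors s_free_blocks_count
 * reserved_blocks mirrors s_r_blocks_count */
bool may_allocate_block(uint32_t free_blocks,
                        uint32_t reserved_blocks,
                        bool is_superuser)
{
    if (free_blocks == 0)
        return false;           /* nothing left for anyone */
    if (is_superuser)
        return true;            /* the superuser may use the reserve */
    /* Ordinary users are refused once only the reserve remains. */
    return free_blocks > reserved_blocks;
}
```

With 50 reserved blocks, an ordinary user is refused as soon as the free count drops to 50, while the superuser can keep allocating until the count reaches zero.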

There are two kinds of blocks: logical and physical. Their sizes may differ. The size of a logical block is the size of a physical block multiplied by a power of two.

Logical block addresses start at zero and increase up to the total number of blocks. Block zero is the boot block, which is accessed only by special operations.

The problem with blocks is well known: a file rarely occupies a whole number of blocks, so its last block is only partly filled and the rest of that block is wasted.

To solve this problem, file systems use fragments. The size of a fragment is the size of a physical block multiplied by a power of two. A file is a sequence of blocks, which in turn consist of fragments. The last block of a file may consist of only some of its fragments.

Groups

The blocks on disk are divided into groups. Each group contains critical file system information. The use of groups allows the disk to be used efficiently.
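
The space lost in the tail of a file is easy to quantify. The helper below is a minimal sketch (the name tail_slack is hypothetical): it returns the number of unused bytes in the last allocation unit, whether that unit is a block or a fragment.

```c
#include <stdint.h>

/* Bytes left unused in the last allocation unit of a file.
 * unit_size is the block size or the fragment size, in bytes. */
uint32_t tail_slack(uint32_t file_size, uint32_t unit_size)
{
    uint32_t used = file_size % unit_size;  /* bytes in the partial unit */

    return used == 0 ? 0 : unit_size - used;
}
```

For a 5000-byte file, tail_slack(5000, 4096) is 3192 bytes, whereas tail_slack(5000, 1024) is only 120 bytes, which is the saving that allocation in smaller fragments aims for.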

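Each group contains, in this order:

  • a copy of the superblock

  • the group descriptors

  • the blocks bitmap

  • the inodes bitmap

  • the inode table

  • the data blocks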

The superblock and group descriptors of each group must carry the same values on disk.

Superblock

The superblock structure is defined in include/linux/ext2_fs.h:

 struct ext2_super_block {
   unsigned long  s_inodes_count;
   unsigned long  s_blocks_count;
   unsigned long  s_r_blocks_count;
   unsigned long  s_free_blocks_count;
   unsigned long  s_free_inodes_count;
   unsigned long  s_first_data_block;
   unsigned long  s_log_block_size;
   long           s_log_frag_size;
   unsigned long  s_blocks_per_group;
   unsigned long  s_frags_per_group;
   unsigned long  s_inodes_per_group;
   unsigned long  s_mtime;
   unsigned long  s_wtime;
   unsigned short s_mnt_count;
   short          s_max_mnt_count;
   unsigned short s_magic;
   unsigned short s_state;
   unsigned short s_errors;
   unsigned short s_pad;
   unsigned long  s_lastcheck;
   unsigned long  s_checkinterval;
   unsigned long  s_reserved[238];
 };
 

s_inodes_count
the total number of inodes on the fs.

s_blocks_count
the total number of blocks on the fs.

s_r_blocks_count
the total number of blocks reserved for the exclusive use of the superuser.

s_free_blocks_count
the total number of free blocks on the fs.

s_free_inodes_count
the total number of free inodes on the fs.

s_first_data_block
the position on the fs of the first data block. Usually, this is block number 1 for a fs with 1024-byte blocks and block number 0 for other fs.

s_log_block_size
used to compute the logical block size in bytes. The logical block size is in fact 1024 << s_log_block_size.

s_log_frag_size
used to compute the logical fragment size. The logical fragment size is in fact 1024 << s_log_frag_size if s_log_frag_size is positive and 1024 >> -s_log_frag_size if s_log_frag_size is negative.

s_blocks_per_group
the total number of blocks contained in a group.

s_frags_per_group
the total number of fragments contained in a group.

s_inodes_per_group
the total number of inodes contained in a group.

s_mtime
the time at which the last mount of the fs was performed.

s_wtime
the time at which the last write of the superblock on the fs was performed.

s_mnt_count
the number of times the fs has been mounted in read-write mode without having been checked.

s_max_mnt_count
the maximum number of times the fs may be mounted in read-write mode before a check must be done.

s_magic
a magic number that permits the identification of the file system. It is 0xEF53 for a normal ext2fs and 0xEF51 for versions of ext2fs prior to 0.2b.

s_state
the state of the file system. It contains an or'ed value of EXT2_VALID_FS (0x0001) which means: unmounted cleanly; and EXT2_ERROR_FS (0x0002) which means: errors detected by the kernel code.

s_errors
indicates what operation to perform when an error occurs. See section Error Handling.

s_pad
unused.

s_lastcheck
the time of the last check performed on the fs.

s_checkinterval
the maximum possible time between checks on the fs.

s_reserved
unused.

Times are measured in seconds since 00:00:00 GMT, January 1, 1970.
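
The size and magic fields above can be decoded as follows. This is a minimal sketch: the function names are hypothetical, the macro names are assumptions made for this example, and only the numeric values (1024, 0xEF53, 0xEF51) come from the field descriptions.

```c
#include <stdint.h>

#define EXT2_SUPER_MAGIC   0xEF53  /* normal ext2fs, as quoted above */
#define EXT2_PRE_02B_MAGIC 0xEF51  /* ext2fs prior to 0.2b; macro name assumed */

/* Logical block size in bytes: 1024 << s_log_block_size. */
uint32_t ext2_block_size(uint32_t s_log_block_size)
{
    return (uint32_t)1024 << s_log_block_size;
}

/* Fragment size in bytes; the shift direction follows the sign. */
uint32_t ext2_frag_size(int32_t s_log_frag_size)
{
    if (s_log_frag_size >= 0)
        return (uint32_t)1024 << s_log_frag_size;
    return (uint32_t)1024 >> -s_log_frag_size;
}

/* Nonzero if s_magic identifies an ext2 file system. */
int ext2_magic_ok(uint16_t s_magic)
{
    return s_magic == EXT2_SUPER_MAGIC || s_magic == EXT2_PRE_02B_MAGIC;
}
```

For example, s_log_block_size = 2 gives 4096-byte blocks, and s_log_frag_size = -1 gives 512-byte fragments.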

Once the superblock has been read into memory, the ext2fs kernel code computes some additional information and keeps it in another structure. This structure has the following layout:

 struct ext2_sb_info {
 	unsigned long s_frag_size;
 	unsigned long s_frags_per_block;
 	unsigned long s_inodes_per_block;
 	unsigned long s_frags_per_group;
 	unsigned long s_blocks_per_group;
 	unsigned long s_inodes_per_group;
 	unsigned long s_itb_per_group;
 	unsigned long s_desc_per_block;
 	unsigned long s_groups_count;
 	struct buffer_head * s_sbh;
 	struct ext2_super_block * s_es;
 	struct buffer_head * s_group_desc[EXT2_MAX_GROUP_DESC];
 	unsigned short s_loaded_inode_bitmaps;
 	unsigned short s_loaded_block_bitmaps;
 	unsigned long s_inode_bitmap_number[EXT2_MAX_GROUP_LOADED];
 	struct buffer_head * s_inode_bitmap[EXT2_MAX_GROUP_LOADED];
 	unsigned long s_block_bitmap_number[EXT2_MAX_GROUP_LOADED];
 	struct buffer_head * s_block_bitmap[EXT2_MAX_GROUP_LOADED];
 	int s_rename_lock;
 	struct wait_queue * s_rename_wait;
 	unsigned long  s_mount_opt;
 	unsigned short s_mount_state;
 };
 

s_frag_size
fragment size in bytes.

s_frags_per_block
number of fragments in a block.

s_inodes_per_block
number of inodes in a block of the inode table.

s_frags_per_group
number of fragments in a group.

s_blocks_per_group
number of blocks in a group.

s_inodes_per_group
number of inodes in a group.

s_itb_per_group
number of inode table blocks per group.

s_desc_per_block
number of group descriptors per block.

s_groups_count
number of groups.

s_sbh
the buffer containing the disk superblock in memory.

s_es
pointer to the superblock in the buffer.

s_group_desc
pointers to the buffers containing the group descriptors.

s_loaded_inode_bitmaps
number of inodes bitmap cache entries used.

s_loaded_block_bitmaps
number of blocks bitmap cache entries used.

s_inode_bitmap_number
indicates to which group each inodes bitmap in the buffers belongs.

s_inode_bitmap
inode bitmap cache.

s_block_bitmap_number
indicates to which group each blocks bitmap in the buffers belongs.

s_block_bitmap
block bitmap cache.

s_rename_lock
lock used to avoid two simultaneous rename operations on a fs.

s_rename_wait
wait queue used to wait for the completion of a rename operation in progress.

s_mount_opt
the mounting options specified by the administrator.

s_mount_state

Most of those values are computed from the superblock on disk.
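
A sketch of how several of these in-memory values could be derived from the on-disk superblock follows. It is an illustration, not the kernel's code. It assumes the on-disk structures shown in this document with 32-bit long fields, which makes an inode 128 bytes and a group descriptor 32 bytes; the DEMO_ constants, the struct and the function name are hypothetical.

```c
#include <stdint.h>

#define DEMO_INODE_SIZE      128u  /* struct ext2_inode, 32-bit longs assumed */
#define DEMO_GROUP_DESC_SIZE 32u   /* struct ext2_group_desc, same assumption */

struct ext2_derived {
    uint32_t frags_per_block;
    uint32_t inodes_per_block;
    uint32_t itb_per_group;
    uint32_t desc_per_block;
    uint32_t groups_count;
};

struct ext2_derived ext2_derive(uint32_t block_size, uint32_t frag_size,
                                uint32_t blocks_count,
                                uint32_t first_data_block,
                                uint32_t blocks_per_group,
                                uint32_t inodes_per_group)
{
    struct ext2_derived d;

    d.frags_per_block  = block_size / frag_size;
    d.inodes_per_block = block_size / DEMO_INODE_SIZE;
    d.itb_per_group    = inodes_per_group / d.inodes_per_block;
    d.desc_per_block   = block_size / DEMO_GROUP_DESC_SIZE;
    /* Round up so that a partial last group is still counted. */
    d.groups_count = (blocks_count - first_data_block + blocks_per_group - 1)
                     / blocks_per_group;
    return d;
}
```

With 1024-byte blocks, 20000 blocks in total, first data block 1, 8192 blocks and 2048 inodes per group, this yields 3 groups, 8 inodes per block, 256 inode table blocks per group and 32 descriptors per block.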

The Linux ext2fs manager caches access to the inodes and blocks bitmaps. This cache is a list of buffers ordered from the most recently used to the least recently used buffer. Other managers should use the same kind of bitmap caching, or a similar method, to improve disk access time.
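
The ordering described above can be sketched with a small array kept in most-recently-used order. This is a simplified model rather than the kernel implementation: it tracks only group numbers (as s_block_bitmap_number does), MAX_LOADED stands in for EXT2_MAX_GROUP_LOADED with an assumed value of 8, and the names bitmap_cache and cache_touch are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LOADED 8  /* stands in for EXT2_MAX_GROUP_LOADED; value assumed */

struct bitmap_cache {
    uint32_t group[MAX_LOADED];  /* group held in each slot, newest first */
    int      loaded;             /* number of slots in use */
};

/* Mark the bitmap of `group` as just used.
 * Returns 1 on a cache hit and 0 on a miss; either way the group
 * ends up in slot 0. */
int cache_touch(struct bitmap_cache *c, uint32_t group)
{
    int i, hit = 0;

    for (i = 0; i < c->loaded; i++) {
        if (c->group[i] == group) {
            hit = 1;
            break;
        }
    }
    if (!hit) {
        if (c->loaded < MAX_LOADED)
            c->loaded++;         /* grow into an unused slot */
        i = c->loaded - 1;       /* else reuse the least recently used slot */
        /* A real manager would read the bitmap block from disk here. */
    }
    /* Shift slots 0..i-1 down by one and put the group in front. */
    memmove(&c->group[1], &c->group[0], (size_t)i * sizeof c->group[0]);
    c->group[0] = group;
    return hit;
}
```

When the cache is full, a miss overwrites the last slot, so the least recently used bitmap is the one that gets dropped.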

Group Descriptors

On disk, the group descriptors immediately follow the superblock and each descriptor has the following layout:

 struct ext2_group_desc
 {
   unsigned long  bg_block_bitmap;
   unsigned long  bg_inode_bitmap;
   unsigned long  bg_inode_table;
   unsigned short bg_free_blocks_count;
   unsigned short bg_free_inodes_count;
   unsigned short bg_used_dirs_count;
   unsigned short bg_pad;
   unsigned long  bg_reserved[3];
 };
 

bg_block_bitmap
points to the blocks bitmap block for the group.

bg_inode_bitmap
points to the inodes bitmap block for the group.

bg_inode_table
points to the inodes table first block.

bg_free_blocks_count
number of free blocks in the group.

bg_free_inodes_count
number of free inodes in the group.

bg_used_dirs_count
number of inodes allocated to directories in the group.

bg_pad
padding.

The information in a group descriptor pertains only to the group it is actually describing.

Bitmaps

The ext2 file system uses bitmaps to keep track of allocated blocks and inodes.

The blocks bitmap of each group refers to blocks ranging from the first block in the group to the last block in the group. To access the bit of a given block, we first have to find the group the block belongs to and then find the bit of this block in the blocks bitmap of that group. It is very important to note that the blocks bitmap in fact refers to the smallest allocation unit supported by the file system: the fragment. Since the block size is always a multiple of the fragment size, when the file system manager allocates a block, it actually allocates a whole number of fragments. This use of the blocks bitmap permits the file system manager to allocate and deallocate space on a fragment basis.

The inode bitmap of each group refers to inodes ranging from the first inode of the group to the last inode of the group. To access the bit of a given inode, we first have to find the group the inode belongs to and then find the bit of this inode in the inode bitmap of that group. To obtain the inode information from the inode table, the process is the same, except that the final search is in the inode table of the group instead of the inode bitmap.
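
The two lookups can be sketched as below. The sketch rests on assumptions that the text does not state: the fragment size equals the block size (so one bit stands for one block), inode numbers start at 1, and bit 0 is the least significant bit of the first bitmap byte. All names are hypothetical.

```c
#include <stdint.h>

struct bit_pos {
    uint32_t group;  /* group that owns the block or inode */
    uint32_t bit;    /* bit index inside that group's bitmap */
};

/* Locate the bitmap bit of a block (one bit per block assumed). */
struct bit_pos block_bit(uint32_t block, uint32_t first_data_block,
                         uint32_t blocks_per_group)
{
    struct bit_pos p;

    p.group = (block - first_data_block) / blocks_per_group;
    p.bit   = (block - first_data_block) % blocks_per_group;
    return p;
}

/* Locate the bitmap bit of an inode; inode numbers start at 1. */
struct bit_pos inode_bit(uint32_t ino, uint32_t inodes_per_group)
{
    struct bit_pos p;

    p.group = (ino - 1) / inodes_per_group;
    p.bit   = (ino - 1) % inodes_per_group;
    return p;
}

/* Test one bit of a bitmap held in memory. */
int bitmap_test(const unsigned char *bitmap, uint32_t bit)
{
    return (bitmap[bit >> 3] >> (bit & 7)) & 1;
}
```

The same (group, bit) pair from inode_bit also indexes the inode table: the bit index is the position of the inode inside the table of its group.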

Inodes

An inode uniquely describes a file. Here's what an inode looks like on disk:

 struct ext2_inode {
   unsigned short i_mode;
   unsigned short i_uid;
   unsigned long  i_size;
   unsigned long  i_atime;
   unsigned long  i_ctime;
   unsigned long  i_mtime;
   unsigned long  i_dtime;
   unsigned short i_gid;
   unsigned short i_links_count;
   unsigned long  i_blocks;
   unsigned long  i_flags;
   unsigned long  i_reserved1;
   unsigned long  i_block[EXT2_N_BLOCKS];
   unsigned long  i_version;
   unsigned long  i_file_acl;
   unsigned long  i_dir_acl;
   unsigned long  i_faddr;
   unsigned char  i_frag;
   unsigned char  i_fsize;
   unsigned short i_pad1;
   unsigned long  i_reserved2[2];
 };
 

i_mode
type of file (character, block, link, etc.) and access rights on the file.

i_uid
uid of the owner of the file.

i_size
logical size in bytes.

i_atime
last time the file was accessed.

i_ctime
last time the inode information of the file was changed.

i_mtime
last time the file content was modified.

i_dtime
when this file was deleted.

i_gid
gid of the file.

i_links_count
number of links pointing to this file.

i_blocks
number of blocks allocated to this file, counted in 512-byte units.

i_flags
flags (see below).

i_reserved1
reserved.

i_block
pointers to blocks (see below).

i_version
version of the file (used by NFS).

i_file_acl
access control list of the file (not used yet).

i_dir_acl
access control list of the directory (not used yet).

i_faddr
block where the fragment of the file resides.

i_frag
number of the fragment in the block.

i_fsize
size of the fragment.

i_pad1
padding.

i_reserved2
reserved.

As you can see, the inode contains EXT2_N_BLOCKS (15 in ext2fs 0.5) pointers to blocks. Of these pointers, the first EXT2_NDIR_BLOCKS (12) are direct pointers to data. The next entry points to a block of pointers to data (single indirection). The entry after that points to a block of pointers to blocks of pointers to data (double indirection). The last entry points to a block of pointers to blocks of pointers to blocks of pointers to data (triple indirection).
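
To make the pointer scheme concrete, the sketch below reports how many levels of indirection are needed to reach a given logical block of a file. It assumes 4-byte block pointers, as in the structure above when long is 32 bits wide; the function name is hypothetical.

```c
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12  /* number of direct pointers, as stated above */

/* Levels of indirection (0 to 3) needed to reach logical block n of a
 * file, or -1 if n lies beyond what triple indirection can address. */
int indirection_level(uint64_t n, uint32_t block_size)
{
    uint64_t p = block_size / 4;  /* 4-byte pointers per block (assumed) */

    if (n < EXT2_NDIR_BLOCKS)
        return 0;                 /* direct pointer */
    n -= EXT2_NDIR_BLOCKS;
    if (n < p)
        return 1;                 /* single indirection */
    n -= p;
    if (n < p * p)
        return 2;                 /* double indirection */
    n -= p * p;
    if (n < p * p * p)
        return 3;                 /* triple indirection */
    return -1;
}
```

With 1024-byte blocks each pointer block holds 256 entries, so logical blocks 0 to 11 are direct, 12 to 267 use single indirection, 268 to 65803 use double indirection, and blocks from 65804 onward use triple indirection.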

The inode flags may take one or more of the following or'ed values:

EXT2_SECRM_FL 0x0001
secure deletion. This usually means that when this flag is set and we delete the file, random data is written in the blocks previously allocated to the file.

EXT2_UNRM_FL 0x0002
undelete. When this flag is set and the file is being deleted, the file system code must store enough information to ensure the undeletion of the file (to a certain extent).

EXT2_COMPR_FL 0x0004
compress file. The content of the file is compressed, the file system code must use compression/decompression algorithms when accessing the data of this file.

EXT2_SYNC_FL 0x0008
synchronous updates. The disk representation of this file must be kept in sync with its in-core representation. Asynchronous I/O on this kind of file is not possible. The synchronous updates only apply to the inode itself and to the indirect blocks. Data blocks are always written asynchronously to the disk.

Some inodes have a special meaning:

EXT2_BAD_INO 1
a file containing the list of bad blocks on the file system.

EXT2_ROOT_INO 2
the root directory of the file system.

EXT2_ACL_IDX_INO 3
ACL inode.

EXT2_ACL_DATA_INO 4
ACL inode.

EXT2_BOOT_LOADER_INO 5
the file containing the boot loader. (Not used yet it seems.)

EXT2_UNDEL_DIR_INO 6
the undelete directory of the system.

EXT2_FIRST_INO 11
this is the first inode that does not have a special meaning.

Directories

Directories are special files that are used to create access paths to the files on disk. It is very important to understand that an inode may have many access paths. Since directories are an essential part of the file system, they have a specific structure. A directory file is a list of entries of the following format:

 struct ext2_dir_entry {
   unsigned long  inode;
   unsigned short rec_len;
   unsigned short name_len;
   char           name[EXT2_NAME_LEN];
 };
 

inode
points to the inode of the file.

rec_len
length of the entry record.

name_len
length of the file name.

name
name of the file. This name may have a maximum length of EXT2_NAME_LEN bytes (255 bytes as of version 0.5).

There is such an entry in the directory file for each file in the directory. Since ext2fs is a Unix file system, the first two entries in the directory are the files `.' and `..', which point to the current directory and the parent directory respectively.
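
Because each entry records its own length in rec_len, a directory block can be scanned by hopping from one entry to the next. The following is a minimal sketch under stated assumptions: fields are stored little-endian with a 4-byte inode number and two 2-byte lengths (an 8-byte header), and an inode number of 0 marks an unused entry. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Little-endian field readers (byte order is an assumption). */
uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Search one directory block for `name`.
 * Returns the inode number, or 0 if the name is not present. */
uint32_t dir_lookup(const unsigned char *blk, size_t size, const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 8 <= size) {
        uint32_t ino      = rd32(blk + off);
        uint16_t rec_len  = rd16(blk + off + 4);
        uint16_t name_len = rd16(blk + off + 6);

        /* Stop on a corrupt entry rather than run off the buffer. */
        if (rec_len < 8 || off + rec_len > size || name_len > rec_len - 8)
            return 0;
        if (ino != 0 && name_len == want &&
            memcmp(blk + off + 8, name, want) == 0)
            return ino;
        off += rec_len;           /* hop to the next entry */
    }
    return 0;
}
```

Names are compared by length and bytes because the name field is not NUL-terminated on disk; name_len is what delimits it.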

Allocation algorithms

Here are the allocation algorithms that ext2 file system managers must use. We are adamant on this point. Nowadays, many users run more than one operating system on the same computer. If more than one operating system uses the same ext2 partition, they all have to use the same allocation algorithms. Otherwise, one file system manager will undo the work of the other. It is useless to have a manager that uses highly efficient allocation algorithms if the other one does not bother with allocation and uses quick and dirty algorithms.

Here are the rules used to allocate new inodes:

  • the inode for a new file is allocated in the same group as the inode of its parent directory.

  • inodes are allocated equally between groups.

Here are the rules used to allocate new blocks:

  • a new block is allocated in the same group as its inode.

  • allocate consecutive sequences of blocks.

Of course, it may be sometimes impossible to abide by those rules. In this case, the manager may allocate the block or inode anywhere.
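
One possible reading of the inode rules is sketched below: prefer the parent directory's group, and otherwise fall back to the group with the most free inodes so that usage stays balanced. The fallback policy is an assumption made for this example, not something the rules prescribe, and the function name is hypothetical.

```c
#include <stdint.h>

/* Choose a group for a new inode.
 * free_inodes[g] mirrors bg_free_inodes_count of group g.
 * Returns the chosen group, or -1 if no group has a free inode. */
int pick_inode_group(const uint16_t *free_inodes, int groups,
                     int parent_group)
{
    int g, best = -1;

    /* Rule 1: stay in the parent directory's group when possible. */
    if (parent_group >= 0 && parent_group < groups &&
        free_inodes[parent_group] > 0)
        return parent_group;

    /* Fallback (assumed): take the emptiest group to keep inodes
     * spread evenly between groups. */
    for (g = 0; g < groups; g++) {
        if (free_inodes[g] > 0 &&
            (best < 0 || free_inodes[g] > free_inodes[best]))
            best = g;
    }
    return best;
}
```

A block allocator could follow the same pattern, starting from the group of the file's inode and looking for a free bit next to the last block already allocated.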

Error Handling

This chapter describes how a standard ext2 file system must handle errors. The superblock contains two parameters controlling the way errors are handled (see section Superblock).

The first of these is the s_mount_opt member of the superblock structure in memory. Its value is computed from the options specified when the fs is mounted. Its error handling related values are:

EXT2_MOUNT_ERRORS_CONT
continue even if an error occurs.

EXT2_MOUNT_ERRORS_RO
remount the file system read only.

EXT2_MOUNT_ERRORS_PANIC
the kernel panics on error.

The second of these is the s_errors member of the superblock structure on disk. It may take one of the following values:

EXT2_ERRORS_CONTINUE
continue even if an error occurs.

EXT2_ERRORS_RO
remount the file system read only.

EXT2_ERRORS_PANIC
in which case the kernel simply panics.

EXT2_ERRORS_DEFAULT
use the default behavior (as of 0.5a EXT2_ERRORS_CONTINUE).

s_mount_opt takes precedence over s_errors.
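
The precedence rule can be sketched as a small resolver. The DEMO_ constants are placeholders invented for this example; the real values of the EXT2_MOUNT_ERRORS_* and EXT2_ERRORS_* macros live in include/linux/ext2_fs.h and are not reproduced here. Checking the panic flag first when several mount flags are set is also an assumption.

```c
/* Placeholder values, NOT the kernel's. */
#define DEMO_MOUNT_ERRORS_CONT  0x1u
#define DEMO_MOUNT_ERRORS_RO    0x2u
#define DEMO_MOUNT_ERRORS_PANIC 0x4u

#define DEMO_ERRORS_DEFAULT  0
#define DEMO_ERRORS_CONTINUE 1
#define DEMO_ERRORS_RO       2
#define DEMO_ERRORS_PANIC    3

enum err_action { ERR_CONTINUE, ERR_REMOUNT_RO, ERR_PANIC };

/* Decide what to do on error: mount options win over the
 * s_errors value stored in the on-disk superblock. */
enum err_action resolve_error_action(unsigned long mount_opt,
                                     unsigned short s_errors)
{
    if (mount_opt & DEMO_MOUNT_ERRORS_PANIC)
        return ERR_PANIC;
    if (mount_opt & DEMO_MOUNT_ERRORS_RO)
        return ERR_REMOUNT_RO;
    if (mount_opt & DEMO_MOUNT_ERRORS_CONT)
        return ERR_CONTINUE;

    /* No mount option given: fall back to the superblock setting. */
    switch (s_errors) {
    case DEMO_ERRORS_PANIC:
        return ERR_PANIC;
    case DEMO_ERRORS_RO:
        return ERR_REMOUNT_RO;
    case DEMO_ERRORS_CONTINUE:
    default:                      /* default behaves as "continue" */
        return ERR_CONTINUE;
    }
}
```

For instance, a file system whose superblock asks for a panic is still remounted read-only when the administrator mounts it with the read-only-on-error option.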

Formulae

Here are a couple of formulae usually used in ext2fs managers.

The block number of a file relative offset:

block = offset / s_blocksize
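
As a worked instance of this formula, the sketch below splits a byte offset into a logical block number and a position inside that block. The struct and function names are hypothetical, and block_size plays the role of s_blocksize.

```c
#include <stdint.h>

struct file_pos {
    uint32_t block;            /* logical block number within the file */
    uint32_t offset_in_block;  /* byte position inside that block */
};

/* Split a file-relative byte offset using block = offset / block_size. */
struct file_pos locate_offset(uint32_t offset, uint32_t block_size)
{
    struct file_pos p;

    p.block           = offset / block_size;
    p.offset_in_block = offset % block_size;
    return p;
}
```

With 1024-byte blocks, offset 5000 falls in logical block 4, at byte 904 of that block.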