Intel Architecture Software Developer's Manual
Volume 3: System Programming
Chapter 2: System Architecture Overview
Intel 32-bit processors provide extensive support for operating systems.
This support is part of a multilevel architecture
and includes the following features:
Memory management
Protection
Multitasking
Exception handling
Multiprocessing
Cache management
Power management
Debugging
This chapter gives a brief overview of the processor architecture.
It describes the system registers used to control the processor
at the system level
and briefly describes the system instructions.
These features are used mainly by system programmers.
The chapter describes how a system programmer can use them
to build a reliable environment
for running applications written by application programmers.
NOTE
The discussion focuses mainly
on protected mode.
As described in Chapter 8,
an Intel processor starts in real-address mode at power-up,
and the operating system itself performs the switch to protected mode.
2.1. OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE
The Intel Architecture's system architecture consists of a set of registers, data structures, and
instructions designed to support basic system-level operations such as memory management,
interrupt and exception handling, task management, and control of multiple processors (multiprocessing).
Figure 2-1 provides a generalized summary of the system registers and data
structures.
Figure 2-1
2.1.1. Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the global
descriptor table (GDT) or the (optional) local descriptor table (LDT), shown in Figure 2-1.
These tables contain entries called segment descriptors. A segment descriptor provides the base
address of a segment and access rights, type, and usage information. Each segment descriptor
has a segment selector associated with it. The segment selector provides an index into the GDT
or LDT (to its associated segment descriptor), a global/local flag (that determines whether the
segment selector points to the GDT or the LDT), and access rights information.
To access a byte in a segment, both a segment selector and an offset must be supplied. The
segment selector provides access to the segment descriptor for the segment (in the GDT or
LDT). From the segment descriptor, the processor obtains the base address of the segment in the
linear address space. The offset then provides the location of the byte relative to the base
address. This mechanism can be used to access any valid code, data, or stack segment in the
GDT or LDT, provided the segment is accessible from the current privilege level (CPL) at which
the processor is operating. (The CPL is defined as the protection level of the currently executing
code segment.)
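The selector and offset mechanism described above can be sketched in C. This is a minimal illustration, not processor microcode: the type and function names are hypothetical, and the selector bit layout (index in bits 15 through 3, table-indicator flag in bit 2, requested privilege level in bits 1 and 0) is taken from the segment-selector format defined in Chapter 3. The "access rights information" carried by a selector is this 2-bit requested privilege level (RPL).

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector (names are illustrative). */
typedef struct {
    uint16_t index;   /* bits 15..3: descriptor index within the table */
    uint8_t  use_ldt; /* bit 2: 0 selects the GDT, 1 selects the LDT   */
    uint8_t  rpl;     /* bits 1..0: requested privilege level          */
} selector_fields;

/* Split a selector into its three fields. */
selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index   = (uint16_t)(sel >> 3);
    f.use_ldt = (uint8_t)((sel >> 2) & 1u);
    f.rpl     = (uint8_t)(sel & 3u);
    return f;
}

/* Linear address of a byte: segment base (read from the descriptor)
   plus the offset supplied by the program. */
uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}

/* Usage example with hand-checked values. */
void selector_example(void)
{
    selector_fields f = decode_selector(0x001B); /* binary 0001 1011 */
    assert(f.index == 3 && f.use_ldt == 0 && f.rpl == 3);

    f = decode_selector(0x000C);                 /* binary 0000 1100 */
    assert(f.index == 1 && f.use_ldt == 1 && f.rpl == 0);

    assert(linear_address(0x00100000u, 0x2345u) == 0x00102345u);
}
```

The sketch omits the limit and privilege checks that the processor performs before it uses the descriptor.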
In Figure 2-1 the solid arrows indicate a linear address, the dashed lines indicate a segment
selector, and the dotted arrows indicate a physical address. For simplicity, many of the segment
selectors are shown as direct pointers to a segment. However, the actual path from a segment
selector to its associated segment is always through the GDT or LDT.
The linear address of the base of the GDT is contained in the GDT register (GDTR); the linear
address of the LDT is contained in the LDT register (LDTR).
2.1.2. System Segments, Segment Descriptors, and Gates
Besides the code, data, and stack segments that make up the execution environment of a program
or procedure, the system architecture also defines two system segments: the task-state segment
(TSS) and the LDT. (The GDT is not considered a segment because it is not accessed by means
of a segment selector and segment descriptor.) Each of these segment types has a segment
descriptor defined for it.
The system architecture also defines a set of special descriptors called gates (the call gate, interrupt
gate, trap gate, and task gate) that provide protected gateways to system procedures and
handlers that operate at different privilege levels than application programs and procedures.
For example, a CALL to a call gate provides access to a procedure in a code segment that is at
the same or numerically lower privilege level (more privileged) than the current code segment.
To access a procedure through a call gate, the calling procedure (see footnote 1) must supply the
selector of the call gate. The processor then performs an access rights check on the call gate,
comparing the CPL with the privilege level of the call gate and the destination code segment
pointed to by the call gate. If access to the destination code segment is allowed, the processor
gets the segment selector for the destination code segment and an offset into that code segment
from the call gate. If the call requires a change in privilege level, the processor also switches
to the stack for that privilege level. (The segment selector for the new stack is obtained from
the TSS for the currently running task.) Gates also facilitate transitions between 16-bit and
32-bit code segments, and vice versa.
1. The word "procedure" is commonly used in this document as a general term for a logical unit or block of
code (such as a program, procedure, function, or routine). The term is not restricted to the definition of a
procedure in the Intel Architecture assembly language.
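The access rights check for a CALL through a call gate can be sketched as follows. The function names are hypothetical, and the specific comparisons (CPL and RPL against the gate's descriptor privilege level, then the destination segment's DPL against the CPL) are an assumption drawn from the call-gate rules in the protection chapter rather than from this overview. Treat it as an illustration of the comparison, not a complete model of the protection checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege levels run from 0 (most privileged) to 3 (least privileged). */

/* May a CALL at the given CPL, using a gate selector with the given RPL,
   pass through a call gate with DPL gate_dpl to a code segment with
   DPL target_dpl? */
bool call_gate_call_allowed(unsigned cpl, unsigned rpl,
                            unsigned gate_dpl, unsigned target_dpl)
{
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3 && target_dpl <= 3);

    /* The caller must be at least as privileged as the gate requires. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return false;

    /* A CALL may only reach code at the same or a more privileged level. */
    if (target_dpl > cpl)
        return false;

    return true;
}

/* A stack switch is needed when the call lands in a more privileged,
   nonconforming code segment. */
bool call_changes_privilege(unsigned cpl, unsigned target_dpl,
                            bool target_conforming)
{
    return !target_conforming && target_dpl < cpl;
}

/* Usage example. */
void call_gate_example(void)
{
    /* An application (CPL 3) calls a DPL-3 gate that targets level-0 code. */
    assert(call_gate_call_allowed(3, 3, 3, 0));
    assert(call_changes_privilege(3, 0, false));

    /* The same caller may not use a gate whose DPL is 0. */
    assert(!call_gate_call_allowed(3, 3, 0, 0));

    /* Level-0 code cannot call "down" into a level-3 segment. */
    assert(!call_gate_call_allowed(0, 0, 3, 3));
}
```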
2.1.3. Task-State Segments and Task Gates
The TSS (refer to Figure 2-1) defines the state of the execution environment for a task. It
includes the state of the general-purpose registers, the segment registers, the EFLAGS register,
the EIP register, and segment selectors and stack pointers for three stack segments (one stack
each for privilege levels 0, 1, and 2). It also includes the segment selector for the LDT associated
with the task and the page-table base address.
All program execution in protected mode happens within the context of a task, called the current
task. The segment selector for the TSS for the current task is stored in the task register. The
simplest method of switching to a task is to make a call or jump to the task. Here, the segment
selector for the TSS of the new task is given in the CALL or JMP instruction. In switching tasks,
the processor performs the following actions:
1. Stores the state of the current task in the current TSS.
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose registers, the
segment registers, the LDTR, control register CR3 (page-table base address), the EFLAGS
register, and the EIP register.
5. Begins execution of the new task.
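The five steps above can be mirrored in a small software model. Everything here is hypothetical scaffolding: the structures, the array that stands in for the TSS descriptors in the GDT, and the rule that selector (i << 3) reaches tss_table[i]. The model copies the whole state in both directions for simplicity and ignores the privilege checks and flag updates that the processor performs during a real task switch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified image of the state kept in a TSS. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp; /* general-purpose */
    uint16_t cs, ds, es, fs, gs, ss;                 /* segment registers */
    uint16_t ldt_selector;                           /* loaded into LDTR */
    uint32_t cr3;                                    /* page-table base */
    uint32_t eflags, eip;
} task_state;

/* Simplified processor: the task register plus the live state. */
typedef struct {
    uint16_t   task_register; /* selector of the current TSS */
    task_state regs;
} cpu_model;

/* Model of a task switch; tss_table[i] is reached by selector (i << 3). */
void switch_task(cpu_model *cpu, task_state *tss_table, unsigned count,
                 uint16_t new_selector)
{
    unsigned current = cpu->task_register >> 3;
    unsigned next    = new_selector >> 3;
    assert(current < count && next < count);

    tss_table[current] = cpu->regs;                /* step 1: save state   */
    cpu->task_register = new_selector;             /* step 2: load TR      */
    const task_state *new_tss = &tss_table[next];  /* step 3: find new TSS */
    cpu->regs = *new_tss;                          /* step 4: load state   */
    /* step 5: execution would resume at cpu->regs.eip */
}

/* Usage example: switch from task 0 to task 1. */
void task_switch_example(void)
{
    task_state tss[2] = {{0}};
    cpu_model  cpu    = {0};

    tss[1].eip = 0x2000u;
    tss[1].cr3 = 0x5000u;
    cpu.task_register = 0u << 3;
    cpu.regs.eip = 0x1000u;

    switch_task(&cpu, tss, 2, 1u << 3);

    assert(cpu.task_register == 8);
    assert(cpu.regs.eip == 0x2000u && cpu.regs.cr3 == 0x5000u);
    assert(tss[0].eip == 0x1000u); /* the old task's state was saved */
}
```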
A task can also be accessed through a task gate. A task gate is similar to a call gate, except that
it provides access (through a segment selector) to a TSS rather than a code segment.
2.1.4. Interrupt and Exception Handling
External interrupts, software interrupts, and exceptions are handled through the interrupt
descriptor table (IDT) (refer to Figure 2-1). The IDT contains a collection of gate descriptors,
which provide access to interrupt and exception handlers. Like the GDT, the IDT is not a
segment. The linear address of the base of the IDT is contained in the IDT register (IDTR).
The gate descriptors in the IDT can be of the interrupt-, trap-, or task-gate type. To access an
interrupt or exception handler, the processor must first receive an interrupt vector (interrupt
number) from internal hardware, an external interrupt controller, or from software by means of
an INT, INTO, INT 3, or BOUND instruction. The interrupt vector provides an index into the
IDT to a gate descriptor. If the selected gate descriptor is an interrupt gate or a trap gate, the associated
handler procedure is accessed in a manner very similar to calling a procedure through a
call gate. If the descriptor is a task gate, the handler is accessed through a task switch.
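The lookup described above can be sketched as follows. The names are hypothetical. Two details are assumptions taken from the descriptor formats in Chapter 5 rather than from this overview: each gate descriptor occupies 8 bytes, and the 4-bit type field uses 5 for a task gate, 6 or 0EH for an interrupt gate (16-bit or 32-bit), and 7 or 0FH for a trap gate.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { GATE_TASK, GATE_INTERRUPT, GATE_TRAP, GATE_INVALID } gate_kind;

/* Byte offset of a vector's gate descriptor from the IDT base. */
uint32_t idt_entry_offset(uint8_t vector)
{
    return (uint32_t)vector * 8u;
}

/* The whole 8-byte descriptor must lie within the IDT limit. */
int idt_entry_in_limit(uint8_t vector, uint16_t idt_limit)
{
    return idt_entry_offset(vector) + 7u <= idt_limit;
}

/* Classify a gate by the 4-bit type field of its descriptor. */
gate_kind classify_gate(uint8_t type_field)
{
    switch (type_field & 0x0Fu) {
    case 0x5:           return GATE_TASK;      /* handler reached by task switch */
    case 0x6: case 0xE: return GATE_INTERRUPT; /* 16-bit / 32-bit interrupt gate */
    case 0x7: case 0xF: return GATE_TRAP;      /* 16-bit / 32-bit trap gate      */
    default:            return GATE_INVALID;   /* not a valid IDT gate type      */
    }
}

/* Usage example. */
void idt_example(void)
{
    assert(idt_entry_offset(14) == 112u);
    assert(idt_entry_in_limit(255, 0x07FF));  /* full 256-entry table */
    assert(!idt_entry_in_limit(32, 0x00FF));  /* table holds only 32 gates */
    assert(classify_gate(0xE) == GATE_INTERRUPT);
    assert(classify_gate(0x5) == GATE_TASK);
    assert(classify_gate(0xC) == GATE_INVALID);
}
```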
2.1.5. Memory Management
The system architecture supports either direct physical addressing of memory or virtual memory
(through paging). When physical addressing is used, a linear address is treated as a physical
address. When paging is used, all the code, data, stack, and system segments and the GDT and
IDT can be paged, with only the most recently accessed pages being held in physical memory.
The location of pages (or page frames as they are sometimes called in the Intel Architecture) in
physical memory is contained in two types of system data structures (a page directory and a set
of page tables), both of which reside in physical memory (refer to Figure 2-1). An entry in a page
directory contains the physical address of the base of a page table, access rights, and memory
management information. An entry in a page table contains the physical address of a page frame,
access rights, and memory management information. The base physical address of the page
directory is contained in control register CR3.
To use this paging mechanism, a linear address is broken into three parts, providing separate
offsets into the page directory, the page table, and the page frame.
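As an illustration, the split can be written out for 4-KByte pages. The field widths used here (10 bits for the page-directory index, 10 bits for the page-table index, and 12 bits for the offset within the page frame) are the ones defined for 32-bit paging in Chapter 3. The function names and the sample values are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the page-directory entry: bits 31..22 of the linear address. */
uint32_t page_directory_index(uint32_t linear)
{
    return linear >> 22;
}

/* Index of the page-table entry: bits 21..12 of the linear address. */
uint32_t page_table_index(uint32_t linear)
{
    return (linear >> 12) & 0x3FFu;
}

/* Offset within the 4-KByte page frame: bits 11..0. */
uint32_t page_offset(uint32_t linear)
{
    return linear & 0xFFFu;
}

/* Combine the page-frame address held in the upper 20 bits of a
   page-table entry with the offset from the linear address. */
uint32_t physical_address(uint32_t page_table_entry, uint32_t linear)
{
    return (page_table_entry & 0xFFFFF000u) | page_offset(linear);
}

/* Usage example with hand-checked values. */
void paging_example(void)
{
    uint32_t linear = 0x00403025u;
    assert(page_directory_index(linear) == 1u);
    assert(page_table_index(linear) == 3u);
    assert(page_offset(linear) == 0x025u);

    /* Entry with frame 0x0009A000 and low flag bits set. */
    assert(physical_address(0x0009A007u, linear) == 0x0009A025u);
}
```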
A system can have a single page directory or several. For example, each task can have its own
page directory.
2.1.6. System Registers
To assist in initializing the processor and controlling system operations, the system architecture
provides system flags in the EFLAGS register and several system registers:
The system flags and IOPL field in the EFLAGS register control task and mode switching,
interrupt handling, instruction tracing, and access rights. Refer to Section 2.3., "System
Flags and Fields in the EFLAGS Register" for a description of these flags.
The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and data fields
for controlling system-level operations. With the introduction of the Pentium® III
processor, CR4 now contains bits that indicate operating-system support for capabilities
specific to the Pentium® III processor. Refer to Section 2.5., "Control Registers" for a
description of these flags.
The debug registers (not shown in Figure 2-1) allow the setting of breakpoints for use in
debugging programs and systems software. Refer to Chapter 15, Debugging and
Performance Monitoring, for a description of these registers.
The GDTR, LDTR, and IDTR registers contain the linear addresses and sizes (limits) of
their respective tables. Refer to Section 2.4., "Memory-Management Registers" for a
description of these registers.
The task register contains the linear address and size of the TSS for the current task. Refer
to Section 2.4., "Memory-Management Registers" for a description of this register.
Model-specific registers (not shown in Figure 2-1).
The model-specific registers (MSRs) are a group of registers available primarily to operating-system
or executive procedures (that is, code running at privilege level 0). These registers
control items such as the debug extensions, the performance-monitoring counters, the machine-check
architecture, and the memory type ranges (MTRRs). The number and functions of these
registers vary among the different members of the Intel Architecture processor families.
Refer to Section 8.4., "Model-Specific Registers (MSRs)" in Chapter 8, Processor Management and
Initialization for more information about the MSRs and to Appendix B, Model-Specific Registers
for a complete list of the MSRs.
Most systems restrict access to all system registers (other than the EFLAGS register) by application
programs. Systems can be designed, however, where all programs and procedures run at
the most privileged level (privilege level 0), in which case application programs are allowed to
modify the system registers.
2.1.7. Other System Resources
Besides the system registers and data structures described in the previous sections, the system
architecture provides the following additional resources:
Operating system instructions (refer to Section 2.6., "System Instruction Summary").
Performance-monitoring counters (not shown in Figure 2-1).
Internal caches and buffers (not shown in Figure 2-1).
The performance-monitoring counters are event counters that can be programmed to count
processor events such as the number of instructions decoded, the number of interrupts received,
or the number of cache loads. Refer to Section 15.6., "Performance-Monitoring Counters", in
Chapter 15, Debugging and Performance Monitoring, for more information about these
counters.
The processor provides several internal caches and buffers. The caches are used to store both
data and instructions. The buffers are used to store things like decoded addresses to system and
application segments and write operations waiting to be performed. Refer to Chapter 9, Memory
Cache Control, for a detailed discussion of the processor's caches and buffers.
2.2. MODES OF OPERATION
The Intel Architecture supports three operating modes and one quasi-operating mode:
Protected mode. This is the native operating mode of the processor. In this mode all
instructions and architectural features are available, providing the highest performance and
capability. This is the recommended mode for all new applications and operating systems.
Real-address mode. This operating mode provides the programming environment of the
Intel 8086 processor, with a few extensions (such as the ability to switch to protected or
system management mode).
System management mode (SMM). The system management mode (SMM) is a standard
architectural feature in all Intel Architecture processors, beginning with the Intel386™ SL
processor. This mode provides an operating system or executive with a transparent
mechanism for implementing power management and OEM differentiation features. SMM
is entered through activation of an external system interrupt pin (SMI#), which generates a
system management interrupt (SMI). In SMM, the processor switches to a separate address
space while saving the context of the currently running program or task. SMM-specific
code may then be executed transparently. Upon returning from SMM, the processor is
placed back into its state prior to the SMI.
Virtual-8086 mode. In protected mode, the processor supports a quasi-operating mode
known as virtual-8086 mode. This mode allows the processor to execute 8086 software in
a protected, multitasking environment.
Figure 2-2 shows how the processor moves among these operating modes.
Figure 2-2
The processor is placed in real-address mode following power-up or a reset. Thereafter, the PE
flag in control register CR0 controls whether the processor is operating in real-address or
protected mode (refer to Section 2.5., "Control Registers"). Refer to Section 8.8., "Mode
Switching" in Chapter 8, Processor Management and Initialization for detailed information on
switching between real-address mode and protected mode.
The VM flag in the EFLAGS register determines whether the processor is operating in protected
mode or virtual-8086 mode. Transitions between protected mode and virtual-8086 mode are
generally carried out as part of a task switch or a return from an interrupt or exception handler
(refer to Section 16.2.5., "Entering Virtual-8086 Mode" in Chapter 16, 8086 Emulation).
The processor switches to SMM whenever it receives an SMI while the processor is in
real-address, protected, or virtual-8086 mode. Upon execution of the RSM instruction, the
processor always returns to the mode it was in when the SMI occurred.
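The roles of the PE and VM flags can be summarized in a short sketch. The names are hypothetical. The bit positions come from this chapter: PE is bit 0 of CR0 and VM is bit 17 of EFLAGS. SMM is left out because it is entered by an SMI and left by RSM rather than selected by a flag.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE    (1u << 0)   /* protection enable flag in CR0    */
#define EFLAGS_VM (1u << 17)  /* virtual-8086 mode flag in EFLAGS */

typedef enum {
    MODE_REAL_ADDRESS,
    MODE_PROTECTED,
    MODE_VIRTUAL_8086
} cpu_mode;

/* Derive the operating mode from images of CR0 and EFLAGS. */
cpu_mode current_mode(uint32_t cr0, uint32_t eflags)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL_ADDRESS;  /* state after power-up or reset */
    return (eflags & EFLAGS_VM) ? MODE_VIRTUAL_8086 : MODE_PROTECTED;
}

/* Usage example. */
void mode_example(void)
{
    assert(current_mode(0u, 0u) == MODE_REAL_ADDRESS);
    assert(current_mode(CR0_PE, 0u) == MODE_PROTECTED);
    assert(current_mode(CR0_PE, EFLAGS_VM) == MODE_VIRTUAL_8086);
}
```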
2.3. SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER
The system flags and IOPL field of the EFLAGS register control I/O, maskable hardware interrupts,
debugging, task switching, and the virtual-8086 mode (refer to Figure 2-3). Only privileged
code (typically operating system or executive code) should be allowed to modify these
bits.
The functions of the system flags and IOPL are as follows:
TF Trap (bit 8). Set to enable single-step mode for debugging; clear to disable single-step
mode. In single-step mode, the processor generates a debug exception after each
instruction, which allows the execution state of a program to be inspected after each
instruction. If an application program sets the TF flag using a POPF, POPFD, or IRET
instruction, a debug exception is generated after the instruction that follows the POPF,
POPFD, or IRET instruction.
IF Interrupt enable (bit 9). Controls the response of the processor to maskable hardware
interrupt requests (refer to Section 5.1.1.2., "Maskable Hardware Interrupts" in
Chapter 5, Interrupt and Exception Handling). Set to respond to maskable hardware
interrupts; cleared to inhibit maskable hardware interrupts. The IF flag does not affect
the generation of exceptions or nonmaskable interrupts (NMI interrupts). The CPL,
IOPL, and the state of the VME flag in control register CR4 determine whether the IF
flag can be modified by the CLI, STI, POPF, POPFD, and IRET instructions.
IOPL I/O privilege level field (bits 12 and 13). Indicates the I/O privilege level (IOPL) of
the currently running program or task. The CPL of the currently running program or
task must be less than or equal to the IOPL to access the I/O address space. This field
can only be modified by the POPF and IRET instructions when operating at a CPL of
0. Refer to Chapter 10, Input/Output, of the Intel Architecture Software Developer's
Manual, Volume 1, for more information on the relationship of the IOPL to I/O operations.
The IOPL is also one of the mechanisms that controls the modification of the IF flag
and the handling of interrupts in virtual-8086 mode when the virtual mode extensions
are in effect (the VME flag in control register CR4 is set).
NT Nested task (bit 14). Controls the chaining of interrupted and called tasks. The
processor sets this flag on calls to a task initiated with a CALL instruction, an interrupt,
or an exception. It examines and modifies this flag on returns from a task initiated with
the IRET instruction. The flag can be explicitly set or cleared with the POPF/POPFD
instructions; however, changing the state of this flag can generate unexpected exceptions
in application programs. Refer to Section 6.4., "Task Linking" in Chapter 6, Task
Management for more information on nested tasks.
RF Resume (bit 16). Controls the processor's response to instruction-breakpoint conditions.
When set, this flag temporarily disables debug exceptions (#DB) from being
generated for instruction breakpoints, although other exception conditions can
cause an exception to be generated. When clear, instruction breakpoints will generate
debug exceptions.
The primary function of the RF flag is to allow the restarting of an instruction following
a debug exception that was caused by an instruction breakpoint condition. Here,
debugger software must set this flag in the EFLAGS image on the stack just prior to
returning to the interrupted program with the IRETD instruction, to prevent the instruction
breakpoint from causing another debug exception. The processor then automatically
clears this flag after the instruction returned to has been successfully executed,
enabling instruction breakpoint faults again.
Refer to Section 15.3.1.1., "Instruction-Breakpoint Exception Condition", in Chapter
15, Debugging and Performance Monitoring, for more information on the use of this
flag.
VM Virtual-8086 mode (bit 17). Set to enable virtual-8086 mode; clear to return to
protected mode. Refer to Section 16.2.1., "Enabling Virtual-8086 Mode" in Chapter
16, 8086 Emulation for a detailed description of the use of this flag to switch to virtual-
8086 mode.
AC Alignment check (bit 18). Set this flag and the AM flag in the CR0 register to enable
alignment checking of memory references; clear the AC flag and/or the AM flag to
disable alignment checking. An alignment-check exception is generated when reference
is made to an unaligned operand, such as a word at an odd byte address or a
doubleword at an address which is not an integral multiple of four. Alignment-check
exceptions are generated only in user mode (privilege level 3). Memory references that
default to privilege level 0, such as segment descriptor loads, do not generate this
exception even when caused by instructions executed in user-mode.
The alignment-check exception can be used to check alignment of data. This is useful
when exchanging data with other processors, which require all data to be aligned. The
alignment-check exception can also be used by interpreters to flag some pointers as
special by misaligning the pointer. This eliminates overhead of checking each pointer
and only handles the special pointer when used.
VIF Virtual Interrupt (bit 19). Contains a virtual image of the IF flag. This flag is used in
conjunction with the VIP flag. The processor only recognizes the VIF flag when either
the VME flag or the PVI flag in control register CR4 is set and the IOPL is less than 3.
(The VME flag enables the virtual-8086 mode extensions; the PVI flag enables the
protected-mode virtual interrupts.) Refer to Section 16.3.3.5., "Method 6: Software
Interrupt Handling" and Section 16.4., "Protected-Mode Virtual Interrupts" in Chapter
16, 8086 Emulation for detailed information about the use of this flag.
VIP Virtual interrupt pending (bit 20). Set by software to indicate that an interrupt is
pending; cleared to indicate that no interrupt is pending. This flag is used in conjunction
with the VIF flag. The processor reads this flag but never modifies it. The
processor only recognizes the VIP flag when either the VME flag or the PVI flag in
control register CR4 is set and the IOPL is less than 3. (The VME flag enables the
virtual-8086 mode extensions; the PVI flag enables the protected-mode virtual interrupts.)
Refer to Section 16.3.3.5., "Method 6: Software Interrupt Handling" and
Section 16.4., "Protected-Mode Virtual Interrupts" in Chapter 16, 8086 Emulation for
detailed information about the use of this flag.
ID Identification (bit 21). The ability of a program or procedure to set or clear this flag
indicates support for the CPUID instruction.
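The bit positions listed above translate directly into masks. The following sketch defines them and adds three helper functions with hypothetical names. The I/O check models only the CPL and IOPL comparison stated above (it ignores any other I/O permission mechanism), and the alignment helper assumes operands are naturally aligned to their own size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* System flags and IOPL field of EFLAGS, as listed in this section. */
#define EFLAGS_TF   (1u << 8)
#define EFLAGS_IF   (1u << 9)
#define EFLAGS_IOPL (3u << 12)
#define EFLAGS_NT   (1u << 14)
#define EFLAGS_RF   (1u << 16)
#define EFLAGS_VM   (1u << 17)
#define EFLAGS_AC   (1u << 18)
#define EFLAGS_VIF  (1u << 19)
#define EFLAGS_VIP  (1u << 20)
#define EFLAGS_ID   (1u << 21)

#define CR0_AM      (1u << 18)  /* alignment mask flag in CR0 */

/* Extract the 2-bit I/O privilege level. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL) >> 12;
}

/* The CPL must be less than or equal to the IOPL to access I/O space. */
bool io_allowed(unsigned cpl, uint32_t eflags)
{
    assert(cpl <= 3);
    return cpl <= eflags_iopl(eflags);
}

/* Alignment checking needs AM, AC, and execution at privilege level 3. */
bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) && (eflags & EFLAGS_AC) && cpl == 3;
}

/* True if an operand of the given size (in bytes) is not aligned. */
bool is_misaligned(uint32_t address, uint32_t operand_size)
{
    assert(operand_size != 0);
    return (address % operand_size) != 0;
}

/* Usage example. */
void eflags_example(void)
{
    uint32_t eflags = 0x00003202u;          /* IOPL = 3, IF set */
    assert(eflags_iopl(eflags) == 3);
    assert((eflags & EFLAGS_IF) != 0);
    assert(io_allowed(3, eflags));
    assert(!io_allowed(3, 0x00000202u));    /* IOPL = 0 */

    assert(alignment_check_active(CR0_AM, EFLAGS_AC, 3));
    assert(!alignment_check_active(CR0_AM, EFLAGS_AC, 0));
    assert(is_misaligned(0x1001u, 2));      /* word at an odd address */
    assert(!is_misaligned(0x1004u, 4));     /* aligned doubleword */
}
```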
2.4. MEMORY-MANAGEMENT REGISTERS
The processor provides four memory-management registers (GDTR, LDTR, IDTR, and TR)
that specify the locations of the data structures which control segmented memory management
(refer to Figure 2-4). Special instructions are provided for loading and storing these registers.
2.4.1. Global Descriptor Table Register (GDTR)
The GDTR register holds the 32-bit base address and 16-bit table limit for the GDT. The base
address specifies the linear address of byte 0 of the GDT; the table limit specifies the number of
bytes in the table. The LGDT and SGDT instructions load and store the GDTR register, respectively.
On power up or reset of the processor, the base address is set to the default value of 0 and
the limit is set to FFFFH. A new base address must be loaded into the GDTR as part of the
processor initialization process for protected-mode operation. Refer to Section 3.5.1., "Segment
Descriptor Tables" in Chapter 3, Protected-Mode Memory Management for more information
on the base address and limit fields.
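The LGDT and SGDT instructions transfer the register contents to and from a 6-byte memory image. The sketch below packs and unpacks such an image. It assumes the layout given in the instruction reference (16-bit limit in the two low-order bytes, followed by the 32-bit base, both little-endian); the function names are hypothetical, and the same layout applies to the IDTR.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 6-byte image of a table register: limit first, then base. */
void pack_table_register(uint8_t image[6], uint32_t base, uint16_t limit)
{
    image[0] = (uint8_t)(limit & 0xFFu);
    image[1] = (uint8_t)(limit >> 8);
    image[2] = (uint8_t)(base & 0xFFu);
    image[3] = (uint8_t)((base >> 8) & 0xFFu);
    image[4] = (uint8_t)((base >> 16) & 0xFFu);
    image[5] = (uint8_t)(base >> 24);
}

/* Recover base and limit from a 6-byte image. */
void unpack_table_register(const uint8_t image[6],
                           uint32_t *base, uint16_t *limit)
{
    *limit = (uint16_t)(image[0] | ((uint16_t)image[1] << 8));
    *base  = (uint32_t)image[2]
           | ((uint32_t)image[3] << 8)
           | ((uint32_t)image[4] << 16)
           | ((uint32_t)image[5] << 24);
}

/* Usage example. */
void table_register_example(void)
{
    uint8_t  image[6];
    uint32_t base;
    uint16_t limit;

    pack_table_register(image, 0x00012340u, 0x07FFu);
    assert(image[0] == 0xFF && image[1] == 0x07);
    assert(image[2] == 0x40 && image[3] == 0x23);
    assert(image[4] == 0x01 && image[5] == 0x00);

    unpack_table_register(image, &base, &limit);
    assert(base == 0x00012340u && limit == 0x07FFu);
}
```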
2.4.2. Local Descriptor Table Register (LDTR)
The LDTR register holds the 16-bit segment selector, 32-bit base address, 16-bit segment limit,
and descriptor attributes for the LDT. The base address specifies the linear address of byte 0 of
the LDT segment; the segment limit specifies the number of bytes in the segment. Refer to
Section 3.5.1., "Segment Descriptor Tables" in Chapter 3, Protected-Mode Memory Management
for more information on the base address and limit fields.
The LLDT and SLDT instructions load and store the segment selector part of the LDTR register,
respectively. The segment that contains the LDT must have a segment descriptor in the GDT.
When the LLDT instruction loads a segment selector in the LDTR, the base address, limit, and
descriptor attributes from the LDT descriptor are automatically loaded into the LDTR.
When a task switch occurs, the LDTR is automatically loaded with the segment selector and
descriptor for the LDT for the new task. The contents of the LDTR are not automatically saved
prior to writing the new LDT information into the register.
On power up or reset of the processor, the segment selector and base address are set to the default
value of 0 and the limit is set to FFFFH.
2.4.3. Interrupt Descriptor Table Register (IDTR)
The IDTR register holds the 32-bit base address and 16-bit table limit for the IDT. The base
address specifies the linear address of byte 0 of the IDT; the table limit specifies the number of
bytes in the table. The LIDT and SIDT instructions load and store the IDTR register, respectively.
On power up or reset of the processor, the base address is set to the default value of 0 and
the limit is set to FFFFH. The base address and limit in the register can then be changed as part
of the processor initialization process. Refer to Section 5.8., "Interrupt Descriptor Table (IDT)"
in Chapter 5, Interrupt and Exception Handling for more information on the base address and
limit fields.
2.4.4. Task Register (TR)
The task register holds the 16-bit segment selector, 32-bit base address, 16-bit segment limit,
and descriptor attributes for the TSS of the current task. It references a TSS descriptor in the
GDT. The base address specifies the linear address of byte 0 of the TSS; the segment limit specifies
the number of bytes in the TSS. (Refer to Section 6.2.3., "Task Register" in Chapter 6, Task
Management for more information about the task register.)
The LTR and STR instructions load and store the segment selector part of the task register,
respectively. When the LTR instruction loads a segment selector in the task register, the base
address, limit, and descriptor attributes from the TSS descriptor are automatically loaded into
the task register. On power up or reset of the processor, the base address is set to the default value
of 0 and the limit is set to FFFFH.
When a task switch occurs, the task register is automatically loaded with the segment selector
and descriptor for the TSS for the new task. The contents of the task register are not automatically
saved prior to writing the new TSS information into the register.
2.5. CONTROL REGISTERS
The control registers (CR0, CR1, CR2, CR3, and CR4) determine the operating mode of the
processor and the characteristics of the currently executing task (refer to Figure 2-5).
Figure 2-5. Control Registers
The control registers:
CR0-Contains system control flags that control operating mode and states of the
processor.
CR1-Reserved.
CR2-Contains the page-fault linear address (the linear address that caused a page fault).
CR3-Contains the physical address of the base of the page directory and two flags (PCD
and PWT). This register is also known as the page-directory base register (PDBR). Only
the 20 most-significant bits of the page-directory base address are specified; the lower 12
bits of the address are assumed to be 0. The page directory must thus be aligned to a page
(4-KByte) boundary. The PCD and PWT flags control caching of the page directory in the
processor's internal data caches (they do not control TLB caching of page-directory
information).
When using the physical address extension, the CR3 register contains the base address of
the page-directory-pointer table (refer to Section 3.8., "Physical Address Extension" in
Chapter 3, Protected-Mode Memory Management).
CR4-Contains a group of flags that enable several architectural extensions, as well as
indicating the level of OS support for the Streaming SIMD Extensions.
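The CR3 layout described above (without the physical address extension) can be sketched as follows. The positions of the two flags, PWT in bit 3 and PCD in bit 4, are an assumption taken from Figure 2-5 rather than from the text; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT (1u << 3)  /* page-level write-through flag */
#define CR3_PCD (1u << 4)  /* page-level cache disable flag */

/* Physical base of the page directory: the 20 most-significant bits of
   CR3, with the low 12 bits taken as 0. */
uint32_t cr3_page_directory_base(uint32_t cr3)
{
    return cr3 & 0xFFFFF000u;
}

/* Compose a CR3 value from a page-aligned directory address and flags. */
uint32_t make_cr3(uint32_t page_directory_phys, bool pcd, bool pwt)
{
    /* The page directory must sit on a 4-KByte boundary. */
    assert((page_directory_phys & 0xFFFu) == 0);
    return page_directory_phys | (pcd ? CR3_PCD : 0u) | (pwt ? CR3_PWT : 0u);
}

/* Usage example. */
void cr3_example(void)
{
    uint32_t cr3 = make_cr3(0x00123000u, true, false);
    assert(cr3 == 0x00123010u);
    assert(cr3_page_directory_base(cr3) == 0x00123000u);
    assert((cr3 & CR3_PWT) == 0);
}
```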
In protected mode, the move-to-or-from-control-registers forms of the MOV instruction allow
the control registers to be read or loaded at privilege level 0 only. Application programs
(running at privilege levels 1, 2, or 3) are thus prevented from reading or loading the control
registers; an attempt to do so results in a general-protection exception (#GP(0)).
The functions of the flags in the control registers are as follows:
PG Paging (bit 31 of CR0). Enables paging when set; disables paging when clear. When
paging is disabled, all linear addresses are treated as physical addresses. The PG flag
has no effect if the PE flag (bit 0 of register CR0) is not also set; in fact, setting the PG
flag when the PE flag is clear causes a general-protection exception (#GP) to be generated.
Refer to Section 3.6., "Paging (Virtual Memory)" in Chapter 3, Protected-Mode
Memory Management for a detailed description of the processor's paging mechanism.
CD Cache Disable (bit 30 of CR0). When the CD and NW flags are clear, caching of
memory locations for the whole of physical memory in the processor's internal (and
external) caches is enabled. When the CD flag is set, caching is restricted as described
in Table 9-4, in Chapter 9, Memory Cache Control. To prevent the processor from
accessing and updating its caches, the CD flag must be set and the caches must be
invalidated so that no cache hits can occur (refer to Section 9.5.2., "Preventing
Caching", in Chapter 9, Memory Cache Control). Refer to Section 9.5., "Cache
Control", Chapter 9, Memory Cache Control, for a detailed description of the additional
restrictions that can be placed on the caching of selected pages or regions of
memory.
NW Not Write-through (bit 29 of CR0). When the NW and CD flags are clear, write-back
(for Pentium® and P6 family processors) or write-through (for Intel486™ processors)
is enabled for writes that hit the cache and invalidation cycles are enabled. Refer to
Table 9-4, in Chapter 9, Memory Cache Control, for detailed information about the
effect of the NW flag on caching for other settings of the CD and NW flags.
AM Alignment Mask (bit 18 of CR0). Enables automatic alignment checking when set;
disables alignment checking when clear. Alignment checking is performed only when
the AM flag is set, the AC flag in the EFLAGS register is set, the CPL is 3, and the
processor is operating in either protected or virtual-8086 mode.
WP Write Protect (bit 16 of CR0). Inhibits supervisor-level procedures from writing into
user-level read-only pages when set; allows supervisor-level procedures to write into
user-level read-only pages when clear. This flag facilitates implementation of the
copy-on-write method of creating a new process (forking) used by operating systems such as
UNIX.
NE Numeric Error (bit 5 of CR0). Enables the native (internal) mechanism for reporting
FPU errors when set; enables the PC-style FPU error reporting mechanism when clear.
When the NE flag is clear and the IGNNE# input is asserted, FPU errors are ignored.
When the NE flag is clear and the IGNNE# input is deasserted, an unmasked FPU error
causes the processor to assert the FERR# pin to generate an external interrupt and to
stop instruction execution immediately before executing the next waiting floating-point
instruction or WAIT/FWAIT instruction. The FERR# pin is intended to drive an
input to an external interrupt controller (the FERR# pin emulates the ERROR# pin of
the Intel 287 and Intel 387 DX math coprocessors). The NE flag, IGNNE# pin, and
FERR# pin are used with external logic to implement PC-style error reporting. (Refer
to "Software Exception Handling" in Chapter 7, and Appendix D in the Intel Architecture
Software Developer's Manual, Volume 1, for more information about FPU error
reporting and for detailed information on when the FERR# pin is asserted, which is
implementation dependent.)
ET Extension Type (bit 4 of CR0). Reserved in the P6 family and Pentium® processors.
(In the P6 family processors, this flag is hardcoded to 1.) In the Intel386™ and
Intel486™ processors, this flag indicates support of Intel 387 DX math coprocessor
instructions when set.
TS Task Switched (bit 3 of CR0). Allows the saving of FPU context on a task switch to
be delayed until the FPU is actually accessed by the new task. The processor sets this
flag on every task switch and tests it when interpreting floating-point arithmetic
instructions.
If the TS flag is set, a device-not-available exception (#NM) is raised prior to the
execution of a floating-point instruction.
If the TS flag and the MP flag (also in the CR0 register) are both set, an #NM
exception is raised prior to the execution of a floating-point instruction or a
WAIT/FWAIT instruction.
Table 2-1. Action Taken for Combinations of EM, MP, TS, CR4.OSFXSR, and CPUID.XMM Flags
CR0 Flags    | CR4    | CPUID | Instruction Type
EM | MP | TS | OSFXSR | XMM   | Floating-Point | WAIT/FWAIT    | MMX™ Technology | Streaming SIMD Extensions
0  | 0  | 0  | -      | -     | Execute        | Execute       | Execute         | -
0  | 0  | 1  | -      | -     | #NM Exception  | Execute       | #NM Exception   | -
0  | 1  | 0  | -      | -     | Execute        | Execute       | Execute         | -
0  | 1  | 1  | -      | -     | #NM Exception  | #NM Exception | #NM Exception   | -
1  | 0  | 0  | -      | -     | #NM Exception  | Execute       | #UD Exception   | -
1  | 0  | 1  | -      | -     | #NM Exception  | Execute       | #UD Exception   | -
1  | 1  | 0  | -      | -     | #NM Exception  | Execute       | #UD Exception   | -
1  | 1  | 1  | -      | -     | #NM Exception  | #NM Exception | #UD Exception   | -
1  | -  | -  | -      | -     | -              | -             | -               | #UD Interrupt 6
0  | -  | 1  | 1      | 1     | -              | -             | -               | #NM Interrupt 7
-  | -  | -  | 0      | -     | -              | -             | -               | #UD Interrupt 6
-  | -  | -  | -      | 0     | -              | -             | -               | #UD Interrupt 6
The processor does not automatically save the context of the FPU on a task switch.
Instead it sets the TS flag, which causes the processor to raise an #NM exception whenever
it encounters a floating-point instruction in the instruction stream for the new task.
The fault handler for the #NM exception can then be used to clear the TS flag (with the
CLTS instruction) and save the context of the FPU. If the task never encounters a
floating-point instruction, the FPU context is never saved.
EM Emulation (bit 2 of CR0). Indicates that the processor does not have an internal or
external FPU when set; indicates an FPU is present when clear. When the EM flag is
set, execution of a floating-point instruction generates a device-not-available exception
(#NM). This flag must be set when the processor does not have an internal FPU or is
not connected to a math coprocessor. If the processor does have an internal FPU,
setting this flag would force all floating-point instructions to be handled by software
emulation. Table 8-2 in Chapter 8, Processor Management and Initialization shows the
recommended setting of this flag, depending on the Intel Architecture processor and
FPU or math coprocessor present in the system. Table 2-1 shows the interaction of the
EM, MP, and TS flags.
Note that the EM flag also affects the execution of the MMX™ instructions (refer to
Table 2-1). When this flag is set, execution of an MMX™ instruction causes an invalid
opcode exception (#UD) to be generated. Thus, if an Intel Architecture processor
incorporates MMX™ technology, the EM flag must be set to 0 to enable execution of
MMX™ instructions.
Similarly for the Streaming SIMD Extensions, when this flag is set, execution of a Streaming
SIMD Extensions instruction causes an invalid opcode exception (#UD) to be generated. Thus,
if an Intel Architecture processor incorporates Streaming SIMD Extensions, the EM flag must
be set to 0 to enable execution of Streaming SIMD Extensions instructions. The exceptions are
the PREFETCH and SFENCE instructions, which are not affected by the EM flag.
MP Monitor Coprocessor (bit 1 of CR0). Controls the interaction of the WAIT (or
FWAIT) instruction with the TS flag (bit 3 of CR0). If the MP flag is set, a WAIT
instruction generates a device-not-available exception (#NM) if the TS flag is set. If the
MP flag is clear, the WAIT instruction ignores the setting of the TS flag. Table 8-2 in
Chapter 8, Processor Management and Initialization shows the recommended setting
of this flag, depending on the Intel Architecture processor and FPU or math coprocessor
present in the system. Table 2-1 shows the interaction of the MP, EM, and TS
flags.
PE Protection Enable (bit 0 of CR0). Enables protected mode when set; enables
real-address mode when clear. This flag does not enable paging directly. It only enables
segment-level protection. To enable paging, both the PE and PG flags must be set.
Refer to Section 8.8., "Mode Switching" in Chapter 8, Processor Management and
Initialization for information on using the PE flag to switch between real and protected
mode.
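The interaction of the EM, MP, and TS flags summarized in Table 2-1 can be expressed as a few bit tests. The following C fragment is a minimal sketch, not a definitive model: the function names and the `action` enumeration are hypothetical, the PG flag position (bit 31 of CR0) is taken from the CR0 layout rather than from the entries above, and the Streaming SIMD Extensions column (which also depends on CR4.OSFXSR and CPUID.XMM) is left out.

```c
#include <stdint.h>

/* CR0 flag positions, as given in the flag descriptions. */
#define CR0_PE (1u << 0)   /* Protection Enable   */
#define CR0_MP (1u << 1)   /* Monitor Coprocessor */
#define CR0_EM (1u << 2)   /* Emulation           */
#define CR0_TS (1u << 3)   /* Task Switched       */
#define CR0_PG (1u << 31)  /* Paging              */

/* Hypothetical names for the three outcomes listed in Table 2-1. */
enum action { ACT_EXECUTE, ACT_NM, ACT_UD };

/* Floating-point instruction: #NM if either EM or TS is set. */
enum action fp_action(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) ? ACT_NM : ACT_EXECUTE;
}

/* WAIT/FWAIT: #NM only when both MP and TS are set; EM plays no part. */
enum action wait_action(uint32_t cr0)
{
    return ((cr0 & CR0_MP) && (cr0 & CR0_TS)) ? ACT_NM : ACT_EXECUTE;
}

/* MMX instruction: #UD if EM is set, otherwise #NM if TS is set. */
enum action mmx_action(uint32_t cr0)
{
    if (cr0 & CR0_EM)
        return ACT_UD;
    return (cr0 & CR0_TS) ? ACT_NM : ACT_EXECUTE;
}

/* Paging is in effect only when both PE and PG are set. */
int paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PE) && (cr0 & CR0_PG);
}
```

For example, `wait_action(CR0_TS)` yields `ACT_EXECUTE`, while `wait_action(CR0_MP | CR0_TS)` yields `ACT_NM`, matching the second and fourth rows of the table.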
PCD Page-level Cache Disable (bit 4 of CR3). Controls caching of the current page directory.
When the PCD flag is set, caching of the page-directory is prevented; when the
flag is clear, the page-directory can be cached. This flag affects only the processor's
internal caches (both L1 and L2, when present). The processor ignores this flag if
paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag
in CR0 is set. Refer to Chapter 9, Memory Cache Control, for more information about
the use of this flag. Refer to Section 3.6.4., "Page-Directory and Page-Table Entries"
in Chapter 3, Protected-Mode Memory Management for a description of a companion
PCD flag in the page-directory and page-table entries.
PWT Page-level Writes Transparent (bit 3 of CR3). Controls the write-through or write-back
caching policy of the current page directory. When the PWT flag is set, write-through
caching is enabled; when the flag is clear, write-back caching is enabled. This
flag affects only the internal caches (both L1 and L2, when present). The processor
ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD
(cache disable) flag in CR0 is set. Refer to Section 9.5., "Cache Control", in Chapter
9, Memory Cache Control, for more information about the use of this flag. Refer to
Section 3.6.4., "Page-Directory and Page-Table Entries" in Chapter 3, Protected-Mode
Memory Management for a description of a companion PWT flag in the page-directory
and page-table entries.
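As an illustration of the two CR3 flags, the following C sketch extracts PCD and PWT from a CR3 value and tests the condition under which the processor honors them. The function names are hypothetical. The page-directory base field (bits 31 through 12, without physical address extension) and the CD and PG positions in CR0 (bits 30 and 31) are assumptions taken from the register layouts, not from the entries above.

```c
#include <stdint.h>

#define CR0_CD  (1u << 30)  /* Cache Disable (assumed bit 30 of CR0) */
#define CR0_PG  (1u << 31)  /* Paging        (assumed bit 31 of CR0) */
#define CR3_PWT (1u << 3)   /* Page-level Writes Transparent */
#define CR3_PCD (1u << 4)   /* Page-level Cache Disable      */

/* Physical base of the page directory: bits 31..12 of CR3 (4-KByte aligned). */
uint32_t cr3_page_directory_base(uint32_t cr3)
{
    return cr3 & 0xFFFFF000u;
}

int cr3_pcd(uint32_t cr3) { return (cr3 & CR3_PCD) != 0; }
int cr3_pwt(uint32_t cr3) { return (cr3 & CR3_PWT) != 0; }

/* PCD and PWT are ignored when paging is off or when CR0.CD is set. */
int cr3_flags_honored(uint32_t cr0)
{
    return (cr0 & CR0_PG) && !(cr0 & CR0_CD);
}
```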
VME Virtual-8086 Mode Extensions (bit 0 of CR4). Enables interrupt- and exception-handling
extensions in virtual-8086 mode when set; disables the extensions when clear.
Use of the virtual mode extensions can improve the performance of virtual-8086 applications
by eliminating the overhead of calling the virtual-8086 monitor to handle interrupts
and exceptions that occur while executing an 8086 program and, instead,
redirecting the interrupts and exceptions back to the 8086 program's handlers. It also
provides hardware support for a virtual interrupt flag (VIF) to improve reliability of
running 8086 programs in multitasking and multiple-processor environments. Refer to
Section 16.3., "Interrupt and Exception Handling in Virtual-8086 Mode" in Chapter 16,
8086 Emulation for detailed information about the use of this feature.
PVI Protected-Mode Virtual Interrupts (bit 1 of CR4). Enables hardware support for a
virtual interrupt flag (VIF) in protected mode when set; disables the VIF flag in
protected mode when clear. Refer to Section 16.4., "Protected-Mode Virtual Interrupts"
in Chapter 16, 8086 Emulation for detailed information about the use of this
feature.
TSD Time Stamp Disable (bit 2 of CR4). Restricts the execution of the RDTSC instruction
to procedures running at privilege level 0 when set; allows RDTSC instruction to be
executed at any privilege level when clear.
DE Debugging Extensions (bit 3 of CR4). References to debug registers DR4 and DR5
cause an undefined opcode (#UD) exception to be generated when set; when clear, the
processor aliases references to registers DR4 and DR5 for compatibility with software
written to run on earlier Intel Architecture processors. Refer to Section 15.2.2., "Debug
Registers DR4 and DR5", in Chapter 15, Debugging and Performance Monitoring, for
more information on the function of this flag.
PSE Page Size Extensions (bit 4 of CR4). Enables 4-MByte pages when set; restricts pages
to 4 KBytes when clear. Refer to Section 3.6.1., "Paging Options" in Chapter 3,
Protected-Mode Memory Management for more information about the use of this flag.
PAE Physical Address Extension (bit 5 of CR4). Enables paging mechanism to reference
36-bit physical addresses when set; restricts physical addresses to 32 bits when clear.
Refer to Section 3.8., "Physical Address Extension" in Chapter 3, Protected-Mode
Memory Management for more information about the physical address extension.
MCE Machine-Check Enable (bit 6 of CR4). Enables the machine-check exception when
set; disables the machine-check exception when clear. Refer to Chapter 13,
Machine-Check Architecture, for more information about the machine-check exception and
machine-check architecture.
PGE Page Global Enable (bit 7 of CR4). (Introduced in the P6 family processors.) Enables
the global page feature when set; disables the global page feature when clear. The
global page feature allows frequently used or shared pages to be marked as global to
all users (done with the global flag, bit 8, in a page-directory or page-table entry).
Global pages are not flushed from the translation-lookaside buffer (TLB) on a task
switch or a write to register CR3. In addition, the bit must not be enabled before paging
is enabled via CR0.PG. Program correctness may be affected by reversing this
sequence, and processor performance will be impacted. Refer to Section 3.7., "Translation
Lookaside Buffers (TLBs)" in Chapter 3, Protected-Mode Memory Management
for more information on the use of this bit.
PCE Performance-Monitoring Counter Enable (bit 8 of CR4). Enables execution of the
RDPMC instruction for programs or procedures running at any protection level when
set; RDPMC instruction can be executed only at protection level 0 when clear.
OSFXSR
Operating System FXSAVE/FXRSTOR Support (bit 9 of CR4). The operating
system will set this bit if both the CPU and the OS support the use of
FXSAVE/FXRSTOR for use during context switches.
OSXMMEXCPT
Operating System Unmasked Exception Support (bit 10 of CR4). The operating
system will set this bit if it provides support for unmasked SIMD floating-point exceptions.
2.5.1. CPUID Qualification of Control Register Flags
The VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, PCE, OSFXSR, and OSXMMEXCPT flags in
control register CR4 are model specific. All of these flags (except PCE) can be qualified with
the CPUID instruction to determine if they are implemented on the processor before they are
used.
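The qualification step can be sketched in C as a function that maps CPUID feature flags to the CR4 flags that are safe to set. This is a hedged illustration only: the function name is hypothetical, and the feature-flag bit positions (EDX after executing CPUID with EAX = 1) are assumptions taken from the CPUID instruction description in Volume 2, not from this section. The caller is assumed to have obtained the EDX value already.

```c
#include <stdint.h>

/* CR4 flag positions, from the flag descriptions in Section 2.5. */
#define CR4_VME        (1u << 0)
#define CR4_PVI        (1u << 1)
#define CR4_TSD        (1u << 2)
#define CR4_DE         (1u << 3)
#define CR4_PSE        (1u << 4)
#define CR4_PAE        (1u << 5)
#define CR4_MCE        (1u << 6)
#define CR4_PGE        (1u << 7)
#define CR4_PCE        (1u << 8)
#define CR4_OSFXSR     (1u << 9)
#define CR4_OSXMMEXCPT (1u << 10)

/* CPUID feature flags in EDX (assumed positions, see the text above). */
#define CPUID_VME  (1u << 1)
#define CPUID_DE   (1u << 2)
#define CPUID_PSE  (1u << 3)
#define CPUID_TSC  (1u << 4)
#define CPUID_PAE  (1u << 6)
#define CPUID_MCE  (1u << 7)
#define CPUID_PGE  (1u << 13)
#define CPUID_FXSR (1u << 24)
#define CPUID_XMM  (1u << 25)

/* Return the set of CR4 flags that the reported features qualify. */
uint32_t cr4_flags_supported(uint32_t cpuid_edx)
{
    uint32_t ok = 0;
    if (cpuid_edx & CPUID_VME)  ok |= CR4_VME | CR4_PVI;
    if (cpuid_edx & CPUID_DE)   ok |= CR4_DE;
    if (cpuid_edx & CPUID_PSE)  ok |= CR4_PSE;
    if (cpuid_edx & CPUID_TSC)  ok |= CR4_TSD;
    if (cpuid_edx & CPUID_PAE)  ok |= CR4_PAE;
    if (cpuid_edx & CPUID_MCE)  ok |= CR4_MCE;
    if (cpuid_edx & CPUID_PGE)  ok |= CR4_PGE;
    if (cpuid_edx & CPUID_FXSR) ok |= CR4_OSFXSR;
    if (cpuid_edx & CPUID_XMM)  ok |= CR4_OSXMMEXCPT;
    /* PCE cannot be qualified with CPUID, so it is never reported here. */
    return ok;
}
```

An operating system could mask its intended CR4 value with the result before writing the register, so that unimplemented flags are left clear.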
2.6. SYSTEM INSTRUCTION SUMMARY
The system instructions handle system-level functions such as loading system registers,
managing the cache, managing interrupts, or setting up the debug registers. Many of these
instructions can be executed only by operating-system or executive procedures (that is, procedures
running at privilege level 0). Others can be executed at any privilege level and are thus
available to application programs. Table 2-2 lists the system instructions and indicates whether
they are available and useful for application programs. These instructions are described in detail
in Chapter 3, Instruction Set Reference, of the Intel Architecture Software Developer's Manual,
Volume 2.
NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application
programs running at a CPL of 3.
3. These instructions were introduced into the Intel Architecture with the Pentium® processor.
4. This instruction was introduced into the Intel Architecture with the Pentium® Pro processor and the Pentium®
processor with MMX™ technology.
5. This instruction was introduced into the Intel Architecture with the Pentium® III processor.
2.6.1. Loading and Storing System Registers
The GDTR, LDTR, IDTR, and TR registers each have a load and store instruction for loading
data into and storing data from the register:
LGDT (Load GDTR Register) Loads the GDT base address and limit from memory into the
GDTR register.
SGDT (Store GDTR Register) Stores the GDT base address and limit from the GDTR register
into memory.
LIDT (Load IDTR Register) Loads the IDT base address and limit from memory into the
IDTR register.
SIDT (Store IDTR Register) Stores the IDT base address and limit from the IDTR register
into memory.
LLDT (Load LDT Register) Loads the LDT segment selector and segment descriptor from
memory into the LDTR. (The segment selector operand can also
be located in a general-purpose register.)
SLDT (Store LDT Register) Stores the LDT segment selector from the LDTR register into
memory or a general-purpose register.
LTR (Load Task Register) Loads segment selector and segment descriptor for a TSS from
memory into the task register. (The segment selector operand
can also be located in a general-purpose register.)
STR (Store Task Register) Stores the segment selector for the current task TSS from the
task register into memory or a general-purpose register.
The LMSW (load machine status word) and SMSW (store machine status word) instructions
operate on bits 0 through 15 of control register CR0. These instructions are provided for compatibility
with the 16-bit Intel 286 processor. Programs written to run on 32-bit Intel Architecture
processors should not use these instructions. Instead, they should access the control register CR0
using the MOV instruction.
The CLTS (clear TS flag in CR0) instruction is provided for use in handling a device-not-available
exception (#NM) that occurs when the processor attempts to execute a floating-point
instruction when the TS flag is set. This instruction allows the TS flag to be cleared after the
FPU context has been saved, preventing further #NM exceptions. Refer to Section 2.5., "Control
Registers" for more information about the TS flag.
The control registers (CR0, CR1, CR2, CR3, and CR4) are loaded with the MOV instruction.
This instruction can load a control register from a general-purpose register or store the contents
of the control register in a general-purpose register.
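The memory operand that LGDT and LIDT read, and that SGDT and SIDT write, holds a table limit followed by a base address. The following C sketch builds that operand as a byte array. It assumes the 32-bit operand layout (a 16-bit limit followed by a 32-bit base, little-endian) and that the limit is the table size in bytes minus 1. The function name is hypothetical, and actually loading the register would require the privileged instruction itself, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a 6-byte buffer in the layout assumed for the LGDT/LIDT operand:
 * bytes 0..1 = limit (table size in bytes - 1), bytes 2..5 = base address.
 * table_bytes is expected to be in the range 1..65536. */
void make_table_operand(uint8_t out[6], uint32_t base, size_t table_bytes)
{
    uint16_t limit = (uint16_t)(table_bytes - 1);

    out[0] = (uint8_t)(limit & 0xFFu);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFFu);
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);
    out[5] = (uint8_t)(base >> 24);
}
```

A table of three 8-byte descriptors located at 00100000H, for instance, gives a limit of 23 (17H).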
2.6.2. Verifying of Access Privileges
The processor provides several instructions for examining segment selectors and segment
descriptors to determine if access to their associated segments is allowed. These instructions
duplicate some of the automatic access rights and type checking done by the processor, thus
allowing operating-system or executive software to prevent exceptions from being generated.
The ARPL (adjust RPL) instruction adjusts the RPL (requestor privilege level) of a segment
selector to match that of the program or procedure that supplied the segment selector. Refer to
Section 4.10.4., "Checking Caller Access Privileges (ARPL Instruction)" in Chapter 4, Protection
for a detailed explanation of the function and use of this instruction.
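The adjustment that ARPL performs can be modeled in C on plain 16-bit selector values. This is a sketch under stated assumptions: the selector layout (index in bits 15 through 3, table indicator in bit 2, RPL in bits 1 and 0) is taken from the segment selector format, the function names are hypothetical, and the return value stands in for the ZF flag that the instruction sets.

```c
#include <stdint.h>

/* Requested privilege level: bits 1..0 of a segment selector. */
unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Descriptor-table index: bits 15..3 of a segment selector. */
unsigned sel_index(uint16_t sel) { return (unsigned)sel >> 3; }

/* Model of ARPL: if the RPL of *dest is numerically lower (more privileged)
 * than the RPL of src, raise it to match and return 1 (ZF set);
 * otherwise leave *dest unchanged and return 0 (ZF clear). */
int arpl(uint16_t *dest, uint16_t src)
{
    if (sel_rpl(*dest) < sel_rpl(src)) {
        *dest = (uint16_t)((*dest & ~3u) | sel_rpl(src));
        return 1;
    }
    return 0;
}
```

Here `src` would typically be the caller's CS selector, so that a selector passed in by a less privileged caller cannot carry more privilege than the caller itself has.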
The LAR (load access rights) instruction verifies the accessibility of a specified segment and
loads the access rights information from the segment's segment descriptor into a general-purpose
register. Software can then examine the access rights to determine if the segment type
is compatible with its intended use. Refer to Section 4.10.1., "Checking Access Rights (LAR
Instruction)" in Chapter 4, Protection for a detailed explanation of the function and use of this
instruction.
The LSL (load segment limit) instruction verifies the accessibility of a specified segment and
loads the segment limit from the segment's segment descriptor into a general-purpose register.
Software can then compare the segment limit with an offset into the segment to determine
whether the offset lies within the segment. Refer to Section 4.10.3., "Checking That the Pointer
Offset Is Within Limits (LSL Instruction)" in Chapter 4, Protection for a detailed explanation of
the function and use of this instruction.
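The comparison that software makes after LSL can be sketched in C as follows. The sketch assumes an expand-up segment and a byte-granular limit (the last valid offset in the segment). The helper that scales a 20-bit limit by the granularity flag reflects the descriptor format rather than anything stated in this section, and both function names are hypothetical.

```c
#include <stdint.h>

/* Convert a 20-bit descriptor limit to a byte-granular limit. With the
 * granularity flag set, the limit counts 4-KByte units, so the last valid
 * offset is (limit << 12) | 0xFFF. */
uint32_t effective_limit(uint32_t limit20, int granularity)
{
    return granularity ? ((limit20 << 12) | 0xFFFu) : limit20;
}

/* Return 1 if an access of `size` bytes starting at `offset` lies entirely
 * within an expand-up segment whose last valid offset is `limit`. */
int access_within_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
    if (size == 0 || offset > limit)
        return 0;
    /* Written to avoid overflow in offset + size - 1. */
    return (size - 1u) <= (limit - offset);
}
```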
The VERR (verify for reading) and VERW (verify for writing) instructions verify if a selected
segment is readable or writable, respectively, at the CPL. Refer to Section 4.10.2., "Checking
Read/Write Rights (VERR and VERW Instructions)" in Chapter 4, Protection for a detailed
explanation of the function and use of these instructions.
2.6.3. Loading and Storing Debug Registers
The internal debugging facilities in the processor are controlled by a set of 8 debug registers
(DR0 through DR7). The MOV instruction allows setup data to be loaded into and stored from
these registers.
2.6.4. Invalidating Caches and TLBs
The processor provides several instructions for use in explicitly invalidating its caches and TLB
entries. The INVD (invalidate cache with no writeback) instruction invalidates all data and
instruction entries in the internal caches and TLBs and sends a signal to the external caches indicating
that they should be invalidated also.
The WBINVD (invalidate cache with writeback) instruction performs the same function as the
INVD instruction, except that it writes back any modified lines in its internal caches to memory
before it invalidates the caches. After invalidating the internal caches, it signals the external
caches to write back modified data and invalidate their contents.
The INVLPG (invalidate TLB entry) instruction invalidates (flushes) the TLB entry for a specified
page.
2.6.5. Controlling the Processor
The HLT (halt processor) instruction stops the processor until an enabled interrupt (such as NMI
or SMI, which are normally enabled), the BINIT# signal, the INIT# signal, or the RESET#
signal is received. The processor generates a special bus cycle to indicate that the halt mode has
been entered. Hardware may respond to this signal in a number of ways. An indicator light on
the front panel may be turned on. An NMI interrupt for recording diagnostic information may
be generated. Reset initialization may be invoked. (Note that the BINIT# pin was introduced
with the Pentium® Pro processor.)
The LOCK prefix invokes a locked (atomic) read-modify-write operation when modifying a
memory operand. This mechanism is used to allow reliable communications between processors
in multiprocessor systems. In the Pentium® and earlier Intel Architecture processors, the LOCK
prefix causes the processor to assert the LOCK# signal during the instruction, which always
causes an explicit bus lock to occur. In the P6 family processors, the locking operation is handled
with either a cache lock or bus lock. If a memory access is cacheable and affects only a single
cache line, a cache lock is invoked and the system bus and the actual memory location in system
memory are not locked during the operation. Here, other P6 family processors on the bus write back
any modified data and invalidate their caches as necessary to maintain system memory
coherency. If the memory access is not cacheable and/or it crosses a cache line boundary, the
processor's LOCK# signal is asserted and the processor does not respond to requests for bus
control during the locked operation.
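The choice between a cache lock and a bus lock on the P6 family processors, as described above, depends on whether the access is cacheable and whether it stays within one cache line. The following C sketch expresses that rule. The function and type names are hypothetical, and the cache-line size is passed as a parameter because it is implementation specific (32 bytes is assumed in the example that follows).

```c
#include <stddef.h>
#include <stdint.h>

enum lock_kind { CACHE_LOCK, BUS_LOCK };

/* Classify a locked access of `size` bytes (size >= 1) at `addr`.
 * A cache lock applies only if the memory is cacheable and the first and
 * last bytes fall in the same cache line; otherwise LOCK# is asserted. */
enum lock_kind p6_lock_kind(uintptr_t addr, size_t size,
                            size_t line_size, int cacheable)
{
    uintptr_t first_line = addr / line_size;
    uintptr_t last_line  = (addr + size - 1) / line_size;

    if (cacheable && first_line == last_line)
        return CACHE_LOCK;
    return BUS_LOCK;
}
```

With a 32-byte line, a 4-byte operand at 1000H stays within one line, whereas the same operand at 101EH spans two lines and would cause a bus lock. Keeping locked operands naturally aligned avoids the second case.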
The RSM (return from SMM) instruction restores the processor (from a context dump) to the
state it was in prior to a system management mode (SMM) interrupt.
2.6.6. Reading Performance-Monitoring and Time-Stamp
Counters
The RDPMC (read performance-monitoring counter) and RDTSC (read time-stamp counter)
instructions allow an application program to read the processor's performance-monitoring and
time-stamp counters, respectively.
The P6 family processors have two 40-bit performance counters that record either the occurrence
of events or the duration of events. The events that can be monitored include the number
of instructions decoded, number of interrupts received, or number of cache loads. Each counter
can be set up to monitor a different event, using the system instruction WRMSR to set up values
in the model-specific registers PerfEvtSel0 and PerfEvtSel1. The RDPMC instruction loads the
current count in counter 0 or 1 into the EDX:EAX registers.
The time-stamp counter is a model-specific 64-bit counter that is reset to zero each time the
processor is reset. If not reset, the counter will increment ~6.3 x 10^15 times per year when
the processor is operating at a clock rate of 200 MHz. At this clock frequency, it would take
over 2000 years for the counter to wrap around. The RDTSC instruction loads the current
count of the time-stamp counter into the EDX:EAX registers.
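The figures quoted above follow from simple arithmetic, which the C sketch below reproduces. It assumes a 365-day year and a constant clock rate; the function names are hypothetical.

```c
/* Seconds in a 365-day year (an assumption; leap years are ignored). */
#define SECONDS_PER_YEAR (365.0 * 24.0 * 3600.0)

/* Number of time-stamp counter increments in one year at clock_hz. */
double tsc_ticks_per_year(double clock_hz)
{
    return clock_hz * SECONDS_PER_YEAR;
}

/* Years until a 64-bit counter that starts at zero wraps around. */
double tsc_years_to_wrap(double clock_hz)
{
    const double two_pow_64 = 18446744073709551616.0;  /* 2^64 */
    return two_pow_64 / tsc_ticks_per_year(clock_hz);
}
```

At 200 MHz, the first function gives about 6.3 x 10^15 increments per year, and the second gives a wrap-around time between 2000 and 3000 years, consistent with the statement above.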
Refer to Section 15.5., "Time-Stamp Counter", and Section 15.6., "Performance-Monitoring
Counters", in Chapter 15, Debugging and Performance Monitoring, for more information about
the performance monitoring and time-stamp counters.
The RDTSC instruction was introduced into the Intel Architecture with the Pentium® processor.
The RDPMC instruction was introduced into the Intel Architecture with the Pentium® Pro
processor and the Pentium® processor with MMX™ technology. Earlier Pentium® processors
have two performance-monitoring counters, but they can be read only with the RDMSR instruction,
and only at privilege level 0.
2.6.7. Reading and Writing Model-Specific Registers
The RDMSR (read model-specific register) and WRMSR (write model-specific register) allow
the processor's 64-bit model-specific registers (MSRs) to be read and written to, respectively.
The MSR to be read or written to is specified by the value in the ECX register. The RDMSR
instruction reads the value from the specified MSR into the EDX:EAX registers; the WRMSR
writes the value in the EDX:EAX registers into the specified MSR. Refer to Section 8.4.,
"Model-Specific Registers (MSRs)" in Chapter 8, Processor Management and Initialization for
more information about the MSRs.
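Because a 64-bit MSR value travels through the EDX:EAX register pair, software has to join or split the two 32-bit halves. The following C sketch shows only that bookkeeping, with EDX holding the high-order 32 bits and EAX the low-order 32 bits. The function names are hypothetical, and the privileged RDMSR and WRMSR instructions themselves are not executed.

```c
#include <stdint.h>

/* Combine the halves returned in EDX (high) and EAX (low) into one value. */
uint64_t join_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64-bit value into the halves to be placed in EDX and EAX. */
void split_edx_eax(uint64_t value, uint32_t *edx, uint32_t *eax)
{
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```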
The RDMSR and WRMSR instructions were introduced into the Intel Architecture with the
Pentium® processor.
2.6.8. Loading and Storing the Streaming SIMD Extensions
Control/Status Word
The LDMXCSR (load Streaming SIMD Extensions control/status word from memory) and
STMXCSR (store Streaming SIMD Extensions control/status word to memory) allow the
Pentium® III processor's 32-bit control/status word to be written to and read, respectively. The
MXCSR control/status register is used to enable masked/unmasked exception handling, to set
rounding modes, to set flush-to-zero mode, and to view exception status flags. For a complete
description of the LDMXCSR and STMXCSR instructions, refer to the Intel Architecture Software
Developer's Manual, Volume 2.
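The fields of the MXCSR register can be picked apart with shifts and masks once the value has been stored to memory. The C sketch below decodes a value held in an ordinary variable. The field positions (exception flags in bits 0 through 5, exception masks in bits 7 through 12, rounding control in bits 13 and 14, flush-to-zero in bit 15) are assumptions taken from the MXCSR register layout, not from this section, and the function names are hypothetical.

```c
#include <stdint.h>

#define MXCSR_FZ (1u << 15)   /* flush-to-zero (assumed bit 15) */

/* Rounding-control encodings (assumed, bits 14..13). */
enum rounding {
    ROUND_NEAREST  = 0,
    ROUND_DOWN     = 1,
    ROUND_UP       = 2,
    ROUND_TRUNCATE = 3
};

/* Six exception status flags, bits 5..0. */
unsigned mxcsr_exception_flags(uint32_t mxcsr) { return mxcsr & 0x3Fu; }

/* Six exception mask bits, bits 12..7; a set bit masks the exception. */
unsigned mxcsr_exception_masks(uint32_t mxcsr) { return (mxcsr >> 7) & 0x3Fu; }

enum rounding mxcsr_rounding(uint32_t mxcsr)
{
    return (enum rounding)((mxcsr >> 13) & 3u);
}

int mxcsr_flush_to_zero(uint32_t mxcsr) { return (mxcsr & MXCSR_FZ) != 0; }
```

For the value 1F80H, the sketch reports all six exceptions masked, no status flags set, round-to-nearest, and flush-to-zero off.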
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the Intel Architecture's protected-mode memory management facilities,
including the physical memory requirements, the segmentation mechanism, and the paging
mechanism. Refer to Chapter 4, Protection for a description of the processor's protection mechanism.
Refer to Chapter 16, 8086 Emulation for a description of memory addressing protection
in real-address and virtual-8086 modes.
3.1. MEMORY MANAGEMENT OVERVIEW
The memory management facilities of the Intel Architecture are divided into two parts: segmentation
and paging. Segmentation provides a mechanism of isolating individual code, data, and
stack modules so that multiple programs (or tasks) can run on the same processor without interfering
with one another. Paging provides a mechanism for implementing a conventional
demand-paged, virtual-memory system where sections of a program's execution environment
are mapped into physical memory as needed. Paging can also be used to provide isolation
between multiple tasks. When operating in protected mode, some form of segmentation must be
used. There is no mode bit to disable segmentation. The use of paging, however, is optional.
These two mechanisms (segmentation and paging) can be configured to support simple
single-program (or single-task) systems, multitasking systems, or multiple-processor systems
that use shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor's
addressable memory space (called the linear address space) into smaller protected address
spaces called segments. Segments can be used to hold the code, data, and stack for a program
or to hold system data structures (such as a TSS or LDT). If more than one program (or task) is
running on a processor, each program can be assigned its own set of segments. The processor
then enforces the boundaries between these segments and ensures that one program does not
interfere with the execution of another program by writing into the other program's segments.
The segmentation mechanism also allows typing of segments so that the operations that may be
performed on a particular type of segment can be restricted.
All of the segments within a system are contained in the processor's linear address space. To
locate a byte in a particular segment, a logical address (sometimes called a far pointer) must be
provided. A logical address consists of a segment selector and an offset. The segment selector
is a unique identifier for a segment. Among other things it provides an offset into a descriptor
table (such as the global descriptor table, GDT) to a data structure called a segment descriptor.
Each segment has a segment descriptor, which specifies the size of the segment, the access rights
and privilege level for the segment, the segment type, and the location of the first byte of the
segment in the linear address space (called the base address of the segment). The offset part of
the logical address is added to the base address for the segment to locate a byte within the
segment. The base address plus the offset thus forms a linear address in the processor's linear
address space.
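The translation from a logical address to a linear address described above can be sketched in C. This is a simplified model, not the processor's full algorithm: the `segment` structure is a hypothetical stand-in for a segment descriptor that keeps only a base and a byte-granular limit, the selector index is assumed to occupy bits 15 through 3, and the table indicator, null selector, access rights, and privilege checks are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a segment descriptor. */
struct segment {
    uint32_t base;   /* linear address of the first byte of the segment */
    uint32_t limit;  /* last valid offset within the segment            */
};

/* Translate selector:offset to a linear address.
 * Returns 0 on success, -1 if the selector indexes past the table or the
 * offset lies beyond the segment limit. */
int logical_to_linear(const struct segment *table, size_t entries,
                      uint16_t selector, uint32_t offset, uint32_t *linear)
{
    size_t index = (size_t)selector >> 3;   /* bits 15..3 select the entry */

    if (index >= entries)
        return -1;
    if (offset > table[index].limit)
        return -1;
    *linear = table[index].base + offset;   /* base address plus offset */
    return 0;
}
```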
Figure 3-1
If paging is not used, the linear address space of the processor is mapped directly into the physical
address space of the processor. The physical address space is defined as the range of addresses
that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space much larger
than it is economically feasible to contain all at once in physical memory, some method of
"virtualizing" the linear address space is needed. This virtualization of the linear address space
is handled through the processor's paging mechanism.
Paging supports a "virtual memory" environment where a large linear address space is simulated
with a small amount of physical memory (RAM and ROM) and some disk storage. When using
paging, each segment is divided into pages (ordinarily 4 KBytes each in size), which are stored
either in physical memory or on the disk. The operating system or executive maintains a page
directory and a set of page tables to keep track of the pages. When a program (or task) attempts
to access an address location in the linear address space, the processor uses the page directory
and page tables to translate the linear address into a physical address and then performs the
requested operation (read or write) on the memory location. If the page being accessed is not
currently in physical memory, the processor interrupts execution of the program (by generating
a page-fault exception). The operating system or executive then reads the page into physical
memory from the disk and continues executing the program.
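The lookup through the page directory and page tables can be modeled in C. The sketch below assumes the 4-KByte paging format, in which a 32-bit linear address is split into a 10-bit directory index, a 10-bit table index, and a 12-bit offset, and in which bit 0 of an entry is the present flag. These details come from the paging description later in this chapter. The structures are a hypothetical in-memory model (the directory holds C pointers rather than physical addresses), and a return value of -1 stands in for the page-fault exception.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u
#define ENTRIES     1024

/* Hypothetical model: a directory of pointers to page tables. */
struct page_table     { uint32_t entry[ENTRIES]; };
struct page_directory { struct page_table *table[ENTRIES]; };

/* Translate a linear address to a physical address.
 * Returns 0 on success, -1 where the processor would raise a page fault. */
int translate(const struct page_directory *pd, uint32_t linear,
              uint32_t *physical)
{
    uint32_t dir_index   = linear >> 22;             /* bits 31..22 */
    uint32_t table_index = (linear >> 12) & 0x3FFu;  /* bits 21..12 */
    uint32_t offset      = linear & 0xFFFu;          /* bits 11..0  */
    const struct page_table *pt = pd->table[dir_index];
    uint32_t pte;

    if (pt == NULL)
        return -1;                  /* page table not present */
    pte = pt->entry[table_index];
    if (!(pte & PTE_PRESENT))
        return -1;                  /* page not present */
    *physical = (pte & 0xFFFFF000u) | offset;
    return 0;
}
```

On a -1 result, an operating system's page-fault handler would bring the page into physical memory, mark the entry present, and restart the access.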
When paging is implemented properly in the operating system or executive, the swapping of
pages between physical memory and the disk is transparent to the correct execution of a
program. Even programs written for 16-bit Intel Architecture processors can be paged (transparently)
when they are run in virtual-8086 mode.
3.2. USING SEGMENTS
The segmentation mechanism supported by the Intel Architecture can be used to implement a
wide variety of system designs. These designs range from flat models that make only minimal
use of segmentation to protect programs to multisegmented models that employ segmentation
to create a robust operating environment in which multiple programs and tasks can be executed
reliably.
The following sections give several examples of how segmentation can be employed in a system
to improve memory management performance and reliability.
3.2.1. Basic Flat Model
The simplest memory model for a system is the basic "flat model," in which the operating
system and application programs have access to a continuous, unsegmented address space. To
the greatest extent possible, this basic flat model hides the segmentation mechanism of the architecture
from both the system designer and the application programmer.
To implement a basic flat memory model with the Intel Architecture, at least two segment
descriptors must be created, one for referencing a code segment and one for referencing a data
segment (refer to Figure 3-2). Both of these segments, however, are mapped to the entire linear
address space: that is, both segment descriptors have the same base address value of 0 and the
same segment limit of 4 GBytes. By setting the segment limit to 4 GBytes, the segmentation
mechanism is kept from generating exceptions for out of limit memory references, even if no
physical memory resides at a particular address. ROM (EPROM) is generally located at the top
of the physical address space, because the processor begins execution at FFFF_FFF0H. RAM
(DRAM) is placed at the bottom of the address space because the initial base address for the DS
data segment after reset initialization is 0.
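The two descriptors needed for the basic flat model can be encoded as 64-bit values. The C sketch below packs a base, a 20-bit limit, an access byte, and the upper flag nibble into the descriptor layout. The layout and the specific access-byte values used in the example (9AH for a present, privilege-level-0, execute/read code segment and 92H for a read/write data segment, with the granularity and 32-bit default-size flags set) are assumptions taken from the segment descriptor format in Section 3.4.3, not from this section. The function name is hypothetical.

```c
#include <stdint.h>

/* Pack a segment descriptor into its 8-byte form (assumed layout):
 *   bits  0..15  limit 15..0      bits 48..51  limit 19..16
 *   bits 16..39  base 23..0       bits 52..55  flags (AVL, 0, D/B, G)
 *   bits 40..47  access byte      bits 56..63  base 31..24            */
uint64_t make_descriptor(uint32_t base, uint32_t limit20,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit20 & 0xFFFFu);
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit20 >> 16) & 0xFu) << 48;
    d |= (uint64_t)(flags & 0xFu) << 52;
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}
```

With a base of 0, a limit of FFFFFH, and the granularity flag set (flags = CH), the limit is scaled to 4 GBytes, so `make_descriptor(0, 0xFFFFF, 0x9A, 0xC)` and `make_descriptor(0, 0xFFFFF, 0x92, 0xC)` describe overlapping code and data segments that each cover the entire linear address space.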
Figure 3-2
3.2.2. Protected Flat Model
The protected flat model is similar to the basic flat model, except the segment limits are set to
include only the range of addresses for which physical memory actually exists (refer to Figure
3-3). A general-protection exception (#GP) is then generated on any attempt to access nonexistent
memory. This model provides a minimum level of hardware protection against some kinds
of program bugs.
Figure 3-3
More complexity can be added to this protected flat model to provide more protection. For
example, for the paging mechanism to provide isolation between user and supervisor code and
data, four segments need to be defined: code and data segments at privilege level 3 for the user,
and code and data segments at privilege level 0 for the supervisor. Usually these segments all
overlay each other and start at address 0 in the linear address space. This flat segmentation
model along with a simple paging structure can protect the operating system from applications,
and by adding a separate paging structure for each task or process, it can also protect applications
from each other. Similar designs are used by several popular multitasking operating
systems.
3.2.3. Multisegment Model
A multisegment model (such as the one shown in Figure 3-4) uses the full capabilities of the
segmentation mechanism to provide hardware enforced protection of code, data structures, and
programs and tasks. Here, each program (or task) is given its own table of segment descriptors
and its own segments. The segments can be completely private to their assigned programs or
shared among programs. Access to all segments and to the execution environments of individual
programs running on the system is controlled by hardware.
Figure 3-4
Access checks can be used to protect not only against referencing an address outside the limit
of a segment, but also against performing disallowed operations in certain segments. For
example, since code segments are designated as read-only segments, hardware can be used to
prevent writes into code segments. The access rights information created for segments can also
be used to set up protection rings or levels. Protection levels can be used to protect
operating-system procedures from unauthorized access by application programs.
3.2.4. Paging and Segmentation
Paging can be used with any of the segmentation models described in Figures 3-2, 3-3, and 3-4.
The processor's paging mechanism divides the linear address space (into which segments are
mapped) into pages (as shown in Figure 3-1). These linear-address-space pages are then mapped
to pages in the physical address space. The paging mechanism offers several page-level protection
facilities that can be used with or instead of the segment-protection facilities. For example,
it lets read-write protection be enforced on a page-by-page basis. The paging mechanism also
provides two-level user-supervisor protection that can also be specified on a page-by-page basis.
3.3. PHYSICAL ADDRESS SPACE
In protected mode, the Intel Architecture provides a normal physical address space of 4 GBytes
(2^32 bytes). This is the address space that the processor can address on its address bus. This
address space is flat (unsegmented), with addresses ranging continuously from 0 to
FFFFFFFFH. This physical address space can be mapped to read-write memory, read-only
memory, and memory mapped I/O. The memory mapping facilities described in this chapter can
be used to divide this physical memory up into segments and/or pages.
(Introduced in the Pentium® Pro processor.) The Intel Architecture also supports an extension of
the physical address space to 2^36 bytes (64 GBytes), with a maximum physical address of
FFFFFFFFFH. This extension is invoked with the physical address extension (PAE) flag,
located in bit 5 of control register CR4. (Refer to Section 3.8., "Physical Address Extension" for
more information about extended physical addressing.)
3.4. LOGICAL AND LINEAR ADDRESSES
At the system-architecture level in protected mode, the processor uses two stages of address
translation to arrive at a physical address: logical-address translation and linear address space
paging.
Even with the minimum use of segments, every byte in the processor's address space is accessed
with a logical address. A logical address consists of a 16-bit segment selector and a 32-bit offset
(refer to Figure 3-5). The segment selector identifies the segment the byte is located in and the
offset specifies the location of the byte in the segment relative to the base address of the segment.
The processor translates every logical address into a linear address. A linear address is a 32-bit
address in the processor's linear address space. Like the physical address space, the linear
address space is a flat (unsegmented), 2^32-byte address space, with addresses ranging from 0 to
FFFFFFFFH. The linear address space contains all the segments and system tables defined for a
system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the index in the segment selector to locate the segment descriptor for the segment in
the GDT or LDT and reads it into the processor. (This step is needed only when a new
segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the segment to
ensure that the segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a
linear address.
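Steps 2 and 3 above can be sketched in C as follows. This is only an illustration under simplifying assumptions: the structure `seg_info` and the function `logical_to_linear` are hypothetical names, the segment is assumed to be an expand-up segment whose limit is already expressed in bytes, and the access-rights checks of step 2 are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the cached descriptor fields. */
struct seg_info {
    uint32_t base;   /* base address of the segment (linear) */
    uint32_t limit;  /* highest valid offset, in bytes (expand-up segment) */
};

/*
 * Checks the offset against the segment limit (step 2), then adds the
 * segment base (step 3). Returns false in the case where the processor
 * would generate a general-protection exception.
 */
static bool logical_to_linear(const struct seg_info *seg, uint32_t offset,
                              uint32_t *linear)
{
    assert(seg != NULL && linear != NULL);
    if (offset > seg->limit)
        return false;              /* offset lies outside the segment */
    *linear = seg->base + offset;  /* unsigned 32-bit add, wraps modulo 2^32 */
    return true;
}
```

For a segment with base 00100000H and limit FFFFH, offset 1234H yields linear address 00101234H, while offset 10000H is rejected.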
Figure 3-5
If paging is not used, the processor maps the linear address directly to a physical address (that
is, the linear address goes out on the processor's address bus). If the linear address space is
paged, a second level of address translation is used to translate the linear address into a physical
address. Page translation is described in Section 3.6., "Paging (Virtual Memory)".
3.4.1. Segment Selectors
A segment selector is a 16-bit identifier for a segment (refer to Figure 3-6). It does not point
directly to the segment, but instead points to the segment descriptor that defines the segment. A
segment selector contains the following items:
Index (Bits 3 through 15). Selects one of 8192 descriptors in the GDT or LDT. The
processor multiplies the index value by 8 (the number of bytes in a segment
descriptor) and adds the result to the base address of the GDT or LDT (from
the GDTR or LDTR register, respectively).
TI (table indicator) flag
(Bit 2). Specifies the descriptor table to use: clearing this flag selects the GDT;
setting this flag selects the current LDT.
Figure 3-6
Requested Privilege Level (RPL)
(Bits 0 and 1). Specifies the privilege level of the selector. The privilege level
can range from 0 to 3, with 0 being the most privileged level. Refer to Section
4.5., "Privilege Levels" in Chapter 4, Protection for a description of the relationship
of the RPL to the CPL of the executing program (or task) and the
descriptor privilege level (DPL) of the descriptor the segment selector points
to.
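The three selector fields described above can be extracted with shifts and masks. The following C sketch is illustrative only; the type `selector_fields` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded selector fields. */
struct selector_fields {
    uint16_t index;  /* bits 3 through 15: descriptor index */
    bool     ti;     /* bit 2: false selects the GDT, true the current LDT */
    uint8_t  rpl;    /* bits 0 and 1: requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (sel & 0x4u) != 0;
    f.rpl   = (uint8_t)(sel & 0x3u);
    return f;
}

/* Byte offset of the selected descriptor from the table base (index * 8). */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8u;
}
```

For example, selector 002BH decodes to index 5, TI clear (GDT), and RPL 3; the descriptor starts 40 bytes past the base address held in the GDTR.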
The first entry of the GDT is not used by the processor. A segment selector that points to this
entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used
as a "null segment selector." The processor does not generate an exception when a segment
register (other than the CS or SS registers) is loaded with a null selector. It does, however,
generate an exception when a segment register holding a null selector is used to access memory.
A null selector can be used to initialize unused segment registers. Loading the CS or SS register
with a null segment selector causes a general-protection exception (#GP) to be generated.
Segment selectors are visible to application programs as part of a pointer variable, but the values
of selectors are usually assigned or modified by link editors or linking loaders, not application
programs.
3.4.2. Segment Registers
To reduce address translation time and coding complexity, the processor provides registers for
holding up to 6 segment selectors (refer to Figure 3-7). Each of these segment registers supports
a specific kind of memory reference (code, stack, or data). For virtually any kind of program
execution to take place, at least the code-segment (CS), data-segment (DS), and stack-segment
(SS) registers must be loaded with valid segment selectors. The processor also provides three
additional data-segment registers (ES, FS, and GS), which can be used to make additional data
segments available to the currently executing program (or task).
For a program to access a segment, the segment selector for the segment must have been loaded
in one of the segment registers. So, although a system can define thousands of segments, only 6
can be available for immediate use. Other segments can be made available by loading their
segment selectors into these registers during program execution.
Every segment register has a "visible" part and a "hidden" part. (The hidden part is sometimes
referred to as a "descriptor cache" or a "shadow register.") When a segment selector is loaded
into the visible part of a segment register, the processor also loads the hidden part of the segment
register with the base address, segment limit, and access control information from the segment
descriptor pointed to by the segment selector. The information cached in the segment register
(visible and hidden) allows the processor to translate addresses without taking extra bus cycles
to read the base address and limit from the segment descriptor. In systems in which multiple
processors have access to the same descriptor tables, it is the responsibility of software to reload
the segment registers when the descriptor tables are modified. If this is not done, an old segment
descriptor cached in a segment register might be used after its memory-resident version has been
modified.
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS instructions.
These instructions explicitly reference the segment registers.
2. Implied load instructions such as the far pointer versions of the CALL, JMP, and RET
instructions and the IRET, INT n, INTO, and INT3 instructions. These instructions change
the contents of the CS register (and sometimes other segment registers) as an incidental
part of their operation.
The MOV instruction can also be used to store the visible part of a segment register in a
general-purpose register.
3.4.3. Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor with the
size and location of a segment, as well as access control and status information. Segment
descriptors are typically created by compilers, linkers, loaders, or the operating system or
executive, but not application programs. Figure 3-8 illustrates the general descriptor format for all
types of segment descriptors.
The flags and fields in a segment descriptor are as follows:
Segment limit field
Specifies the size of the segment. The processor puts together the two segment
limit fields to form a 20-bit value. The processor interprets the segment limit
in one of two ways, depending on the setting of the G (granularity) flag:
If the granularity flag is clear, the segment size can range from 1 byte to 1
MByte, in byte increments.
If the granularity flag is set, the segment size can range from 4 KBytes to
4 GBytes, in 4-KByte increments.
The processor uses the segment limit in two different ways, depending on
whether the segment is an expand-up or an expand-down segment. Refer to
Section 3.4.3.1., "Code- and Data-Segment Descriptor Types" for more information
about segment types. For expand-up segments, the offset in a logical
address can range from 0 to the segment limit. Offsets greater than the segment
limit generate general-protection exceptions (#GP). For expand-down
segments, the segment limit has the reverse function; the offset can range from
the segment limit to FFFFFFFFH or FFFFH, depending on the setting of the B
flag. Offsets less than the segment limit generate general-protection exceptions.
Decreasing the value in the segment limit field for an expand-down
segment allocates new memory at the bottom of the segment's address space,
rather than at the top. Intel Architecture stacks always grow downwards,
making this mechanism convenient for expandable stacks.
Figure 3-8
Base address fields
Defines the location of byte 0 of the segment within the 4-GByte linear address
space. The processor puts together the three base address fields to form a single
32-bit value. Segment base addresses should be aligned to 16-byte boundaries.
Although 16-byte alignment is not required, this alignment allows programs to
maximize performance by aligning code and data on 16-byte boundaries.
Type field
Indicates the segment or gate type and specifies the kinds of access that can be
made to the segment and the direction of growth. The interpretation of this field
depends on whether the descriptor type flag specifies an application (code or
data) descriptor or a system descriptor. The encoding of the type field is
different for code, data, and system descriptors (refer to Figure 4-1 in Chapter
4, Protection). Refer to Section 3.4.3.1., "Code- and Data-Segment Descriptor
Types" for a description of how this field is used to specify code- and data-segment
types.
S (descriptor type) flag
Specifies whether the segment descriptor is for a system segment (S flag is
clear) or a code or data segment (S flag is set).
DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can range from
0 to 3, with 0 being the most privileged level. The DPL is used to control access
to the segment. Refer to Section 4.5., "Privilege Levels" in Chapter 4, Protection
for a description of the relationship of the DPL to the CPL of the executing
code segment and the RPL of a segment selector.
P (segment-present) flag
Indicates whether the segment is present in memory (set) or not present (clear).
If this flag is clear, the processor generates a segment-not-present exception
(#NP) when a segment selector that points to the segment descriptor is loaded
into a segment register. Memory management software can use this flag to
control which segments are actually loaded into physical memory at a given
time. It offers a control in addition to paging for managing virtual memory.
Figure 3-9 shows the format of a segment descriptor when the segment-present
flag is clear. When this flag is clear, the operating system or executive is free
to use the locations marked "Available" to store its own data, such as information
regarding the whereabouts of the missing segment.
D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment descriptor is
an executable code segment, an expand-down data segment, or a stack
segment. (This flag should always be set to 1 for 32-bit code and data segments
and to 0 for 16-bit code and data segments.)
Executable code segment. The flag is called the D flag and it indicates the
default length for effective addresses and operands referenced by instructions
in the segment. If the flag is set, 32-bit addresses and 32-bit or 8-bit
operands are assumed; if it is clear, 16-bit addresses and 16-bit or 8-bit
operands are assumed. The instruction prefix 66H can be used to select an
operand size other than the default, and the prefix 67H can be used to select
an address size other than the default.
Stack segment (data segment pointed to by the SS register). The flag is
called the B (big) flag and it specifies the size of the stack pointer used for
implicit stack operations (such as pushes, pops, and calls). If the flag is set,
a 32-bit stack pointer is used, which is stored in the 32-bit ESP register; if
the flag is clear, a 16-bit stack pointer is used, which is stored in the 16-bit
SP register. If the stack segment is set up to be an expand-down data
segment (described in the next paragraph), the B flag also specifies the
upper bound of the stack segment.
Expand-down data segment. The flag is called the B flag and it specifies
the upper bound of the segment. If the flag is set, the upper bound is
FFFFFFFFH (4 GBytes); if the flag is clear, the upper bound is FFFFH (64
KBytes).
Figure 3-9
G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is
clear, the segment limit is interpreted in byte units; when the flag is set, the
segment limit is interpreted in 4-KByte units. (This flag does not affect the
granularity of the base address; it is always byte granular.) When the granularity
flag is set, the twelve least significant bits of an offset are not tested when
checking the offset against the segment limit. For example, when the granularity
flag is set, a limit of 0 results in valid offsets from 0 to 4095.
Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available for use
by system software; bit 21 is reserved and should always be set to 0.
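The way the processor assembles the base address and segment limit from their separate fields, and scales the limit according to the G flag, can be sketched in C as follows. The bit positions used are those of the standard descriptor layout of Figure 3-8, which is not reproduced in this text; the structure `seg_desc` and the function names are hypothetical, and the effective-limit calculation applies to expand-up segments only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An 8-byte segment descriptor viewed as two doublewords (hypothetical type). */
struct seg_desc {
    uint32_t lo;  /* first doubleword  */
    uint32_t hi;  /* second doubleword */
};

/* Joins the three base address fields into one 32-bit value. */
static uint32_t desc_base(struct seg_desc d)
{
    return (d.lo >> 16)              /* base 15:0  (bits 16..31 of lo) */
         | ((d.hi & 0xFFu) << 16)    /* base 23:16 (bits 0..7 of hi)   */
         | (d.hi & 0xFF000000u);     /* base 31:24 (bits 24..31 of hi) */
}

/* Joins the two segment limit fields into one 20-bit value. */
static uint32_t desc_raw_limit(struct seg_desc d)
{
    return (d.lo & 0xFFFFu)          /* limit 15:0  (bits 0..15 of lo)  */
         | (d.hi & 0x000F0000u);     /* limit 19:16 (bits 16..19 of hi) */
}

/* G (granularity) flag: bit 23 of the second doubleword. */
static bool desc_granular(struct seg_desc d)
{
    return ((d.hi >> 23) & 1u) != 0;
}

/* Highest valid offset of an expand-up segment, in bytes. */
static uint32_t desc_effective_limit(struct seg_desc d)
{
    uint32_t raw = desc_raw_limit(d);
    /* With G set, the twelve low-order offset bits are not tested. */
    return desc_granular(d) ? ((raw << 12) | 0xFFFu) : raw;
}
```

With the G flag set and a limit field of 0, `desc_effective_limit` returns 4095, matching the example given for the granularity flag; a limit field of FFFFFH with G set covers the full 4-GByte linear address space.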
3.4.3.1. CODE- AND DATA-SEGMENT DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is set, the descriptor is for either a code
or a data segment. The highest-order bit of the type field (bit 11 of the second doubleword of
the segment descriptor) then determines whether the descriptor is for a data segment (clear) or
a code segment (set).
For data segments, the three low-order bits of the type field (bits 8, 9, and 10) are interpreted as
accessed (A), write-enable (W), and expansion-direction (E). Refer to Table 3-1 for a description
of the encoding of the bits in the type field for code and data segments. Data segments can
be read-only or read/write segments, depending on the setting of the write-enable bit.
Stack segments are data segments which must be read/write segments. Loading the SS register
with a segment selector for a nonwritable data segment generates a general-protection exception
(#GP). If the size of a stack segment needs to be changed dynamically, the stack segment can be
an expand-down data segment (expansion-direction flag set). Here, dynamically changing the
segment limit causes stack space to be added to the bottom of the stack. If the size of a stack
segment is intended to remain static, the stack segment may be either an expand-up or expand-down
type.
The accessed bit indicates whether the segment has been accessed since the last time the operating
system or executive cleared the bit. The processor sets this bit whenever it loads a segment
selector for the segment into a segment register. The bit remains set until explicitly cleared. This
bit can be used both for virtual memory management and for debugging.
For code segments, the three low-order bits of the type field are interpreted as accessed (A),
read-enable (R), and conforming (C). Code segments can be execute-only or execute/read, depending
on the setting of the read-enable bit. An execute/read segment might be used when constants or
other static data have been placed with instruction code in a ROM. Here, data can be read from
the code segment either by using an instruction with a CS override prefix or by loading a
segment selector for the code segment in a data-segment register (the DS, ES, FS, or GS registers).
In protected mode, code segments are not writable.
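The interpretation of the type field for code and data segments (S flag set) can be sketched in C as follows. The function takes the second doubleword of the descriptor and uses the bit numbers given above; the structure `seg_type` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded type field of a code- or data-segment descriptor (hypothetical). */
struct seg_type {
    bool is_code;   /* bit 11: set for a code segment, clear for data */
    bool ec;        /* bit 10: expansion-direction (data) or conforming (code) */
    bool rw;        /* bit 9:  write-enable (data) or read-enable (code) */
    bool accessed;  /* bit 8:  accessed */
};

static struct seg_type decode_type(uint32_t hi)
{
    struct seg_type t;
    t.is_code  = ((hi >> 11) & 1u) != 0;
    t.ec       = ((hi >> 10) & 1u) != 0;
    t.rw       = ((hi >> 9) & 1u) != 0;
    t.accessed = ((hi >> 8) & 1u) != 0;
    return t;
}

/* Data segments are always readable; code segments only if R is set. */
static bool can_read(struct seg_type t)
{
    return !t.is_code || t.rw;
}

/* Only data segments with W set are writable; code segments never are. */
static bool can_write(struct seg_type t)
{
    return !t.is_code && t.rw;
}
```

A type field of 1010B (execute/read code segment) is readable but not writable, 0010B (read/write data segment) is both, and 1000B (execute-only code segment) is neither. Since a stack segment must be a read/write data segment, `can_write` also indicates whether a segment is acceptable for loading into the SS register.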
Code segments can be either conforming or nonconforming. A transfer of execution into a more-privileged
conforming segment allows execution to continue at the current privilege level. A
transfer into a nonconforming segment at a different privilege level results in a general-protection
exception (#GP), unless a call gate or task gate is used (refer to Section 4.8.1., "Direct Calls
or Jumps to Code Segments" in Chapter 4, Protection for more information on conforming and
nonconforming code segments). System utilities that do not access protected facilities and
handlers for some types of exceptions (such as divide error or overflow) may be loaded in
conforming code segments. Utilities that need to be protected from less privileged programs and
procedures should be placed in nonconforming code segments.
NOTE
Execution cannot be transferred by a call or a jump to a less-privileged
(numerically higher privilege level) code segment, regardless of whether the
target segment is a conforming or nonconforming code segment. Attempting
such an execution transfer will result in a general-protection exception.
All data segments are nonconforming, meaning that they cannot be accessed by less privileged
programs or procedures (code executing at numerically higher privilege levels). Unlike code
segments, however, data segments can be accessed by more privileged programs or procedures
(code executing at numerically lower privilege levels) without using a special access gate.
The processor may update the Type field when a segment is accessed, even if the access is a read
cycle. If the descriptor tables have been put in ROM, it may be necessary for hardware to prevent
the ROM from being enabled onto the data bus during a write cycle. It also may be necessary to
return the READY# signal to the processor when a write cycle to ROM occurs, otherwise
the cycle will not terminate. These features of the hardware design are necessary for using
ROM-based descriptor tables with the Intel386™ DX processor, which always sets the
accessed bit when a segment descriptor is loaded. The P6 family, Pentium®, and Intel486™
processors, however, only set the accessed bit if it is not already set. Writes to descriptor tables
in ROM can be avoided by setting the accessed bits in every descriptor.
3.5. SYSTEM DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type is a system
descriptor. The processor recognizes the following types of system descriptors:
Local descriptor-table (LDT) segment descriptor.
Task-state segment (TSS) descriptor.
Call-gate descriptor.
Interrupt-gate descriptor.
Trap-gate descriptor.
Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors.
System-segment descriptors point to system segments (LDT and TSS segments). Gate descriptors
are in themselves "gates," which hold pointers to procedure entry points in code segments
(call, interrupt, and trap gates) or which hold segment selectors for TSS's (task gates). Table 3-2
shows the encoding of the type field for system-segment descriptors and gate descriptors.
For more information on the system-segment descriptors, refer to Section 3.5.1., "Segment
Descriptor Tables", and Section 6.2.2., "TSS Descriptor" in Chapter 6, Task Management. For
more information on the gate descriptors, refer to Section 4.8.2., "Gate Descriptors" in Chapter
4, Protection; Section 5.9., "IDT Descriptors" in Chapter 5, Interrupt and Exception Handling;
and Section 6.2.4., "Task-Gate Descriptor" in Chapter 6, Task Management.
3.5.1. Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (refer to Figure 3-10). A descriptor
table is variable in length and can contain up to 8192 (2^13) 8-byte descriptors. There are two
kinds of descriptor tables:
The global descriptor table (GDT)
The local descriptor tables (LDT)
Figure 3-10
Each system must have one GDT defined, which may be used for all programs and tasks in the
system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for
each separate task being run, or some or all tasks can share the same LDT.
The GDT is not a segment itself; instead, it is a data structure in the linear address space. The
base linear address and limit of the GDT must be loaded into the GDTR register (refer to Section
2.4., "Memory-Management Registers" in Chapter 2, System Architecture Overview). The base
address of the GDT should be aligned on an eight-byte boundary to yield the best processor
performance. The limit value for the GDT is expressed in bytes. As with segments, the limit
value is added to the base address to get the address of the last valid byte. A limit value of 0
results in exactly one valid byte. Because segment descriptors are always 8 bytes long, the GDT
limit should always be one less than an integral multiple of eight (that is, 8N - 1).
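The 8N - 1 rule can be illustrated with a short C sketch. The function names are hypothetical; the second function assumes that a descriptor is usable only if all 8 of its bytes lie at or below the table limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limit value to load into the GDTR for a table of n 8-byte descriptors. */
static uint16_t gdt_limit_for(unsigned n_descriptors)
{
    assert(n_descriptors >= 1 && n_descriptors <= 8192);
    return (uint16_t)(n_descriptors * 8u - 1u);   /* 8N - 1 */
}

/* True if the descriptor selected by sel lies entirely within the table. */
static bool selector_within_limit(uint16_t sel, uint16_t table_limit)
{
    uint32_t first_byte = (uint32_t)(sel >> 3) * 8u;  /* index * 8 */
    return first_byte + 7u <= table_limit;
}
```

A GDT holding three descriptors (including the unused first entry) gets a limit of 23; selector 0010H (index 2) is then within the table, and selector 0018H (index 3) is not.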
The first descriptor in the GDT is not used by the processor. A segment selector to this "null
descriptor" does not generate an exception when loaded into a data-segment register (DS, ES,
FS, or GS), but it always generates a general-protection exception (#GP) when an attempt is
made to access memory using the descriptor. By initializing the segment registers with this
segment selector, accidental reference to unused segment registers can be guaranteed to generate
an exception.
The LDT is located in a system segment of the LDT type. The GDT must contain a segment
descriptor for the LDT segment. If the system supports multiple LDTs, each must have a separate
segment selector and segment descriptor in the GDT. The segment descriptor for an LDT
can be located anywhere in the GDT. Refer to Section 3.5., "System Descriptor Types" for information
on the LDT segment-descriptor type.
An LDT is accessed with its segment selector. To eliminate address translations when accessing
the LDT, the segment selector, base linear address, limit, and access rights of the LDT are stored
in the LDTR register (refer to Section 2.4., "Memory-Management Registers" in Chapter 2,
System Architecture Overview).
When the GDTR register is stored (using the SGDT instruction), a 48-bit "pseudo-descriptor"
is stored in memory (refer to Figure 3-11). To avoid alignment check faults in user mode (privilege
level 3), the pseudo-descriptor should be located at an odd word address (that is, address
MOD 4 is equal to 2). This causes the processor to store an aligned word, followed by an aligned
doubleword. User-mode programs normally do not store pseudo-descriptors, but the possibility
of generating an alignment check fault can be avoided by aligning pseudo-descriptors in this
way. The same alignment should be used when storing the IDTR register using the SIDT instruction.
When storing the LDTR or task register (using the SLDT or STR instruction, respectively),
the pseudo-descriptor should be located at a doubleword address (that is, address MOD 4 is
equal to 0).
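The alignment recommendation for the 48-bit pseudo-descriptor can be checked arithmetically. The C sketch below assumes the layout implied by the text: a 16-bit limit stored first, followed by the 32-bit base address. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool word_aligned(uintptr_t addr)  { return addr % 2u == 0u; }
static bool dword_aligned(uintptr_t addr) { return addr % 4u == 0u; }

/*
 * True if a pseudo-descriptor stored at addr consists of an aligned word
 * (the limit, at addr) followed by an aligned doubleword (the base, at
 * addr + 2). This holds exactly when addr MOD 4 is equal to 2.
 */
static bool pseudo_descriptor_aligned(uintptr_t addr)
{
    return word_aligned(addr) && dword_aligned(addr + 2u);
}
```

An address such as 1002H satisfies the check, whereas 1000H does not, because the base address would then start at 1002H, which is not a doubleword boundary.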
3.6. PAGING (VIRTUAL MEMORY)
When operating in protected mode, the Intel Architecture permits the linear address space to be
mapped directly into a large physical memory (for example, 4 GBytes of RAM) or indirectly
(using paging) into a smaller physical memory and disk storage. This latter method of mapping
the linear address space is commonly referred to as virtual memory or demand-paged virtual
memory.
When paging is used, the processor divides the linear address space into fixed-size pages (generally
4 KBytes in length) that can be mapped into physical memory and/or disk storage. When a
program (or task) references a logical address in memory, the processor translates the address
into a linear address and then uses its paging mechanism to translate the linear address into a
corresponding physical address. If the page containing the linear address is not currently in
physical memory, the processor generates a page-fault exception (#PF). The exception handler
for the page-fault exception typically directs the operating system or executive to load the page
from disk storage into physical memory (perhaps writing a different page from physical memory
out to disk in the process). When the page has been loaded in physical memory, a return from
the exception handler causes the instruction that generated the exception to be restarted. The
information that the processor uses to map linear addresses into the physical address space and
to generate page-fault exceptions (when necessary) is contained in page directories and page
tables stored in memory.
Paging is different from segmentation through its use of fixed-size pages. Unlike segments,
which usually are the same size as the code or data structures they hold, pages have a fixed size.
If segmentation is the only form of address translation used, a data structure present in physical
memory will have all of its parts in memory. If paging is used, a data structure can be partly in
memory and partly in disk storage.
To minimize the number of bus cycles required for address translation, the most recently
accessed page-directory and page-table entries are cached in the processor in devices called
translation lookaside buffers (TLBs). The TLBs satisfy most requests for reading the current
page directory and page tables without requiring a bus cycle. Extra bus cycles occur only when
the TLBs do not contain a page-table entry, which typically happens when a page has not been
accessed for a long time. Refer to Section 3.7., "Translation Lookaside Buffers (TLBs)" for
more information on the TLBs.
3.6.1. Paging Options
Paging is controlled by three flags in the processor's control registers:
PG (paging) flag, bit 31 of CR0 (available in all Intel Architecture processors beginning
with the Intel386™ processor).
PSE (page size extensions) flag, bit 4 of CR4 (introduced in the Pentium® and Pentium®
Pro processors).
PAE (physical address extension) flag, bit 5 of CR4 (introduced in the Pentium® Pro
processors).
The PG flag enables the page-translation mechanism. The operating system or executive usually
sets this flag during processor initialization. The PG flag must be set if the processor's page-translation
mechanism is to be used to implement a demand-paged virtual memory system or if
the operating system is designed to run more than one program (or task) in virtual-8086 mode.
The PSE flag enables large page sizes: 4-MByte pages or 2-MByte pages (when the PAE flag is
set). When the PSE flag is clear, the more common page length of 4 KBytes is used. Refer to
Section 3.6.2.2., "Linear Address Translation (4-MByte Pages)" and Section 3.8.2., "Linear
Address Translation With Extended Addressing Enabled (2-MByte or 4-MByte Pages)" for
more information about the use of the PSE flag.
The PAE flag enables 36-bit physical addresses. This physical address extension can only be
used when paging is enabled. It relies on page directories and page tables to reference physical
addresses above FFFFFFFFH. Refer to Section 3.8., "Physical Address Extension" for more
information about the physical address extension.
3.6.2. Page Tables and Directories
The information that the processor uses to translate linear addresses into physical addresses
(when paging is enabled) is contained in four data structures:
Page directory: An array of 32-bit page-directory entries (PDEs) contained in a 4-KByte
page. Up to 1024 page-directory entries can be held in a page directory.
Page table: An array of 32-bit page-table entries (PTEs) contained in a 4-KByte page. Up
to 1024 page-table entries can be held in a page table. (Page tables are not used for
2-MByte or 4-MByte pages. These page sizes are mapped directly from one or more
page-directory entries.)
Page: A 4-KByte, 2-MByte, or 4-MByte flat address space.
Page-Directory-Pointer Table: An array of four 64-bit entries, each of which points to a
page directory. This data structure is only used when the physical address extension is
enabled (refer to Section 3.8., "Physical Address Extension").
These tables provide access to either 4-KByte or 4-MByte pages when normal 32-bit physical
addressing is being used and to either 4-KByte, 2-MByte, or 4-MByte pages when extended (36-
bit) physical addressing is being used. Table 3-3 shows the page size and physical address size
obtained from various settings of the paging control flags. Each page-directory entry contains a
PS (page size) flag that specifies whether the entry points to a page table whose entries in turn
point to 4-KByte pages (PS set to 0) or whether the page-directory entry points directly to a 4-
MByte or 2-MByte page (PSE or PAE set to 1 and PS set to 1).
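The interaction of the PG, PSE, and PAE control flags with the PS flag of a page-directory entry can be sketched in C as follows. This is a reading of the prose above, not a reproduction of Table 3-3; the macro and function names are hypothetical, and the PS flag is passed as a boolean because the entry format is not shown in this text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG  (1u << 31)  /* paging flag, bit 31 of CR0 */
#define CR4_PSE (1u << 4)   /* page size extensions flag, bit 4 of CR4 */
#define CR4_PAE (1u << 5)   /* physical address extension flag, bit 5 of CR4 */

/*
 * Size in bytes of the page mapped through a page-directory entry,
 * or 0 if paging is disabled.
 */
static uint32_t page_size_bytes(uint32_t cr0, uint32_t cr4, bool pde_ps)
{
    if (!(cr0 & CR0_PG))
        return 0u;                     /* paging not enabled */
    if (!pde_ps)
        return 4096u;                  /* entry points to a page table */
    if (cr4 & CR4_PAE)
        return 2u * 1024u * 1024u;     /* 2-MByte page (extended addressing) */
    if (cr4 & CR4_PSE)
        return 4u * 1024u * 1024u;     /* 4-MByte page */
    return 4096u;                      /* PS has no effect without PSE or PAE */
}
```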
3.6.2.1. LINEAR ADDRESS TRANSLATION (4-KBYTE PAGES)
Figure 3-12 shows the page directory and page-table hierarchy when mapping linear addresses
to 4-KByte pages. The entries in the page directory point to page tables, and the entries in a page
table point to pages in physical memory. This paging method can be used to address up to 2^20
pages, which spans a linear address space of 2^32 bytes (4 GBytes).
Figure 3-12
To select the various table entries, the linear address is divided into three sections:
Page-directory entry: Bits 22 through 31 provide an offset to an entry in the page
directory. The selected entry provides the base physical address of a page table.
Page-table entry: Bits 12 through 21 of the linear address provide an offset to an entry in
the selected page table. This entry provides the base physical address of a page in physical
memory.
Page offset: Bits 0 through 11 provide an offset to a physical address in the page.
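The three-way division of the linear address can be written as shifts and masks. The C sketch below uses hypothetical function names; `phys_from_pte` additionally assumes that the page base address occupies bits 12 through 31 of the page-table entry, as in the entry format of Figure 3-14 (not reproduced in this text).

```c
#include <assert.h>
#include <stdint.h>

/* Bits 22 through 31: index of the entry in the page directory. */
static uint32_t pde_index(uint32_t linear)
{
    return linear >> 22;
}

/* Bits 12 through 21: index of the entry in the selected page table. */
static uint32_t pte_index(uint32_t linear)
{
    return (linear >> 12) & 0x3FFu;
}

/* Bits 0 through 11: offset within the 4-KByte page. */
static uint32_t page_offset_4k(uint32_t linear)
{
    return linear & 0xFFFu;
}

/* Combines the page base from a page-table entry with the page offset. */
static uint32_t phys_from_pte(uint32_t pte, uint32_t linear)
{
    return (pte & 0xFFFFF000u) | page_offset_4k(linear);
}
```

For linear address 00403025H, the page-directory index is 1, the page-table index is 3, and the page offset is 025H. If the selected page-table entry holds page base 0ABCD000H, the resulting physical address is 0ABCD025H.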
Memory management software has the option of using one page directory for all programs and
tasks, one page directory for each task, or some combination of the two.
3.6.2.2. LINEAR ADDRESS TRANSLATION (4-MBYTE PAGES)
Figure 3-13 shows how a page directory can be used to map linear addresses to 4-MByte pages.
The entries in the page directory point to 4-MByte pages in physical memory. This paging
method can be used to map up to 1024 pages into a 4-GByte linear address space.
Figure 3-13
The 4-MByte page size is selected by setting the PSE flag in control register CR4 and setting
the page size (PS) flag in a page-directory entry (refer to Figure 3-14). With these flags set, the
linear address is divided into two sections:
Page-directory entry: Bits 22 through 31 provide an offset to an entry in the page
directory. The selected entry provides the base physical address of a 4-MByte page.
Page offset: Bits 0 through 21 provide an offset to a physical address in the page.
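The two-way division used for 4-MByte pages can be sketched in the same manner. The function names are hypothetical; `phys_from_pde_4m` assumes that the page base address occupies bits 22 through 31 of the page-directory entry and that normal 32-bit physical addressing is in use.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 22 through 31: index of the entry in the page directory. */
static uint32_t pde_index_4m(uint32_t linear)
{
    return linear >> 22;
}

/* Bits 0 through 21: offset within the 4-MByte page. */
static uint32_t page_offset_4m(uint32_t linear)
{
    return linear & 0x003FFFFFu;
}

/* Combines the 4-MByte page base from the entry with the page offset. */
static uint32_t phys_from_pde_4m(uint32_t pde, uint32_t linear)
{
    return (pde & 0xFFC00000u) | page_offset_4m(linear);
}
```

For linear address 00C12345H, the page-directory index is 3 and the page offset is 012345H; with a page base of 40000000H the physical address is 40012345H.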
NOTE
(For the Pentium® processor only.) When enabling or disabling large page
sizes, the TLBs must be invalidated (flushed) after the PSE flag in control
register CR4 has been set or cleared. Otherwise, incorrect page translation
might occur due to the processor using outdated page translation information
stored in the TLBs. Refer to Section 9.10., "Invalidating the Translation
Lookaside Buffers (TLBs)", in Chapter 9, Memory Cache Control, for
information on how to invalidate the TLBs.
3.6.2.3. MIXING 4-KBYTE AND 4-MBYTE PAGES
When the PSE flag in CR4 is set, both 4-MByte pages and page tables for 4-KByte pages can
be accessed from the same page directory. If the PSE flag is clear, only page tables for 4-KByte
pages can be accessed (regardless of the setting of the PS flag in a page-directory entry).
A typical example of mixing 4-KByte and 4-MByte pages is to place the operating system or
executive's kernel in a large page to reduce TLB misses and thus improve overall system performance.
The processor maintains 4-MByte page entries and 4-KByte page entries in separate
TLBs. Placing frequently used code, such as the kernel, in a large page therefore frees up
4-KByte-page TLB entries for application programs and tasks.
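The following Python sketch shows one way to model a page walk that mixes both page sizes. It is a simplified model under stated assumptions: read32 is a caller-supplied function (hypothetical) that returns the 32-bit entry stored at a physical address, protection checks are omitted, and a LookupError stands in for the page-fault exception.

```python
P_FLAG = 1 << 0    # present
PS_FLAG = 1 << 7   # page size (page-directory entries only)


def translate(read32, cr3, linear, pse_enabled):
    """Walk a two-level page structure and return the physical address.

    read32 is an assumption of this sketch: a function returning the
    32-bit entry at a given physical address.
    Raises LookupError where the processor would generate #PF.
    """
    pde = read32((cr3 & 0xFFFFF000) + 4 * ((linear >> 22) & 0x3FF))
    if not pde & P_FLAG:
        raise LookupError("page fault: page-directory entry not present")
    if pse_enabled and pde & PS_FLAG:
        # 4-MByte page: the directory entry maps the page directly.
        return (pde & 0xFFC00000) | (linear & 0x003FFFFF)
    # 4-KByte page: the directory entry points to a page table.
    # With PSE clear, the PS flag is ignored and this path is always taken.
    pte = read32((pde & 0xFFFFF000) + 4 * ((linear >> 12) & 0x3FF))
    if not pte & P_FLAG:
        raise LookupError("page fault: page-table entry not present")
    return (pte & 0xFFFFF000) | (linear & 0xFFF)
```

In this model a kernel placed in a single 4-MByte page is resolved from the page directory alone, while application pages take the extra page-table read.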
3.6.3. Base Address of the Page Directory
The physical address of the current page directory is stored in the CR3 register (also called the
page directory base register or PDBR). (Refer to Figure 2-5 and Section 2.5., "Control Registers"
in Chapter 2, System Architecture Overview for more information on the PDBR.) If paging
is to be used, the PDBR must be loaded as part of the processor initialization process (prior to
enabling paging). The PDBR can then be changed either explicitly by loading a new value in
CR3 with a MOV instruction or implicitly as part of a task switch. (Refer to Section 6.2.1.,
"Task-State Segment (TSS)" in Chapter 6, Task Management for a description of how the
contents of the CR3 register are set for a task.)
There is no present flag in the PDBR for the page directory. The page directory may be not present
(paged out of physical memory) while its associated task is suspended, but the operating
system must ensure that the page directory indicated by the PDBR image in a task's TSS is
present in physical memory before the task is dispatched. The page directory must also remain
in memory as long as the task is active.
3.6.4. Page-Directory and Page-Table Entries
Figure 3-14 shows the format for the page-directory and page-table entries when 4-KByte
pages and 32-bit physical addresses are being used. Figure 3-15 shows the format for the
page-directory entries when 4-MByte pages and 32-bit physical addresses are being used. Refer
to Section 3.8., "Physical Address Extension" for the format of page-directory and page-table
entries when the physical address extension is being used.
Figure 3-15
The functions of the flags and fields in the entries in Figures 3-14 and 3-15 are as follows:
Page base address, bits 12 through 31
(Page-table entries for 4-KByte pages.)
Specifies the physical address of the
first byte of a 4-KByte page. The bits in this field are interpreted as the 20 most-significant
bits of the physical address, which forces pages to be aligned on
4-KByte boundaries.
(Page-directory entries for 4-KByte page tables.) Specifies the physical
address of the first byte of a page table. The bits in this field are interpreted as
the 20 most-significant bits of the physical address, which forces page tables to
be aligned on 4-KByte boundaries.
(Page-directory entries for 4-MByte pages.) Specifies the physical address of
the first byte of a 4-MByte page. Only bits 22 through 31 of this field are used
(and bits 12 through 21 are reserved and must be set to 0, for Intel Architecture
processors through the Pentium® II processor). The base address bits are interpreted
as the 10 most-significant bits of the physical address, which forces
4-MByte pages to be aligned on 4-MByte boundaries.
Present (P) flag, bit 0
Indicates whether the page or page table being pointed to by the entry is
currently loaded in physical memory. When the flag is set, the page is in physical
memory and address translation is carried out. When the flag is clear, the
page is not in memory and, if the processor attempts to access the page, it
generates a page-fault exception (#PF).
The processor does not set or clear this flag; it is up to the operating system or
executive to maintain the state of the flag.
The bit must be set to 1 whenever extended physical addressing mode is
enabled.
If the processor generates a page-fault exception, the operating system must
carry out the following operations in the order below:
1. Copy the page from disk storage into physical memory, if needed.
2. Load the page address into the page-table or page-directory entry and set
its present flag. Other bits, such as the dirty and accessed flags, may also
be set at this time.
3. Invalidate the current page-table entry in the TLB (refer to Section 3.7.,
"Translation Lookaside Buffers (TLBs)" for a discussion of TLBs and
how to invalidate them).
4. Return from the page-fault handler to restart the interrupted program or
task.
Read/write (R/W) flag, bit 1
Specifies the read-write privileges for a page or group of pages (in the case of
a page-directory entry that points to a page table). When this flag is clear, the
page is read only; when the flag is set, the page can be read and written into.
This flag interacts with the U/S flag and the WP flag in register CR0. Refer to
Section 4.11., "Page-Level Protection" and Table 4-2 in Chapter 4, Protection
for a detailed discussion of the use of these flags.
User/supervisor (U/S) flag, bit 2
Specifies the user-supervisor privileges for a page or group of pages (in the
case of a page-directory entry that points to a page table). When this flag is
clear, the page is assigned the supervisor privilege level; when the flag is set,
the page is assigned the user privilege level. This flag interacts with the R/W
flag and the WP flag in register CR0. Refer to Section 4.11., "Page-Level
Protection" and Table 4-2 in Chapter 4, Protection for a detailed discussion of the
use of these flags.
Page-level write-through (PWT) flag, bit 3
Controls the write-through or write-back caching policy of individual pages or
page tables. When the PWT flag is set, write-through caching is enabled for the
associated page or page table; when the flag is clear, write-back caching is
enabled for the associated page or page table. The processor ignores this flag if
the CD (cache disable) flag in CR0 is set. Refer to Section 9.5., "Cache
Control", in Chapter 9, Memory Cache Control, for more information about the
use of this flag. Refer to Section 2.5., "Control Registers" in Chapter 2, System
Architecture Overview for a description of a companion PWT flag in control
register CR3.
Page-level cache disable (PCD) flag, bit 4
Controls the caching of individual pages or page tables. When the PCD flag is
set, caching of the associated page or page table is prevented; when the flag is
clear, the page or page table can be cached. This flag permits caching to be
disabled for pages that contain memory-mapped I/O ports or that do not
provide a performance benefit when cached. The processor ignores this flag
(assumes it is set) if the CD (cache disable) flag in CR0 is set. Refer to Chapter
9, Memory Cache Control, for more information about the use of this flag.
Refer to Section 2.5. in Chapter 2, System Architecture Overview for a description
of a companion PCD flag in control register CR3.
Accessed (A) flag, bit 5
Indicates whether a page or page table has been accessed (read from or written
to) when set. Memory management software typically clears this flag when a
page or page table is initially loaded into physical memory. The processor then
sets this flag the first time a page or page table is accessed. This flag is a
"sticky" flag, meaning that once set, the processor does not implicitly clear it.
Only software can clear this flag. The accessed and dirty flags are provided for
use by memory management software to manage the transfer of pages and page
tables into and out of physical memory.
Dirty (D) flag, bit 6
Indicates whether a page has been written to when set. (This flag is not used in
page-directory entries that point to page tables.) Memory management software
typically clears this flag when a page is initially loaded into physical
memory. The processor then sets this flag the first time a page is accessed for
a write operation. This flag is "sticky," meaning that once set, the processor
does not implicitly clear it. Only software can clear this flag. The dirty and
accessed flags are provided for use by memory management software to
manage the transfer of pages and page tables into and out of physical memory.
Page size (PS) flag, bit 7
Determines the page size. This flag is only used in page-directory entries.
When this flag is clear, the page size is 4 KBytes and the page-directory entry
points to a page table. When the flag is set, the page size is 4 MBytes for normal
32-bit addressing (and 2 MBytes if extended physical addressing is enabled)
and the page-directory entry points to a page. If the page-directory entry points
to a page table, all the pages associated with that page table will be 4-KByte
pages.
Global (G) flag, bit 8
(Introduced in the Pentium® Pro processor.) Indicates a global page when set.
When a page is marked global and the page global enable (PGE) flag in register
CR4 is set, the page-table or page-directory entry for the page is not invalidated
in the TLB when register CR3 is loaded or a task switch occurs. This flag is
provided to prevent frequently used pages (such as pages that contain kernel or
other operating system or executive code) from being flushed from the TLB.
Only software can set or clear this flag. For page-directory entries that point to
page tables, this flag is ignored and the global characteristics of a page are set
in the page-table entries. Refer to Section 3.7., "Translation Lookaside Buffers
(TLBs)" for more information about the use of this flag. (This bit is reserved in
Pentium® and earlier Intel Architecture processors.)
Reserved and available-to-software bits
In a page-table entry, bit 7 is reserved and should be set to 0; in a page-directory
entry that points to a page table, bit 6 is reserved and should be set to 0. For a
page-directory entry for a 4-MByte page, bits 12 through 21 are reserved and
must be set to 0, for Intel Architecture processors through the Pentium® II
processor. For both types of entries, bits 9, 10, and 11 are available for use by
software. (When the present bit is clear, bits 1 through 31 are available to software;
refer to Figure 3-16.) When the PSE and PAE flags in control register
CR4 are set, the processor generates a page fault if reserved bits are not set to 0.
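The flag and field layout described above can be summarized with a small decoder. The sketch below is illustrative only: decode_entry and FLAG_BITS are hypothetical names, and the base field is extracted as bits 12 through 31, which applies to page-table entries and to page-directory entries that point to page tables (for a 4-MByte page only bits 22 through 31 hold the base).

```python
# Bit positions of the flags in a 32-bit page-directory or page-table entry.
FLAG_BITS = {
    "P": 0, "R/W": 1, "U/S": 2, "PWT": 3, "PCD": 4,
    "A": 5, "D": 6, "PS": 7, "G": 8,
}


def decode_entry(entry):
    """Decode a present 32-bit entry into its base, flags, and software bits."""
    flags = {name for name, bit in FLAG_BITS.items() if (entry >> bit) & 1}
    return {
        "base": entry & 0xFFFFF000,        # bits 12 through 31
        "available": (entry >> 9) & 0x7,   # bits 9, 10, and 11
        "flags": flags,
    }
```

Decoding the value 0x00ABC067, for example, yields a base of 0x00ABC000 with the P, R/W, U/S, A, and D flags set.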
3.6.5. Not Present Page-Directory and Page-Table Entries
When the present flag is clear for a page-table or page-directory entry, the operating system or
executive may use the rest of the entry for storage of information such as the location of the page
in the disk storage system (refer to Figure 3-16).
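One possible use of a not-present entry is sketched below. The encoding is purely hypothetical (the architecture leaves bits 1 through 31 to software): a 31-bit disk slot number is stored in bits 1 through 31 with the present flag clear, so the processor generates a page fault on access and the handler can recover the slot.

```python
def make_not_present_entry(disk_slot):
    """Build a 32-bit entry with P clear and a slot number in bits 1-31.

    The slot encoding is an assumption of this sketch, not an
    architectural format.
    """
    if not 0 <= disk_slot < (1 << 31):
        raise ValueError("disk slot must fit in 31 bits")
    return disk_slot << 1          # bit 0 (present) stays clear


def disk_slot_of(entry):
    """Recover the slot number from a not-present entry."""
    if entry & 1:
        raise ValueError("entry is present; it holds a page base address")
    return entry >> 1
```

An operating system would typically reserve one value, such as 0, to mean that no backing store has been assigned.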
3.7. TRANSLATION LOOKASIDE BUFFERS (TLBS)
The processor stores the most recently used page-directory and page-table entries in on-chip
caches called translation lookaside buffers or TLBs. The P6 family and Pentium® processors
have separate TLBs for the data and instruction caches. Also, the P6 family processors maintain
separate TLBs for 4-KByte and 4-MByte page sizes. The CPUID instruction can be used to
determine the sizes of the TLBs provided in the P6 family and Pentium® processors.
Most paging is performed using the contents of the TLBs. Bus cycles to the page directory and
page tables in memory are performed only when the TLBs do not contain the translation information
for a requested page.
The TLBs are inaccessible to application programs and tasks (privilege level greater than 0); that
is, they cannot invalidate TLBs. Only operating system or executive procedures running at privilege
level of 0 can invalidate TLBs or selected TLB entries. Whenever a page-directory or
page-table entry is changed (including when the present flag is set to zero), the operating-system
must immediately invalidate the corresponding entry in the TLB so that it can be updated the
next time the entry is referenced. However, if the physical address extension (PAE) feature is
enabled to use 36-bit addressing, a new table is added to the paging hierarchy. This new table is
called the page directory pointer table (as described in Section 3.8., "Physical Address Extension").
If an entry is changed in this table (to point to another page directory), the TLBs must
then be flushed by writing to CR3.
All (nonglobal) TLBs are automatically invalidated any time the CR3 register is loaded (unless
the G flag for a page or page-table entry is set, as described later in this section). The CR3 register
can be loaded in either of two ways:
Explicitly, using the MOV instruction, for example:
MOV CR3, EAX
where the EAX register contains an appropriate page-directory base address.
Implicitly by executing a task switch, which automatically changes the contents of the CR3
register.
The INVLPG instruction is provided to invalidate a specific page-table entry in the TLB.
Normally, this instruction invalidates only an individual TLB entry; however, in some cases, it
may invalidate more than the selected entry and may even invalidate all of the TLBs. This
instruction ignores the setting of the G flag in a page-directory or page-table entry (refer to the
following paragraph).
(Introduced in the Pentium® Pro processor.) The page global enable (PGE) flag in register CR4
and the global (G) flag of a page-directory or page-table entry (bit 8) can be used to prevent
frequently used pages from being automatically invalidated in the TLBs on a task switch or a
load of register CR3. (Refer to Section 3.6.4., "Page-Directory and Page-Table Entries" for more
information about the global flag.) When the processor loads a page-directory or page-table
entry for a global page into a TLB, the entry will remain in the TLB indefinitely. The only way
to deterministically invalidate global page entries is to clear the PGE flag and then invalidate the
TLBs or to use the INVLPG instruction to invalidate individual page-directory or page-table
entries in the TLBs.
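The invalidation rules in this section can be illustrated with a toy model. The Python class below is a sketch only and does not model real TLB organization, replacement, or the cases in which INVLPG invalidates more than one entry; the class and method names are hypothetical.

```python
class ToyTLB:
    """Toy model of the TLB invalidation rules; not a hardware model."""

    def __init__(self, pge=False):
        self.pge = pge      # models the PGE flag in register CR4
        self.entries = {}   # linear page number -> (frame, global flag)

    def fill(self, page, frame, is_global=False):
        """Cache a translation, as the processor does after a page walk."""
        self.entries[page] = (frame, is_global)

    def load_cr3(self):
        """Model a CR3 load or task switch.

        Only global entries survive, and only while PGE is set.
        """
        self.entries = {
            page: entry for page, entry in self.entries.items()
            if self.pge and entry[1]
        }

    def invlpg(self, page):
        """Model INVLPG: drop one entry whether or not it is global."""
        self.entries.pop(page, None)
```

In this model, clearing pge and then calling load_cr3 removes every entry, which corresponds to the deterministic method of invalidating global pages described above.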
For additional information about invalidation of the TLBs, refer to Section 9.10., "Invalidating
the Translation Lookaside Buffers (TLBs)", in Chapter 9, Memory Cache Control.
3.8. PHYSICAL ADDRESS EXTENSION
The physical address extension (PAE) flag in register CR4 enables an extension of physical
addresses from 32 bits to 36 bits. (This feature was introduced into the Intel Architecture in the
Pentium® Pro processors.) Here, the processor provides 4 additional address line pins to accommodate
the additional address bits. This option can only be used when paging is enabled (that
is, when both the PG flag in register CR0 and the PAE flag in register CR4 are set).
When the physical address extension is enabled, the processor allows several sizes of pages:
4-KByte, 2-MByte, or 4-MByte. As with 32-bit addressing, these page sizes can be addressed
within the same set of paging tables (that is, a page-directory entry can point to either a 2-MByte
or 4-MByte page or a page table that in turn points to 4-KByte pages). To support the 36-bit
physical addresses, the following changes are made to the paging data structures:
The paging table entries are increased to 64 bits to accommodate 36-bit base physical
addresses. Each 4-KByte page directory and page table can thus have up to 512 entries.
A new table, called the page-directory-pointer table, is added to the linear-address
translation hierarchy. This table has 4 entries of 64-bits each, and it lies above the page
directory in the hierarchy. With the physical address extension mechanism enabled, the
processor supports up to 4 page directories.
The 20-bit page-directory base address field in register CR3 (PDBR) is replaced with a
27-bit page-directory-pointer-table base address field (refer to Figure 3-17). (In this case,
register CR3 is called the PDPTR.) This field provides the 27 most-significant bits of the
physical address of the first byte of the page-directory-pointer table, which forces the table
to be located on a 32-byte boundary.
Linear address translation is changed to allow mapping 32-bit linear addresses into the
larger physical address space.
3.8.1. Linear Address Translation With Extended Addressing Enabled (4-KByte Pages)
Figure 3-18 shows the page-directory-pointer, page-directory, and page-table hierarchy when
mapping linear addresses to 4-KByte pages with extended physical addressing enabled. This
paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32
bytes (4 GBytes).
Figure 3-18
To select the various table entries, the linear address is divided into four sections:
Page-directory-pointer-table entry-Bits 30 and 31 provide an offset to one of the 4 entries
in the page-directory-pointer table. The selected entry provides the base physical address
of a page directory.
Page-directory entry-Bits 21 through 29 provide an offset to an entry in the selected page
directory. The selected entry provides the base physical address of a page table.
Page-table entry-Bits 12 through 20 provide an offset to an entry in the selected page
table. This entry provides the base physical address of a page in physical memory.
Page offset-Bits 0 through 11 provide an offset to a physical address in the page.
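The division of the linear address with extended addressing enabled can be sketched as follows. The function name split_linear_pae_4k is hypothetical; the bit ranges are those listed above.

```python
def split_linear_pae_4k(linear):
    """Split a 32-bit linear address for 4-KByte pages with PAE enabled.

    Returns (pointer-table index, directory index, table index, offset).
    """
    linear &= 0xFFFFFFFF
    pdpte_index = (linear >> 30) & 0x3     # bits 30 and 31: one of 4 entries
    pde_index = (linear >> 21) & 0x1FF     # bits 21 through 29: one of 512
    pte_index = (linear >> 12) & 0x1FF     # bits 12 through 20: one of 512
    offset = linear & 0xFFF                # bits 0 through 11
    return pdpte_index, pde_index, pte_index, offset
```

The three index widths (2, 9, and 9 bits) select among 4 x 512 x 512 = 2^20 pages, which agrees with the page count given at the start of this section.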
3.8.2. Linear Address Translation With Extended Addressing Enabled (2-MByte or 4-MByte Pages)
Figure 3-19 shows how a page-directory-pointer table and page directories can be used to map
linear addresses to 2-MByte or 4-MByte pages. This paging method can be used to map up to
2048 pages (4 page-directory-pointer-table entries times 512 page-directory entries) into a
4-GByte linear address space.
The 2-MByte or 4-MByte page size is selected by setting the PSE flag in control register CR4
and setting the page size (PS) flag in a page-directory entry (refer to Figure 3-14). With these
flags set, the linear address is divided into three sections:
Page-directory-pointer-table entry-Bits 30 and 31 provide an offset to an entry in the
page-directory-pointer table. The selected entry provides the base physical address of a
page directory.
Page-directory entry-Bits 21 through 29 provide an offset to an entry in the page
directory. The selected entry provides the base physical address of a 2-MByte or 4-MByte
page.
Page offset-Bits 0 through 20 provide an offset to a physical address in the page.
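A short sketch of this three-way split, together with the page-count arithmetic quoted above, is given below. The names are hypothetical, and the 21-bit offset shown corresponds to a 2-MByte page.

```python
def split_linear_pae_2m(linear):
    """Split a 32-bit linear address for 2-MByte pages with PAE enabled.

    Returns (pointer-table index, directory index, page offset).
    """
    linear &= 0xFFFFFFFF
    pdpte_index = (linear >> 30) & 0x3     # bits 30 and 31
    pde_index = (linear >> 21) & 0x1FF     # bits 21 through 29
    offset = linear & 0x1FFFFF             # bits 0 through 20
    return pdpte_index, pde_index, offset


# 4 pointer-table entries times 512 directory entries.
LARGE_PAGE_COUNT = 4 * 512
```

LARGE_PAGE_COUNT evaluates to 2048, and 2048 pages of 2 MBytes each cover the full 4-GByte linear address space.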
3.8.3. Accessing the Full Extended Physical Address Space With the Extended Page-Table Structure
The page-table structure described in the previous two sections allows up to 4 GBytes of
the 64-GByte extended physical address space to be addressed at one time. Additional 4-GByte
sections of physical memory can be addressed in either of two ways:
Change the pointer in register CR3 to point to another page-directory-pointer table, which
in turn points to another set of page directories and page tables.
Change entries in the page-directory-pointer table to point to other page directories, which
in turn point to other sets of page tables.
Figure 3-19
3.8.4. Page-Directory and Page-Table Entries With Extended Addressing Enabled
Figure 3-20 shows the format for the page-directory-pointer-table, page-directory, and
page-table entries when 4-KByte pages and 36-bit extended physical addresses are being
used. Figure 3-21 shows the format for the page-directory-pointer-table and page-directory
entries when 2-MByte or 4-MByte pages and 36-bit extended physical addresses are being
used. The functions of the flags in these entries are the same as described in Section 3.6.4.,
"Page-Directory and Page-Table Entries". The major differences in these entries are as follows:
A page-directory-pointer-table entry is added.
The size of the entries is increased from 32 bits to 64 bits.
The maximum number of entries in a page directory or page table is 512.
The base physical address field in each entry is extended to 24 bits.
The base physical address in an entry specifies the following, depending on the type of entry:
Page-directory-pointer-table entry-the physical address of the first byte of a
4-KByte page directory.
Page-directory entry-the physical address of the first byte of a 4-KByte page table or a
2-MByte page.
Page-table entry-the physical address of the first byte of a 4-KByte page.
For all table entries (except for page-directory entries that point to 2-MByte or 4-MByte pages),
the bits in the page base address are interpreted as the 24 most-significant bits of a 36-bit physical
address, which forces page tables and pages to be aligned on 4-KByte boundaries. When a
page-directory entry points to a 2-MByte or 4-MByte page, the base address is interpreted as the
15 most-significant bits of a 36-bit physical address, which forces pages to be aligned on
2-MByte or 4-MByte boundaries.
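The two interpretations of the base address field can be expressed as masks over a 64-bit entry. The sketch below assumes the 24-bit field occupies entry bits 12 through 35 and the 15-bit field occupies bits 21 through 35, which follows from the alignment rules just described; the function names are hypothetical.

```python
ADDRESS_36_BIT = (1 << 36) - 1


def pae_base_4k(entry):
    """Base of a page directory, page table, or 4-KByte page.

    24 most-significant bits of a 36-bit address (entry bits 12-35).
    """
    return entry & ADDRESS_36_BIT & ~0xFFF


def pae_base_2m(pde):
    """Base of a 2-MByte page.

    15 most-significant bits of a 36-bit address (entry bits 21-35).
    """
    return pde & ADDRESS_36_BIT & ~0x1FFFFF
```

Because the low-order bits are masked off, every base returned by pae_base_4k is a multiple of 4 KBytes and every base returned by pae_base_2m is a multiple of 2 MBytes.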
The present (P) flag (bit 0) in all page-directory-pointer-table entries must be set to 1 anytime
extended physical addressing mode is enabled; that is, whenever the PAE flag (bit 5 in register
CR4) and the PG flag (bit 31 in register CR0) are set. If the P flag is not set in all 4 page-directory-
pointer-table entries in the page-directory-pointer table when extended physical addressing
is enabled, a general-protection exception (#GP) is generated.
The page size (PS) flag (bit 7) in a page-directory entry determines if the entry points to a page
table or a 2-MByte or 4-MByte page. When this flag is clear, the entry points to a page table;
when the flag is set, the entry points to a 2-MByte or 4-MByte page. This flag allows 4-KByte,
2-MByte, or 4-MByte pages to be mixed within one set of paging tables.
Accessed (A) and dirty (D) flags (bits 5 and 6) are provided for table entries that point to pages.
Bits 9, 10, and 11 in all the table entries for the physical address extension are available for use
by software. (When the present flag is clear, bits 1 through 63 are available to software.) All bits
in Figure 3-14 that are marked reserved or 0 should be set to 0 by software and not accessed by
software. When the PSE and/or PAE flags in control register CR4 are set, the processor generates
a page fault (#PF) if reserved bits in page-directory and page-table entries are not set to 0,
and it generates a general-protection exception (#GP) if reserved bits in a
page-directory-pointer-table entry are not set to 0.
3.9. 36-BIT PAGE SIZE EXTENSION (PSE)
The 36-bit PSE extends 36-bit physical address support to 4-MByte pages while maintaining a
4-byte page-directory entry. This approach provides a simple mechanism for operating system
vendors to address physical memory above 4-GBytes without requiring major design changes,
but has practical limitations with respect to demand paging.
The P6 family of processors' physical address extension (PAE) feature provides generic access
to a 36-bit physical address space. However, it requires expansion of the page-directory and
page-table entries to an 8-byte format (64 bit), and the addition of a page-directory-pointer table,
resulting in another level of indirection to address translation.
For P6-family processors that support the 36-bit PSE feature, the virtual memory architecture is
extended to support 4-MByte page size granularity in combination with 36-bit physical
addressing. Note that some P6-family processors do not support this feature. For information
about determining a processor's feature support, refer to the following documents:
AP-485, Intel Processor Identification and the CPUID Instruction
Addendum-Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture
For information about the virtual memory architecture features of P6-family processors, refer to
Chapter 3 of the Intel Architecture Software Developer's Manual, Volume 3: System Programming
Guide.
3.9.1. Description of the 36-bit PSE Feature
The 36-bit PSE feature (PSE-36) is detected by an operating system through the CPUID instruction.
Specifically, the operating system executes the CPUID instruction with the value 1 in the
EAX register and then determines support for the feature by inspecting bit 17 of the EDX
register return value (see Addendum-Intel Architecture Software Developer's Manual,
Volume 1: Basic Architecture). If the PSE-36 feature is supported, an operating system is
permitted to utilize the feature, as well as use certain formerly reserved bits. To use the 36-bit
PSE feature, the PSE flag must be enabled by the operating system (bit 4 of CR4). Note that a
separate control bit in CR4 does not exist to regulate the use of 36-bit 4-MByte pages, because
this feature becomes the standard behavior for 4-MByte pages on processors that support it.
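The detection and enabling logic can be sketched as two small predicates. Python cannot execute CPUID directly, so this sketch assumes the caller has already obtained the EDX value returned by CPUID with EAX = 1 and the current CR4 value by some other means; the function names are hypothetical. The requirement that PAE be clear reflects the page-control flag settings for this feature.

```python
CPUID_EDX_PSE36_BIT = 17   # feature flag returned by CPUID with EAX = 1
CR4_PSE_BIT = 4
CR4_PAE_BIT = 5


def supports_pse36(cpuid_1_edx):
    """True if the EDX feature flags report the 36-bit PSE feature."""
    return bool((cpuid_1_edx >> CPUID_EDX_PSE36_BIT) & 1)


def pse36_in_use(cpuid_1_edx, cr4):
    """True if 4-MByte pages can carry 36-bit physical addresses.

    Requires feature support, CR4.PSE set, and CR4.PAE clear.
    """
    pse = (cr4 >> CR4_PSE_BIT) & 1
    pae = (cr4 >> CR4_PAE_BIT) & 1
    return supports_pse36(cpuid_1_edx) and bool(pse) and not pae
```

An operating system would evaluate the first predicate once at initialization and set the PSE flag only if it intends to use 4-MByte pages.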
Table 3-8 shows the page size and physical address size obtained from various settings of the
page-control flags for the P6-family processors that support the 36-bit PSE feature. Shaded in
gray is the change to this table resulting from the 36-bit PSE feature.
To use the 36-bit PSE feature, the PAE feature must be cleared (as indicated in Table 3-4).
However, the 36-bit PSE in no way affects the PAE feature. Existing operating systems and software
that use the PAE will continue to have compatible functionality and features with
P6-family processors that support 36-bit PSE. Specifically, the Page-Directory Entry (PDE) format
when PAE is enabled for 2-MByte or 4-MByte pages is exactly as depicted in Figure 3-21 of the
Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide.
No matter which 36-bit addressing feature is used (PAE or 36-bit PSE), the linear address space
of the processor remains at 32 bits. Applications must partition the address space of their work
loads across multiple operating system processes to take advantage of the additional physical
memory provided in the system.
The 36-bit PSE feature extends the PDE format of the Intel Architecture for 4-MByte pages and
32-bit addresses by utilizing bits 16-13 (formerly reserved bits that were required to be zero) to
extend the physical address without requiring an 8-byte page-directory entry. Therefore, with
the 36-bit PSE feature, a page directory can contain up to 1024 entries, each pointing to a
4-MByte page that can exist anywhere in the 36-bit physical address space of the processor.
Figure 3-22 shows the difference between PDE formats for 4-MByte pages on P6-family processors
that support the 36-bit PSE feature compared to P6-family processors that do not support
the 36-bit PSE feature (i.e., 32-bit addressing).
Figure 3-22 also shows the linear address mapping to 4-MByte pages when the 36-bit PSE is
enabled. The base physical address of the 4-MByte page is contained in the PDE. PA-2
(bits 13-16) is used to provide the upper four bits (bits 32-35) of the 36-bit physical address. PA-1 (bits
22-31) continues to provide the next ten bits (bits 22-31) of the physical address for the 4-MByte
page. The offset into the page is provided by the lower 22 bits of the linear address. This scheme
eliminates the second level of indirection caused by the use of 4-KByte page tables.
Notes:
1. PA-2 = Bits 35-32 of the base physical address for the 4-MByte page (correspond to bits 16-13)
2. PA-1 = Bits 31-22 of the base physical address for the 4-MByte page
3. PAT = Bit 12 used as the Most Significant Bit of the index into Page Attribute Table (PAT); see Section
10.2.
4. PS = Bit 7 is the Page Size Bit-indicates 4-MByte page (must be set to 1)
5. Reserved = Bits 21-17 are reserved for future expansion
6. No change in format or meaning of bits 11-8 and 6-0; refer to Figure 3-15 for details.
The PSE-36 feature is transparent to existing operating systems that utilize 4-MByte pages,
because unused bits in PA-2 are currently enforced as zero by Intel processors. The feature
requires 4-MByte pages aligned on a 4-MByte boundary and 4 MBytes of physically contiguous
memory. Therefore, the ten bits of PA-1 are sufficient to specify the base physical address of any
4-MByte page below 4 GBytes. An operating system can easily support addresses greater than
4 GBytes simply by providing the upper 4 bits of the physical address in PA-2 when creating a
PDE for a 4-MByte page.
Figure 3-23 shows the linear address mapping to 4-MByte pages when the 36-bit PSE is enabled.
Figure 3-23
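The address formation for the 36-bit PSE can be sketched as follows. The function name pse36_physical is hypothetical, and the sketch assumes a present page-directory entry with the PS flag set; it performs no reserved-bit checking.

```python
def pse36_physical(pde, linear):
    """Form a 36-bit physical address from a 4-MByte PDE with PSE-36.

    Assumes (without checking) that the entry is present and PS is set.
    """
    pa2 = (pde >> 13) & 0xF          # PDE bits 13-16 -> address bits 32-35
    pa1 = (pde >> 22) & 0x3FF        # PDE bits 22-31 -> address bits 22-31
    offset = linear & 0x003FFFFF     # lower 22 bits of the linear address
    return (pa2 << 32) | (pa1 << 22) | offset
```

When PA-2 is zero, the result is identical to the 32-bit translation for 4-MByte pages, which is why the feature is transparent to software that leaves those bits clear.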
3.9.2. Fault Detection
There are several conditions that can cause P6-family processors that support this feature to
generate a page-fault exception (#PF). These conditions are related to the use of, or switching between,
various memory management features:
If the PSE feature is enabled, a nonzero value in any of the remaining reserved bits (17-21)
of a 4-MByte PDE causes a page fault, with the reserved bit (bit 3) set in the error code.
If the PAE feature is enabled and set to use 2-MByte or 4-MByte pages (that is, 8-byte
page-directory table entries are being used), a nonzero value in any of the reserved bits
13-20 causes a page fault, with the reserved bit (bit 3) set in the error code. Note that bit 12 is
now being used to support the Page Attribute Table feature (refer to Section 9.13., "Page
Attribute Table (PAT)").
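The first condition can be sketched as a mask test on a 4-byte page-directory entry. This is a simplified model: the function name is hypothetical, only the reserved-bit flag of the error code is produced, and the check is assumed to apply only to present entries with the PS flag set (bits in a not-present entry are available to software).

```python
PSE36_RESERVED_MASK = 0x1F << 17   # PDE bits 17 through 21
PF_ERROR_RSVD = 1 << 3             # reserved-bit flag in the #PF error code


def reserved_bit_error(pde):
    """Return the reserved-bit error-code flag for a 4-MByte PDE, or 0.

    Simplified: other error-code bits are not modeled.
    """
    present = pde & (1 << 0)
    large_page = pde & (1 << 7)
    if present and large_page and (pde & PSE36_RESERVED_MASK):
        return PF_ERROR_RSVD
    return 0
```

A page-fault handler can test the same error-code bit to distinguish a malformed entry from an ordinary not-present or protection fault.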
3.10. MAPPING SEGMENTS TO PAGES
The segmentation and paging mechanisms provided in the Intel Architecture support a wide
variety of approaches to memory management. When segmentation and paging are combined,
segments can be mapped to pages in several ways. To implement a flat (unsegmented)
addressing environment, for example, all the code, data, and stack modules can be mapped to
one or more large segments (up to 4-GBytes) that share the same range of linear addresses (refer to
Figure 3-2). Here, segments are essentially invisible to applications and the operating-system or
executive. If paging is used, the paging mechanism can map a single linear address space
(contained in a single segment) into virtual memory. Or, each program (or task) can have its own
large linear address space (contained in its own segment), which is mapped into virtual memory
through its own page directory and set of page tables.
Segments can be smaller than the size of a page. If one of these segments is placed in a page
which is not shared with another segment, the extra memory is wasted. For example, a small data
structure, such as a 1-byte semaphore, occupies 4 KBytes if it is placed in a page by itself. If
many semaphores are used, it is more efficient to pack them into a single page.
The Intel Architecture does not enforce correspondence between the boundaries of pages and
segments. A page can contain the end of one segment and the beginning of another. Likewise, a
segment can contain the end of one page and the beginning of another.
Memory-management software may be simpler and more efficient if it enforces some alignment
between page and segment boundaries. For example, if a segment which can fit in one page is
placed in two pages, there may be twice as much paging overhead to support access to that
segment.
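The effect of alignment on paging overhead can be checked with simple arithmetic. The helper below is a hypothetical illustration that counts how many 4-KByte pages a segment touches, given its linear base address and size in bytes.

```python
def pages_spanned(base, size, page_size=4096):
    """Number of pages touched by a segment of `size` bytes at `base`."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_page = base // page_size
    last_page = (base + size - 1) // page_size
    return last_page - first_page + 1
```

A 512-byte segment based at 0x2000 lies within one page, while the same segment based at 0x1F00 straddles a page boundary and touches two pages.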
One approach to combining paging and segmentation that simplifies memory-management software
is to give each segment its own page table, as shown in Figure 3-24. This convention gives
the segment a single entry in the page directory that provides the access control information for
paging the entire segment.
Figure 3-24
CHAPTER 4 PROTECTION
In protected mode, the Intel Architecture provides a protection mechanism that operates at both
the segment level and the page level. This protection mechanism provides the ability to limit
access to certain segments or pages based on privilege levels (four privilege levels for segments
and two privilege levels for pages). For example, critical operating-system code and data can be
protected by placing them in more privileged segments than those that contain applications
code. The processor's protection mechanism will then prevent application code from accessing
the operating-system code and data in any but a controlled, defined manner.
Segment and page protection can be used at all stages of software development to assist in localizing
and detecting design problems and bugs. It can also be incorporated into end-products to
offer added robustness to operating systems, utilities software, and applications software.
When the protection mechanism is used, each memory reference is checked to verify that it
satisfies various protection checks. All checks are made before the memory cycle is started; any
violation results in an exception. Because checks are performed in parallel with address translation,
there is no performance penalty. The protection checks that are performed fall into the
following categories:
Limit checks.
Type checks.
Privilege level checks.
Restriction of addressable domain.
Restriction of procedure entry-points.
Restriction of instruction set.
All protection violations result in an exception being generated. Refer to Chapter 5, Interrupt
and Exception Handling for an explanation of the exception mechanism. This chapter describes
the protection mechanism and the violations which lead to exceptions.
The following sections describe the protection mechanism available in protected mode. Refer to
Chapter 16, 8086 Emulation for information on protection in real-address and virtual-8086
mode.
4.1. ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION
Setting the PE flag in register CR0 causes the processor to switch to protected mode, which in
turn enables the segment-protection mechanism. Once in protected mode, there is no control bit
for turning the protection mechanism on or off. The part of the segment-protection mechanism
that is based on privilege levels can essentially be disabled while still in protected mode by
assigning a privilege level of 0 (most privileged) to all segment selectors and segment descriptors.
This action disables the privilege level protection barriers between segments, but other
protection checks such as limit checking and type checking are still carried out.
Page-level protection is automatically enabled when paging is enabled (by setting the PG flag
in register CR0). Here again there is no mode bit for turning off page-level protection once
paging is enabled. However, page-level protection can be disabled by performing the following
operations:
Clear the WP flag in control register CR0.
Set the read/write (R/W) and user/supervisor (U/S) flags for each page-directory and page-table
entry.
This action makes each page a writable, user page, which in effect disables page-level
protection.
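As an illustration only, the two steps above can be sketched in C. The macro and function names are hypothetical. The code works on in-memory copies of CR0 and of the paging entries rather than on the real register. The R/W and U/S bit positions are those given in Section 4.2; the position of the WP flag (bit 16 of CR0) is an assumption taken from the control-register description, not from this section.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_RW  (1u << 1)   /* read/write flag: bit 1 of a page-directory or page-table entry */
#define PTE_US  (1u << 2)   /* user/supervisor flag: bit 2 of the same entries */
#define CR0_WP  (1u << 16)  /* write-protect flag in CR0 (assumed to be bit 16) */

/* Return a copy of a CR0 image with the WP flag cleared. */
uint32_t cr0_without_wp(uint32_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* Mark every entry of a page directory or page table as a writable, user page. */
void open_up_entries(uint32_t *entries, size_t count)
{
    for (size_t i = 0; i < count; i++)
        entries[i] |= PTE_RW | PTE_US;
}
```

A real system would apply open_up_entries() to the page directory and to every page table it references, and would write the modified CR0 image back with a privileged MOV to CR0.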
4.2. FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION
The processor's protection mechanism uses the following fields and flags in the system data
structures to control access to segments and pages:
Descriptor type (S) flag. (Bit 12 in the second doubleword of a segment descriptor.)
Determines if the segment descriptor is for a system segment or a code or data segment.
Type field. (Bits 8 through 11 in the second doubleword of a segment descriptor.)
Determines the type of code, data, or system segment.
Limit field. (Bits 0 through 15 of the first doubleword and bits 16 through 19 of the
second doubleword of a segment descriptor.) Determines the size of the segment, along
with the G flag and E flag (for data segments).
G flag. (Bit 23 in the second doubleword of a segment descriptor.) Determines the size of
the segment, along with the limit field and E flag (for data segments).
E flag. (Bit 10 in the second doubleword of a data-segment descriptor.) Determines the
size of the segment, along with the limit field and G flag.
Descriptor privilege level (DPL) field. (Bits 13 and 14 in the second doubleword of a
segment descriptor.) Determines the privilege level of the segment.
Requested privilege level (RPL) field. (Bits 0 and 1 of any segment selector.) Specifies the
requested privilege level of a segment selector.
Current privilege level (CPL) field. (Bits 0 and 1 of the CS segment register.) Indicates the
privilege level of the currently executing program or procedure. The term current privilege
level (CPL) refers to the setting of this field.
User/supervisor (U/S) flag. (Bit 2 of a page-directory or page-table entry.) Determines the
type of page: user or supervisor.
Read/write (R/W) flag. (Bit 1 of a page-directory or page-table entry.) Determines the type
of access allowed to a page: read only or read-write.
Figure 4-1 shows the location of the various fields and flags in the data, code, and system-segment
descriptors; Figure 3-6 in Chapter 3, Protected-Mode Memory Management shows the
location of the RPL (or CPL) field in a segment selector (or the CS register); and Figure 3-14 in
Chapter 3, Protected-Mode Memory Management shows the location of the U/S and R/W flags
in the page-directory and page-table entries.
Figure 4-1
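The bit positions listed in this section can be turned into a few small accessor functions. The following C sketch is illustrative only: the function names are hypothetical, and a descriptor is assumed to be passed as its two doublewords, lo (first) and hi (second).

```c
#include <stdint.h>

/* hi = second doubleword of a segment descriptor, lo = first doubleword. */
unsigned desc_s_flag(uint32_t hi) { return (hi >> 12) & 0x1u; }  /* bit 12 */
unsigned desc_type(uint32_t hi)   { return (hi >> 8)  & 0xFu; }  /* bits 8 through 11 */
unsigned desc_dpl(uint32_t hi)    { return (hi >> 13) & 0x3u; }  /* bits 13 and 14 */
unsigned desc_g_flag(uint32_t hi) { return (hi >> 23) & 0x1u; }  /* bit 23 */

/* 20-bit limit: bits 0-15 of the first doubleword, bits 16-19 of the second. */
uint32_t desc_raw_limit(uint32_t lo, uint32_t hi)
{
    return (lo & 0x0000FFFFu) | (hi & 0x000F0000u);
}

/* RPL (or CPL, when applied to the CS register): bits 0 and 1 of a selector. */
unsigned selector_rpl(uint16_t selector) { return selector & 0x3u; }
```

For example, a descriptor whose doublewords are 0000FFFFH and 00CF9A00H decodes to S = 1, type = AH, DPL = 0, G = 1, and a raw limit of FFFFFH.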
Many different styles of protection schemes can be implemented with these fields and flags.
When the operating system creates a descriptor, it places values in these fields and flags in
keeping with the particular protection style chosen for an operating system or executive. Application
programs do not generally access or modify these fields and flags.
The following sections describe how the processor uses these fields and flags to perform the
various categories of checks described in the introduction to this chapter.
4.3. LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from addressing
memory locations outside the segment. The effective value of the limit depends on the setting
of the G (granularity) flag (refer to Figure 4-1). For data segments, the limit also depends on the
E (expansion direction) flag and the B (default stack pointer size and/or upper bound) flag. The
E flag is one of the bits in the type field when the segment descriptor is for a data-segment type.
When the G flag is clear (byte granularity), the effective limit is the value of the 20-bit limit field
in the segment descriptor. Here, the limit ranges from 0 to FFFFFH (1 MByte). When the G flag
is set (4-KByte page granularity), the processor scales the value in the limit field by a factor of
2^12 (4 KBytes). In this case, the effective limit ranges from FFFH (4 KBytes) to FFFFFFFFH
(4 GBytes). Note that when scaling is used (G flag is set), the lower 12 bits of a segment offset
(address) are not checked against the limit; for example, note that if the segment limit is 0,
offsets 0 through FFFH are still valid.
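The scaling rule can be written as a short C function. This is a sketch under the assumptions stated in the paragraph above; the function name is hypothetical.

```c
#include <stdint.h>

/* Compute the effective limit from the 20-bit limit field and the G flag. */
uint32_t effective_limit(uint32_t raw_limit, int g_flag)
{
    raw_limit &= 0xFFFFFu;                     /* only 20 bits are stored in the descriptor */
    if (g_flag)
        return (raw_limit << 12) | 0xFFFu;     /* 4-KByte units; low 12 bits are not checked */
    return raw_limit;                          /* byte granularity */
}
```

With the G flag set, a limit field of 0 gives an effective limit of FFFH, and a limit field of FFFFFH gives FFFFFFFFH, which matches the range quoted above.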
For all types of segments except expand-down data segments, the effective limit is the last
address that is allowed to be accessed in the segment, which is one less than the size, in bytes,
of the segment. The processor causes a general-protection exception any time an attempt is made
to access the following addresses in a segment:
A byte at an offset greater than the effective limit
A word at an offset greater than the (effective-limit - 1)
A doubleword at an offset greater than the (effective-limit - 3)
A quadword at an offset greater than the (effective-limit - 7)
For expand-down data segments, the segment limit has the same function but is interpreted
differently. Here, the effective limit specifies the last address that is not allowed to be accessed
within the segment; the range of valid offsets is from (effective-limit + 1) to FFFFFFFFH if the
B flag is set and from (effective-limit + 1) to FFFFH if the B flag is clear. An expand-down
segment has maximum size when the segment limit is 0.
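The limit rules for expand-up and expand-down segments can be summarized in one check. The following C sketch is illustrative; the function name and parameter order are hypothetical, and size is assumed to be at least 1 (1, 2, 4, or 8 bytes).

```c
#include <stdint.h>

/* Return 1 if an access of `size` bytes at `offset` passes the limit check,
 * 0 if it would cause a general-protection exception. */
int limit_check_ok(uint32_t offset, unsigned size, uint32_t eff_limit,
                   int expand_down, int b_flag)
{
    uint64_t first = offset;
    uint64_t last  = first + size - 1;   /* last byte touched; 64 bits avoid wraparound */

    if (!expand_down)
        return last <= eff_limit;        /* valid offsets: 0 through eff_limit */

    /* Expand-down: valid offsets run from eff_limit + 1 up to the upper bound. */
    uint64_t upper = b_flag ? 0xFFFFFFFFull : 0xFFFFull;
    return first > eff_limit && last <= upper;
}
```

Testing the last byte of the access against the limit is equivalent to the per-size rules listed above: a word at offset n touches bytes n and n + 1, so it faults when n is greater than (effective-limit - 1).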
Limit checking catches programming errors such as runaway code, runaway subscripts, and
invalid pointer calculations. These errors are detected when they occur, so identification of the
cause is easier. Without limit checking, these errors could overwrite code or data in another
segment.
In addition to checking segment limits, the processor also checks descriptor table limits. The
GDTR and IDTR registers contain 16-bit limit values that the processor uses to prevent
programs from selecting a segment descriptor outside the respective descriptor tables. The
LDTR and task registers each contain a 32-bit segment limit value (read from the segment descriptors
for the current LDT and TSS, respectively). The processor uses these segment limits to prevent
accesses beyond the bounds of the current LDT and TSS. Refer to Section 3.5.1., "Segment
Descriptor Tables" in Chapter 3, Protected-Mode Memory Management for more information
on the GDT and LDT limit fields; refer to Section 5.8., "Interrupt Descriptor Table (IDT)" in
Chapter 5, Interrupt and Exception Handling for more information on the IDT limit field; and
refer to Section 6.2.3., "Task Register" in Chapter 6, Task Management for more information on
the TSS segment limit field.
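A descriptor-table limit check can be sketched as follows. This example rests on assumptions taken from Chapter 3 rather than from this section: descriptors are 8 bytes long, the descriptor index occupies bits 3 through 15 of a selector, and a descriptor is treated as inside the table only if its last byte lies at or below the table limit. The function name is hypothetical.

```c
#include <stdint.h>

/* Return 1 if the descriptor selected by `selector` lies entirely within a
 * descriptor table whose limit (offset of its last valid byte) is `table_limit`. */
int selector_within_table(uint16_t selector, uint16_t table_limit)
{
    uint32_t index     = (uint32_t)(selector >> 3);  /* drop the TI and RPL bits */
    uint32_t last_byte = index * 8u + 7u;            /* last byte of the 8-byte descriptor */
    return last_byte <= table_limit;
}
```

With a table limit of 17H (three descriptors), selectors 08H and 10H pass the check, while selector 18H does not.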
4.4. TYPE CHECKING
Segment descriptors contain type information in two places:
The S (descriptor type) flag.
The type field.
The processor uses this information to detect programming errors that result in an attempt to use
a segment or gate in an incorrect or unintended manner.
The S flag indicates whether a descriptor is a system type or a code or data type. The type field
provides 4 additional bits for use in defining various types of code, data, and system descriptors.
Table 3-1 in Chapter 3, Protected-Mode Memory Management shows the encoding of the type
field for code and data descriptors; Table 3-2 in Chapter 3, Protected-Mode Memory Management
shows the encoding of the field for system descriptors.
The processor examines type information at various times while operating on segment selectors
and segment descriptors. The following list gives examples of typical operations where type
checking is performed. This list is not exhaustive.
When a segment selector is loaded into a segment register. Certain segment registers
can contain only certain descriptor types, for example:
- The CS register only can be loaded with a selector for a code segment.
- Segment selectors for code segments that are not readable or for system segments
cannot be loaded into data-segment registers (DS, ES, FS, and GS).
- Only segment selectors of writable data segments can be loaded into the SS register.
When a segment selector is loaded into the LDTR or task register.
- The LDTR can only be loaded with a selector for an LDT.
- The task register can only be loaded with a segment selector for a TSS.
When instructions access segments whose descriptors are already loaded into
segment registers. Certain segments can be used by instructions only in certain predefined
ways, for example:
- No instruction may write into an executable segment.
- No instruction may write into a data segment if it is not writable.
- No instruction may read an executable segment unless the readable flag is set.
When an instruction operand contains a segment selector. Certain instructions can
access segments or gates of only a particular type, for example:
- A far CALL or far JMP instruction can only access a segment descriptor for a
conforming code segment, nonconforming code segment, call gate, task gate, or TSS.
- The LLDT instruction must reference a segment descriptor for an LDT.
- The LTR instruction must reference a segment descriptor for a TSS.
- The LAR instruction must reference a segment or gate descriptor for an LDT, TSS,
call gate, task gate, code segment, or data segment.
- The LSL instruction must reference a segment descriptor for an LDT, TSS, code
segment, or data segment.
- IDT entries must be interrupt, trap, or task gates.
During certain internal operations. For example:
- On a far call or far jump (executed with a far CALL or far JMP instruction), the
processor determines the type of control transfer to be carried out (call or jump to
another code segment, a call or jump through a gate, or a task switch) by checking the
type field in the segment (or gate) descriptor pointed to by the segment (or gate)
selector given as an operand in the CALL or JMP instruction. If the descriptor type is
for a code segment or call gate, a call or jump to another code segment is indicated; if
the descriptor type is for a TSS or task gate, a task switch is indicated.
- On a call or jump through a call gate (or on an interrupt- or exception-handler call
through a trap or interrupt gate), the processor automatically checks that the segment
descriptor being pointed to by the gate is for a code segment.
- On a call or jump to a new task through a task gate (or on an interrupt- or exception-handler
call to a new task through a task gate), the processor automatically checks that
the segment descriptor being pointed to by the task gate is for a TSS.
- On a call or jump to a new task by a direct reference to a TSS, the processor automatically
checks that the segment descriptor being pointed to by the CALL or JMP
instruction is for a TSS.
- On return from a nested task (initiated by an IRET instruction), the processor checks
that the previous task link field in the current TSS points to a TSS.
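The first group of checks in the list above (loading a selector into a segment register) can be sketched in C. The example is illustrative only. It assumes the type-field encoding of Table 3-1, in which bit 3 of the type field distinguishes code (set) from data (clear) and bit 1 is the writable flag for data segments and the readable flag for code segments. The enum, macro, and function names are hypothetical.

```c
enum seg_reg { REG_CS, REG_SS, REG_DS, REG_ES, REG_FS, REG_GS };

#define TYPE_CODE 0x8u   /* type bit 3: set for code segments (assumed from Table 3-1) */
#define TYPE_RW   0x2u   /* type bit 1: writable (data) or readable (code) */

/* Return 1 if a descriptor with the given S flag and type field passes the
 * type check for loading into `reg`, 0 otherwise. */
int type_allows_load(enum seg_reg reg, unsigned s_flag, unsigned type)
{
    int is_code = (type & TYPE_CODE) != 0;
    int rw      = (type & TYPE_RW) != 0;

    if (!s_flag)
        return 0;                       /* system descriptors cannot be loaded */

    switch (reg) {
    case REG_CS: return is_code;        /* CS: code segments only */
    case REG_SS: return !is_code && rw; /* SS: writable data segments only */
    default:     return !is_code || rw; /* DS, ES, FS, GS: data or readable code */
    }
}
```

Privilege and limit checks are separate steps and are not modeled here.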
4.4.1. Null Segment Selector Checking
Attempting to load a null segment selector (refer to Section 3.4.1. in Chapter 3, Protected-Mode
Memory Management) into the CS or SS segment register generates a general-protection exception
(#GP). A null segment selector can be loaded into the DS, ES, FS, or GS register, but any
attempt to access a segment through one of these registers when it is loaded with a null segment
selector results in a #GP exception being generated. Loading unused data-segment registers with
a null segment selector is a useful method of detecting accesses to unused segment registers
and/or preventing unwanted accesses to data segments.
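The load-time part of this rule can be sketched in C. The example assumes the definition given in Section 3.4.1, under which a null segment selector has an index of 0 and a TI flag of 0, so that only the RPL bits may be nonzero. All names are hypothetical.

```c
#include <stdint.h>

enum load_result { LOAD_OK, LOAD_GP };

/* A null selector has index 0 and TI = 0; the RPL bits (0 and 1) are ignored. */
int is_null_selector(uint16_t selector)
{
    return (selector & 0xFFFCu) == 0;
}

/* Null-selector check at load time. `is_cs_or_ss` is nonzero when the target
 * register is CS or SS. Other load-time checks are not modeled. */
enum load_result null_check_on_load(uint16_t selector, int is_cs_or_ss)
{
    if (is_null_selector(selector) && is_cs_or_ss)
        return LOAD_GP;
    return LOAD_OK;
}
```

The second half of the rule (a #GP exception on any later memory access through a data-segment register that holds a null selector) happens at access time and is not shown.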
4.5. PRIVILEGE LEVELS
The processor's segment-protection mechanism recognizes 4 privilege levels, numbered from 0
to 3. Greater numbers mean lesser privileges. Figure 4-2 shows how these levels of privilege
can be interpreted as rings of protection. The center (reserved for the most privileged code, data,
and stacks) is used for the segments containing the critical software, usually the kernel of an
operating system. Outer rings are used for less critical software. (Systems that use only 2 of the
4 possible privilege levels should use levels 0 and 3.)
Figure 4-2
The processor uses privilege levels to prevent a program or task operating at a lesser privilege
level from accessing a segment with a greater privilege, except under controlled situations.
When the processor detects a privilege level violation, it generates a general-protection exception
(#GP).
To carry out privilege-level checks between code segments and data segments, the processor
recognizes the following three types of privilege levels:
Current privilege level (CPL). The CPL is the privilege level of the currently executing
program or task. It is stored in bits 0 and 1 of the CS and SS segment registers. Normally,
the CPL is equal to the privilege level of the code segment from which instructions are
being fetched. The processor changes the CPL when program control is transferred to a
code segment with a different privilege level. The CPL is treated slightly differently when
accessing conforming code segments. Conforming code segments can be accessed from
any privilege level that is equal to or numerically greater (less privileged) than the DPL of
the conforming code segment. Also, the CPL is not changed when the processor accesses a
conforming code segment that has a different privilege level than the CPL.
Descriptor privilege level (DPL). The DPL is the privilege level of a segment or gate. It is
stored in the DPL field of the segment or gate descriptor for the segment or gate. When the
currently executing code segment attempts to access a segment or gate, the DPL of the
segment or gate is compared to the CPL and RPL of the segment or gate selector (as
described later in this section). The DPL is interpreted differently, depending on the type of
segment or gate being accessed:
- Data segment. The DPL indicates the numerically highest privilege level that a
program or task can have to be allowed to access the segment. For example, if the DPL
of a data segment is 1, only programs running at a CPL of 0 or 1 can access the
segment.
- Nonconforming code segment (without using a call gate). The DPL indicates the
privilege level that a program or task must be at to access the segment. For example, if
the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0
can access the segment.
- Call gate. The DPL indicates the numerically highest privilege level that the currently
executing program or task can be at and still be able to access the call gate. (This is the
same access rule as for a data segment.)
- Conforming code segment and nonconforming code segment accessed through a
call gate. The DPL indicates the numerically lowest privilege level that a program or
task can have to be allowed to access the segment. For example, if the DPL of a
conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the
segment.
- TSS. The DPL indicates the numerically highest privilege level that the currently
executing program or task can be at and still be able to access the TSS. (This is the
same access rule as for a data segment.)
Requested privilege level (RPL). The RPL is an override privilege level that is assigned
to segment selectors. It is stored in bits 0 and 1 of the segment selector. The processor
checks the RPL along with the CPL to determine if access to a segment is allowed. Even if
the program or task requesting access to a segment has sufficient privilege to access the
segment, access is denied if the RPL is not of sufficient privilege level. That is, if the RPL
of a segment selector is numerically greater than the CPL, the RPL overrides the CPL, and
vice versa. The RPL can be used to ensure that privileged code does not access a segment
on behalf of an application program unless the program itself has access privileges for that
segment. Refer to Section 4.10.4., "Checking Caller Access Privileges (ARPL
Instruction)" for a detailed description of the purpose and typical use of the RPL.
Privilege levels are checked when the segment selector of a segment descriptor is loaded into a
segment register. The checks used for data access differ from those used for transfers of program
control among code segments; therefore, the two kinds of accesses are considered separately in
the following sections.
4.6. PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS
To access operands in a data segment, the segment selector for the data segment must be loaded
into the data-segment registers (DS, ES, FS, or GS) or into the stack-segment register (SS).
(Segment registers can be loaded with the MOV, POP, LDS, LES, LFS, LGS, and LSS instructions.)
Before the processor loads a segment selector into a segment register, it performs a privilege
check (refer to Figure 4-3) by comparing the privilege levels of the currently running
program or task (the CPL), the RPL of the segment selector, and the DPL of the segment's
segment descriptor. The processor loads the segment selector into the segment register if the
DPL is numerically greater than or equal to both the CPL and the RPL. Otherwise, a general-protection
exception (#GP) is generated and the segment register is not loaded.
Figure 4-3
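The data-segment privilege check described above reduces to a single comparison. The following C sketch is illustrative; the function name is hypothetical, and the privilege levels used in the example calls are sample values, not values read from a figure.

```c
/* Return 1 if a data-segment selector may be loaded into DS, ES, FS, or GS:
 * the DPL must be numerically greater than or equal to both the CPL and the RPL. */
int data_segment_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = (cpl > rpl) ? cpl : rpl;  /* the less privileged of CPL and RPL */
    return dpl >= effective;
}
```

For a data segment with a DPL of 2, a call with CPL = 1 and RPL = 1 is allowed, while a call with CPL = 1 and RPL = 3 is refused because the RPL weakens the request.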
Figure 4-4 shows four procedures (located in code segments A, B, C, and D), each running at
different privilege levels and each attempting to access the same data segment.
The procedure in code segment A is able to access data segment E using segment selector
E1, because the CPL of code segment A and the RPL of segment selector E1 are equal to
the DPL of data segment E.
The procedure in code segment B is able to access data segment E using segment selector
E2, because the CPL of code segment B and the RPL of segment selector E2 are both
numerically lower (more privileged) than the DPL of data segment E. A code segment
B procedure can also access data segment E using segment selector E1.
The procedure in code segment C is not able to access data segment E using segment
selector E3 (dotted line), because the CPL of code segment C and the RPL of segment
selector E3 are both numerically greater (less privileged) than the DPL of data
segment E. Even if a code segment C procedure were to use segment selector E1 or E2,
such that the RPL would be acceptable, it still could not access data segment E because its
CPL is not privileged enough.
The procedure in code segment D should be able to access data segment E because code
segment D's CPL is numerically less than the DPL of data segment E. However, the RPL
of segment selector E3 (which the code segment D procedure is using to access data
segment E) is numerically greater than the DPL of data segment E, so access is not
allowed. If the code segment D procedure were to use segment selector E1 or E2 to access
the data segment, access would be allowed.
Figure 4-4
As demonstrated in the previous examples, the addressable domain of a program or task varies
as its CPL changes. When the CPL is 0, data segments at all privilege levels are accessible; when
the CPL is 1, only data segments at privilege levels 1 through 3 are accessible; when the CPL is
3, only data segments at privilege level 3 are accessible.
The RPL of a segment selector can always override the addressable domain of a program or task.
When properly used, RPLs can prevent problems caused by accidental (or intentional) use of
segment selectors for privileged data segments by less privileged programs or procedures.
It is important to note that the RPL of a segment selector for a data segment is under software
control. For example, an application program running at a CPL of 3 can set the RPL for a data-segment
selector to 0. With the RPL set to 0, only the CPL checks, not the RPL checks, will
provide protection against deliberate, direct attempts to violate privilege-level security for the
data segment. To prevent these types of privilege-level-check violations, a program or procedure
can check access privileges whenever it receives a data-segment selector from another procedure
(refer to Section 4.10.4., "Checking Caller Access Privileges (ARPL Instruction)").
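The kind of adjustment a called procedure can make to a selector it receives can be sketched in C. This is a software model of the behavior generally attributed to the ARPL instruction (raise the RPL of the received selector to the caller's privilege level if it is numerically lower), written as an assumption about that instruction rather than as its definition. The function name is hypothetical.

```c
#include <stdint.h>

/* If the RPL of *selector is numerically lower (more privileged) than the RPL
 * of the caller's CS selector, raise it to match. Returns 1 if the selector
 * was adjusted, 0 if it was left unchanged. */
int adjust_rpl(uint16_t *selector, uint16_t caller_cs)
{
    unsigned caller_rpl = caller_cs & 0x3u;

    if ((*selector & 0x3u) < caller_rpl) {
        *selector = (uint16_t)((*selector & ~0x3u) | caller_rpl);
        return 1;
    }
    return 0;
}
```

After the adjustment, a selector handed over by a CPL 3 caller always carries an RPL of 3, so the data-segment check fails for any segment the caller could not have accessed itself.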
4.6.1. Accessing Data in Code Segments
In some instances it may be desirable to access data structures that are contained in a code
segment. The following methods of accessing data in code segments are possible:
1. Load a data-segment register with a segment selector for a nonconforming, readable, code
segment.
2. Load a data-segment register with a segment selector for a conforming, readable, code
segment.
3. Use a code-segment override prefix (CS) to read a readable, code segment whose selector
is already loaded in the CS register.
The same rules for accessing data segments apply to method 1. Method 2 is always valid because
the privilege level of a conforming code segment is effectively the same as the CPL, regardless
of its DPL. Method 3 is always valid because the DPL of the code segment selected by the CS
register is the same as the CPL.
4.7. PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER
Privilege level checking also occurs when the SS register is loaded with the segment selector for
a stack segment. Here all privilege levels related to the stack segment must match the CPL; that
is, the CPL, the RPL of the stack-segment selector, and the DPL of the stack-segment descriptor
must be the same. If the RPL and DPL are not equal to the CPL, a general-protection exception
(#GP) is generated.
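In contrast to the data-segment rule, the stack-segment rule is a strict equality. A minimal C sketch, with a hypothetical function name:

```c
/* Return 1 if a stack-segment selector may be loaded into SS:
 * the CPL, the selector's RPL, and the descriptor's DPL must all be equal. */
int stack_segment_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    return rpl == cpl && dpl == cpl;
}
```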
4.8. PRIVILEGE LEVEL CHECKING WHEN TRANSFERRING PROGRAM CONTROL BETWEEN CODE SEGMENTS
To transfer program control from one code segment to another, the segment selector for the
destination code segment must be loaded into the code-segment register (CS). As part of this
loading process, the processor examines the segment descriptor for the destination code segment
and performs various limit, type, and privilege checks. If these checks are successful, the CS
register is loaded, program control is transferred to the new code segment, and program execution
begins at the instruction pointed to by the EIP register.
Program control transfers are carried out with the JMP, CALL, RET, INT n, and IRET instructions,
as well as by the exception and interrupt mechanisms. Exceptions, interrupts, and the
IRET instruction are special cases discussed in Chapter 5, Interrupt and Exception Handling.
This chapter discusses only the JMP, CALL, and RET instructions.
A JMP or CALL instruction can reference another code segment in any of four ways:
The target operand contains the segment selector for the target code segment.
The target operand points to a call-gate descriptor, which contains the segment selector for
the target code segment.
The target operand points to a TSS, which contains the segment selector for the target code
segment.
The target operand points to a task gate, which points to a TSS, which in turn contains the
segment selector for the target code segment.
The following sections describe the first two types of references. Refer to Section 6.3., "Task
Switching" in Chapter 6, Task Management for information on transferring program control
through a task gate and/or TSS.
4.8.1. Direct Calls or Jumps to Code Segments
The near forms of the JMP, CALL, and RET instructions transfer program control within the
current code segment, so privilege-level checks are not performed. The far forms of the JMP,
CALL, and RET instructions transfer control to other code segments, so the processor does
perform privilege-level checks.
When transferring program control to another code segment without going through a call gate,
the processor examines four kinds of privilege level and type information (refer to Figure 4-5):
The CPL. (Here, the CPL is the privilege level of the calling code segment; that is, the code
segment that contains the procedure that is making the call or jump.)
Figure 4-5
The DPL of the segment descriptor for the destination code segment that contains the
called procedure.
The RPL of the segment selector of the destination code segment.
The conforming (C) flag in the segment descriptor for the destination code segment, which
determines whether the segment is a conforming (C flag is set) or nonconforming (C flag is
clear) code segment. (Refer to Section 3.4.3.1., "Code- and Data-Segment Descriptor
Types" in Chapter 3, Protected-Mode Memory Management for more information about
this flag.)
The rules that the processor uses to check the CPL, RPL, and DPL depend on the setting of the
C flag, as described in the following sections.
4.8.1.1. ACCESSING NONCONFORMING CODE SEGMENTS
When accessing nonconforming code segments, the CPL of the calling procedure must be equal
to the DPL of the destination code segment; otherwise, the processor generates a general-protection
exception (#GP).
For example, in Figure 4-6, code segment C is a nonconforming code segment. Therefore, a
procedure in code segment A can call a procedure in code segment C (using segment selector
C1), because they are at the same privilege level (the CPL of code segment A is equal to the DPL
of code segment C). However, a procedure in code segment B cannot call a procedure in code
segment C (using segment selector C2 or C1), because the two code segments are at different
privilege levels.
Figure 4-6
The RPL of the segment selector that points to a nonconforming code segment has a limited
effect on the privilege check. The RPL must be numerically less than or equal to the CPL of the
calling procedure for a successful control transfer to occur. So, in the example in Figure 4-6, the
RPLs of segment selectors C1 and C2 could legally be set to 0, 1, or 2, but not to 3.
When the segment selector of a nonconforming code segment is loaded into the CS register, the
privilege level field is not changed; that is, it remains at the CPL (which is the privilege level of
the calling procedure). This is true, even if the RPL of the segment selector is different from the
CPL.
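The two conditions for a direct far call or jump to a nonconforming code segment can be written as one expression. The C sketch below is illustrative; the function name is hypothetical, and the example values assume a caller at CPL 2, as implied by the RPL discussion above.

```c
/* Return 1 if a direct (no call gate) far CALL or JMP to a nonconforming
 * code segment passes the privilege check, 0 if it would raise #GP. */
int nonconforming_transfer_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    return cpl == dpl      /* caller and target must be at the same level */
        && rpl <= cpl;     /* selector RPL must not be less privileged than the CPL */
}
```

With CPL = 2 and DPL = 2, RPL values of 0, 1, and 2 pass the check and an RPL of 3 fails it. On success the CPL is left unchanged.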
4.8.1.2. ACCESSING CONFORMING CODE SEGMENTS
When accessing conforming code segments, the CPL of the calling procedure may be numerically
equal to or greater than (less privileged than) the DPL of the destination code segment; the
processor generates a general-protection exception (#GP) only if the CPL is numerically less than the DPL.
(The segment selector RPL for the destination code segment is not checked if the segment is a
conforming code segment.)
In the example in Figure 4-6, code segment D is a conforming code segment. Therefore, calling
procedures in both code segment A and B can access code segment D (using either segment
selector D1 or D2, respectively), because they both have CPLs that are greater than or equal to
the DPL of the conforming code segment. For conforming code segments, the DPL represents
the numerically lowest privilege level that a calling procedure may be at to successfully
make a call to the code segment.
(Note that segment selectors D1 and D2 are identical except for their respective RPLs. But
since RPLs are not checked when accessing conforming code segments, the two segment selectors
are essentially interchangeable.)
When program control is transferred to a conforming code segment, the CPL does not change,
even if the DPL of the destination code segment is less than the CPL. This situation is the only
one where the CPL may be different from the DPL of the current code segment. Also, since the
CPL does not change, no stack switch occurs.
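The conforming-segment rule, including the fact that the CPL is preserved, can be sketched in C as follows. The function name and the convention of returning -1 for a #GP exception are hypothetical.

```c
/* Model a direct far CALL or JMP to a conforming code segment.
 * Returns the CPL in effect after the transfer, or -1 if #GP would be raised. */
int conforming_transfer_cpl(unsigned cpl, unsigned rpl, unsigned dpl)
{
    (void)rpl;          /* the selector RPL is not checked for conforming segments */

    if (cpl < dpl)
        return -1;      /* caller is more privileged than the target: #GP */

    return (int)cpl;    /* CPL is unchanged, even when DPL is numerically lower */
}
```

A caller at CPL 3 that transfers to a conforming segment with a DPL of 1 therefore continues to run at CPL 3.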
Conforming segments are used for code modules such as math libraries and exception handlers,
which support applications but do not require access to protected system facilities. These
modules are part of the operating system or executive software, but they can be executed at
numerically higher privilege levels (less privileged levels). Keeping the CPL at the level of a
calling code segment when switching to a conforming code segment prevents an application
program from accessing nonconforming code segments while at the privilege level (DPL) of a
conforming code segment and thus prevents it from accessing more privileged data.
Most code segments are nonconforming. For these segments, program control can be transferred
only to code segments at the same level of privilege, unless the transfer is carried out through a
call gate, as described in the following sections.
4.8.2. Gate Descriptors
To provide controlled access to code segments with different privilege levels, the processor
provides a special set of descriptors called gate descriptors. There are four kinds of gate
descriptors:
Call gates
Trap gates
Interrupt gates
Task gates
Task gates are used for task switching and are discussed in Chapter 6, Task Management. Trap
and interrupt gates are special kinds of call gates used for calling exception and interrupt
handlers. They are described in Chapter 5, Interrupt and Exception Handling. This chapter is
concerned only with call gates.
4.8.3. Call Gates
Call gates facilitate controlled transfers of program control between different privilege levels.
They are typically used only in operating systems or executives that use the privilege-level
protection mechanism. Call gates are also useful for transferring program control between 16-bit
and 32-bit code segments, as described in Section 17.4., "Transferring Control Among Mixed-
Size Code Segments" in Chapter 17, Mixing 16-Bit and 32-Bit Code.
Figure 4-7 shows the format of a call-gate descriptor. A call-gate descriptor may reside in the
GDT or in an LDT, but not in the interrupt descriptor table (IDT). It performs six functions:
It specifies the code segment to be accessed.
It defines an entry point for a procedure in the specified code segment.
It specifies the privilege level required for a caller trying to access the procedure.
If a stack switch occurs, it specifies the number of optional parameters to be copied
between stacks.
It defines the size of values to be pushed onto the target stack: 16-bit gates force 16-bit
pushes and 32-bit gates force 32-bit pushes.
It specifies whether the call-gate descriptor is valid.
Figure 4-7
The segment selector field in a call gate specifies the code segment to be accessed. The offset
field specifies the entry point in the code segment. This entry point is generally to the first
instruction of a specific procedure. The DPL field indicates the privilege level of the call gate,
which in turn is the privilege level required to access the selected procedure through the gate.
The P flag indicates whether the call-gate descriptor is valid. (The presence of the code segment
to which the gate points is indicated by the P flag in the code segment's descriptor.) The parameter
count field indicates the number of parameters to copy from the calling procedure's stack to
the new stack if a stack switch occurs (refer to Section 4.8.5., "Stack Switching"). The parameter
count specifies the number of words for 16-bit call gates and doublewords for 32-bit call gates.
Note that the P flag in a gate descriptor is normally always set to 1. If it is set to 0, a not present
(#NP) exception is generated when a program attempts to access the descriptor. The operating
system can use the P flag for special purposes. For example, it could be used to track the number
of times the gate is used. Here, the P flag is initially set to 0 causing a trap to the not-present
exception handler. The exception handler then increments a counter and sets the P flag to 1, so
that on returning from the handler, the gate descriptor will be valid.
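The field packing described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the bit positions follow the conventional IA-32 layout for a 32-bit call gate (an assumption here, because Figure 4-7 itself is not reproduced in this text).

```python
# Sketch: pack and unpack a 32-bit call-gate descriptor.
# Bit positions are the conventional IA-32 ones (assumed, see lead-in).

CALL_GATE_32 = 0b1100  # system-descriptor type for a 32-bit call gate


def pack_call_gate(selector, offset, dpl, param_count, present=True):
    """Return (low_dword, high_dword) for a 32-bit call gate."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset must fit in 32 bits")
    if not 0 <= dpl <= 3:
        raise ValueError("DPL must be 0..3")
    if not 0 <= param_count <= 31:
        raise ValueError("parameter count must be 0..31")
    # Low doubleword: selector in bits 31:16, offset 15:0 in bits 15:0.
    low = (selector << 16) | (offset & 0xFFFF)
    # High doubleword: offset 31:16, P, DPL, type, parameter count.
    high = (
        (offset & 0xFFFF0000)
        | (int(present) << 15)
        | (dpl << 13)
        | (CALL_GATE_32 << 8)
        | param_count
    )
    return low, high


def unpack_call_gate(low, high):
    """Split the two doublewords back into named fields."""
    return {
        "selector": low >> 16,
        "offset": (high & 0xFFFF0000) | (low & 0xFFFF),
        "present": bool(high & (1 << 15)),
        "dpl": (high >> 13) & 0x3,
        "type": (high >> 8) & 0xF,
        "param_count": high & 0x1F,
    }
```

For example, `pack_call_gate(0x0008, 0x00123456, dpl=3, param_count=2)` describes a gate that any privilege level may use and that copies two doublewords on a stack switch.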
4.8.4. Accessing a Code Segment Through a Call Gate
To access a call gate, a far pointer to the gate is provided as a target operand in a CALL or JMP
instruction. The segment selector from this pointer identifies the call gate (refer to Figure 4-8);
the offset from the pointer is required, but not used or checked by the processor. (The offset can
be set to any value.)
When the processor has accessed the call gate, it uses the segment selector from the call gate to
locate the segment descriptor for the destination code segment. (This segment descriptor can be
in the GDT or the LDT.) It then combines the base address from the code-segment descriptor
with the offset from the call gate to form the linear address of the procedure entry point in the
code segment.
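As shown in Figure 4-9, four different privilege levels are used to check the validity of a
program control transfer through a call gate:
The CPL (current privilege level).
The RPL (requestor's privilege level) of the call gate's selector.
The DPL (descriptor privilege level) of the call-gate descriptor.
The DPL of the segment descriptor of the destination code segment.
The C flag (conforming) in the segment descriptor for the destination code segment is also
checked.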
Figure 4-8
The privilege checking rules are different depending on whether the control transfer was initiated
with a CALL or a JMP instruction, as shown in Table 4-1.
The DPL field of the call-gate descriptor specifies the numerically highest privilege level from
which a calling procedure can access the call gate; that is, to access a call gate, the CPL of a
calling procedure must be equal to or less than the DPL of the call gate. For example, in Figure
4-10, call gate A has a DPL of 3. So calling procedures at all CPLs (0 through 3) can access this
call gate, which includes calling procedures in code segments A, B, and C. Call gate B has a
DPL of 2, so only calling procedures at a CPL of 0, 1, or 2 can access call gate B, which includes
calling procedures in code segments B and C. The dotted line shows that a calling procedure in
code segment A cannot access call gate B.
The RPL of the segment selector to a call gate must satisfy the same test as the CPL of the calling
procedure; that is, the RPL must be less than or equal to the DPL of the call gate. In the example
in Figure 4-10, a calling procedure in code segment C can access call gate B using gate selector
B2 or B1, but it could not use gate selector B3 to access call gate B.
If the privilege checks between the calling procedure and call gate are successful, the processor
then checks the DPL of the code-segment descriptor against the CPL of the calling procedure.
Here, the privilege check rules vary between CALL and JMP instructions. Only CALL instructions
can use call gates to transfer program control to more privileged (numerically lower privilege
level) nonconforming code segments; that is, to nonconforming code segments with a DPL
less than the CPL. A JMP instruction can use a call gate only to transfer program control to a
nonconforming code segment with a DPL equal to the CPL. CALL and JMP instructions can both
transfer program control to a more privileged conforming code segment; that is, to a conforming
code segment with a DPL less than or equal to the CPL.
If a call is made to a more privileged (numerically lower privilege level) nonconforming destination
code segment, the CPL is lowered to the DPL of the destination code segment and a stack
switch occurs (refer to Section 4.8.5., "Stack Switching"). If a call or jump is made to a more
privileged conforming destination code segment, the CPL is not changed and no stack switch
occurs.
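The rules in this section can be condensed into one decision function. The sketch below is a simplified model, not the processor's full algorithm: the function and exception names are hypothetical, and type, present, and limit checks are left out.

```python
class GeneralProtection(Exception):
    """Stands in for a #GP exception in this sketch."""


def transfer_through_call_gate(cpl, gate_rpl, gate_dpl, target_dpl,
                               conforming, is_call):
    """Return (new_cpl, stack_switch) or raise GeneralProtection.

    cpl        -- CPL of the calling procedure
    gate_rpl   -- RPL of the selector used to reference the call gate
    gate_dpl   -- DPL of the call-gate descriptor
    target_dpl -- DPL of the destination code segment
    conforming -- True if the destination code segment is conforming
    is_call    -- True for CALL, False for JMP
    """
    # Gate access: CPL and the gate selector's RPL must both be <= gate DPL.
    if cpl > gate_dpl or gate_rpl > gate_dpl:
        raise GeneralProtection("call gate is not accessible")
    # A gate never transfers control to a less privileged segment.
    if target_dpl > cpl:
        raise GeneralProtection("destination is less privileged")
    # Conforming destination, or same privilege level: CPL is unchanged.
    if conforming or target_dpl == cpl:
        return cpl, False
    # More privileged nonconforming destination: CALL only.
    if not is_call:
        raise GeneralProtection("JMP cannot change privilege level")
    return target_dpl, True
```

A CALL from CPL 3 through a DPL-3 gate to a DPL-0 nonconforming segment returns `(0, True)`: the CPL drops to 0 and a stack switch occurs. The same transfer attempted with JMP raises the exception.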
Figure 4-10
Call gates allow a single code segment to have procedures that can be accessed at different privilege
levels. For example, an operating system located in a code segment may have some
services which are intended to be used by both the operating system and application software
(such as procedures for handling character I/O). Call gates for these procedures can be set up
that allow access at all privilege levels (0 through 3). More privileged call gates (with DPLs of
0 or 1) can then be set up for other operating system services that are intended to be used only
by the operating system (such as procedures that initialize device drivers).
4.8.5. Stack Switching
Whenever a call gate is used to transfer program control to a more privileged nonconforming
code segment (that is, when the DPL of the nonconforming destination code segment is less than
the CPL), the processor automatically switches to the stack for the destination code segment's
privilege level. This stack switching is carried out to prevent more privileged procedures from
crashing due to insufficient stack space. It also prevents less privileged procedures from interfering
(by accident or intent) with more privileged procedures through a shared stack.
Each task must define up to 4 stacks: one for applications code (running at privilege level 3) and
one for each of the privilege levels 2, 1, and 0 that are used. (If only two privilege levels are used
[3 and 0], then only two stacks must be defined.) Each of these stacks is located in a separate
segment and is identified with a segment selector and an offset into the stack segment (a stack
pointer).
The segment selector and stack pointer for the privilege level 3 stack are located in the SS and
ESP registers, respectively, when privilege-level-3 code is being executed and are automatically
stored on the called procedure's stack when a stack switch occurs.
Pointers to the privilege level 0, 1, and 2 stacks are stored in the TSS for the currently running
task (refer to Figure 6-2 in Chapter 6, Task Management). Each of these pointers consists of a
segment selector and a stack pointer (loaded into the ESP register). These initial pointers are
strictly read-only values. The processor does not change them while the task is running. They
are used only to create new stacks when calls are made to more privileged levels (numerically
lower privilege levels). These stacks are disposed of when a return is made from the called
procedure. The next time the procedure is called, a new stack is created using the initial stack
pointer. (The TSS does not specify a stack for privilege level 3 because the processor does not
allow a transfer of program control from a procedure running at a CPL of 0, 1, or 2 to a procedure
running at a CPL of 3, except on a return.)
The operating system is responsible for creating stacks and stack-segment descriptors for all the
privilege levels to be used and for loading initial pointers for these stacks into the TSS. Each
stack must be read/write accessible (as specified in the type field of its segment descriptor) and
must contain enough space (as specified in the limit field) to hold the following items:
The contents of the SS, ESP, CS, and EIP registers for the calling procedure.
The parameters and temporary variables required by the called procedure.
The EFLAGS register and error code, when implicit calls are made to an exception or
interrupt handler.
The stack will need enough space to contain many frames of these items, because
procedures often call other procedures, and an operating system may support nesting of multiple
interrupts. Each stack should be large enough to allow for the worst case nesting scenario at its
privilege level.
(If the operating system does not use the processor's multitasking mechanism, it still must create
at least one TSS for this stack-related purpose.)
When a procedure call through a call gate results in a change in privilege level, the processor
performs the following steps to switch stacks and begin execution of the called procedure at a
new privilege level:
1. Uses the DPL of the destination code segment (the new CPL) to select a pointer to the new
stack (segment selector and stack pointer) from the TSS.
2. Reads the segment selector and stack pointer for the stack to be switched to from the
current TSS. Any limit violations detected while reading the stack-segment selector, stack
pointer, or stack-segment descriptor cause an invalid TSS (#TS) exception to be generated.
3. Checks the stack-segment descriptor for the proper privileges and type and generates an
invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP registers.
6. Pushes the temporarily saved values for the SS and ESP registers (for the calling
procedure) onto the new stack (refer to Figure 4-11).
7. Copies the number of parameters specified in the parameter count field of the call gate from
the calling procedure's stack to the new stack. If the count is 0, no parameters are copied.
8. Pushes the return instruction pointer (the current contents of the CS and EIP registers) onto
the new stack.
9. Loads the segment selector for the new code segment and the new instruction pointer from
the call gate into the CS and EIP registers, respectively, and begins execution of the called
procedure.
Refer to the description of the CALL instruction in Chapter 3, Instruction Set Reference, in the
Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the privilege
level checks and other protection checks that the processor performs on a far call through
a call gate.
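The stack-switch steps listed above can be modeled as follows. This is a minimal sketch that assumes a 32-bit call gate (4-byte stack items) and omits the #TS checks of steps 2 and 3. The function name and the list-based stack representation are illustrative assumptions. The parameter order assumes that the copy preserves the layout of the caller's parameters on the new stack.

```python
ITEM_SIZE = 4  # bytes per stack item for a 32-bit call gate


def interprivilege_call(tss_stacks, new_cpl, old_ss, old_esp, old_cs, old_eip,
                        caller_stack, param_count):
    """Model the stack switch for a call to a more privileged level.

    tss_stacks maps a privilege level (0, 1, 2) to the (SS, ESP) pair
    stored in the TSS. caller_stack lists the caller's stack items, top
    of stack first. Returns (new_ss, new_esp, pushed), where pushed
    lists the items in the order they are pushed on the new stack.
    """
    if not 0 <= param_count <= 31:
        raise ValueError("parameter count must be 0..31")
    if param_count > len(caller_stack):
        raise ValueError("caller stack holds fewer items than the count")
    new_ss, new_esp = tss_stacks[new_cpl]        # steps 1 and 2
    pushed = [old_ss, old_esp]                   # steps 4 to 6
    # Step 7: copy the deepest parameter first so the layout is preserved.
    pushed.extend(reversed(caller_stack[:param_count]))
    pushed.extend([old_cs, old_eip])             # step 8
    return new_ss, new_esp - ITEM_SIZE * len(pushed), pushed
```

With a parameter count of 2, the new stack holds six items (SS, ESP, two parameters, CS, EIP), so the new ESP ends up 24 bytes below the initial value taken from the TSS.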
Figure 4-11. Stack Switching During an Interprivilege-Level Call
The parameter count field in a call gate specifies the number of data items (up to 31) that the
processor should copy from the calling procedure's stack to the stack of the called procedure. If
more than 31 data items need to be passed to the called procedure, one of the parameters can be
a pointer to a data structure, or the saved contents of the SS and ESP registers may be used to
access parameters in the old stack space. The size of the data items passed to the called procedure
depends on the call gate size, as described in Section 4.8.3., "Call Gates".
4.8.6. Returning from a Called Procedure
The RET instruction can be used to perform a near return, a far return at the same privilege level,
and a far return to a different privilege level. This instruction is intended to execute returns from
procedures that were called with a CALL instruction. It does not support returns from a JMP
instruction, because the JMP instruction does not save a return instruction pointer on the stack.
A near return only transfers program control within the current code segment; therefore, the
processor performs only a limit check. When the processor pops the return instruction pointer
from the stack into the EIP register, it checks that the pointer does not exceed the limit of the
current code segment.
On a far return at the same privilege level, the processor pops both a segment selector for the
code segment being returned to and a return instruction pointer from the stack. Under normal
conditions, these pointers should be valid, because they were pushed on the stack by the CALL
instruction. However, the processor performs privilege checks to detect situations where the
current procedure might have altered the pointer or failed to maintain the stack properly.
A far return that requires a privilege-level change is only allowed when returning to a less privileged
level (that is, the DPL of the return code segment is numerically greater than the CPL).
The processor uses the RPL field from the CS register value saved for the calling procedure
(refer to Figure 4-11) to determine if a return to a numerically higher privilege level is required.
If the RPL is numerically greater (less privileged) than the CPL, a return across privilege levels
occurs.
The processor performs the following steps when performing a far return to a calling procedure
(refer to Figures 4-2 and 4-4 in the Intel Architecture Software Developer's Manual, Volume 1,
for an illustration of the stack contents prior to and after a return):
1. Checks the RPL field of the saved CS register value to determine if a privilege level
change is required on the return.
2. Loads the CS and EIP registers with the values on the called procedure's stack. (Type and
privilege level checks are performed on the code-segment descriptor and RPL of the
code-segment selector.)
3. (If the RET instruction includes a parameter count operand and the return requires a
privilege level change.) Adds the parameter count (in bytes obtained from the RET
instruction) to the current ESP register value (after popping the CS and EIP values), to step
past the parameters on the called procedure's stack. The resulting value in the ESP register
points to the saved SS and ESP values for the calling procedure's stack. (Note that the byte
count in the RET instruction must be chosen to match the parameter count in the call gate
that the calling procedure referenced when it made the original call multiplied by the size
of the parameters.)
4. (If the return requires a privilege level change.) Loads the SS and ESP registers with the
saved SS and ESP values and switches back to the calling procedure's stack. The SS and
ESP values for the called procedure's stack are discarded. Any limit violations detected
while loading the stack-segment selector or stack pointer cause a general-protection
exception (#GP) to be generated. The new stack-segment descriptor is also checked for
type and privilege violations.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter count (in
bytes obtained from the RET instruction) to the current ESP register value, to step past the
parameters on the calling procedure's stack. The resulting ESP value is not checked against
the limit of the stack segment. If the ESP value is beyond the limit, that fact is not
recognized until the next stack operation.
6. (If the return requires a privilege level change.) Checks the contents of the DS, ES, FS, and
GS segment registers. If any of these registers refer to segments whose DPL is less than the
new CPL (excluding conforming code segments), the segment register is loaded with a null
segment selector.
Refer to the description of the RET instruction in Chapter 3, Instruction Set Reference, of the
Intel Architecture Software Developer's Manual, Volume 2, for a detailed description of the privilege
level checks and other protection checks that the processor performs on a far return.
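The far-return steps above can be sketched for the case of 4-byte stack items. Everything here is an illustrative assumption: the function name, the list-based stack (top of stack first), and the lookup table that stands in for the descriptor tables. Limit and type checks are omitted, and a same-level return reports only the CS, EIP, and CPL values.

```python
NULL_SELECTOR = 0


class GeneralProtection(Exception):
    """Stands in for a #GP exception in this sketch."""


def far_return(stack, cpl, ret_bytes, data_segs, seg_info):
    """Model a far return from a procedure running at privilege level cpl.

    stack     -- called procedure's stack items, top first:
                 [EIP, CS, parameters..., ESP, SS]
    ret_bytes -- byte count operand of the RET instruction
    data_segs -- dict of register name (DS, ES, FS, GS) to selector
    seg_info  -- dict of selector to (DPL, is_conforming_code)
    """
    eip, cs = stack[0], stack[1]
    new_cpl = cs & 0x3                       # step 1: RPL of the saved CS
    if new_cpl < cpl:
        raise GeneralProtection("return to a more privileged level")
    state = {"CS": cs, "EIP": eip, "CPL": new_cpl}       # step 2
    if new_cpl == cpl:
        return state                         # same level: no stack switch
    skip = ret_bytes // 4                    # step 3: step past parameters
    saved_esp, saved_ss = stack[2 + skip], stack[3 + skip]
    state["SS"] = saved_ss                   # step 4: back to caller's stack
    state["ESP"] = saved_esp + ret_bytes     # step 5: release caller's copy
    for name, sel in data_segs.items():      # step 6: scrub data segments
        if sel & 0xFFFC:                     # leave null selectors alone
            dpl, conforming_code = seg_info[sel]
            if dpl < new_cpl and not conforming_code:
                sel = NULL_SELECTOR
        state[name] = sel
    return state
```

In this model, a return from privilege level 0 to level 3 with `ret_bytes=8` skips two doublewords on the called procedure's stack, reloads the saved SS and ESP, and nulls any data-segment register that still refers to a level-0 segment.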
4.9. PRIVILEGED INSTRUCTIONS
Some of the system instructions (called "privileged instructions") are protected from use by
application programs. The privileged instructions control system functions (such as the loading
of system registers). They can be executed only when the CPL is 0 (most privileged). If one of
these instructions is executed when the CPL is not 0, a general-protection exception (#GP) is
generated. The following system instructions are privileged instructions:
LGDT: Load GDT register.
LLDT: Load LDT register.
LTR: Load task register.
LIDT: Load IDT register.
MOV (control registers): Load and store control registers.
LMSW: Load machine status word.
CLTS: Clear task-switched flag in register CR0.
MOV (debug registers): Load and store debug registers.
INVD: Invalidate cache, without writeback.
WBINVD: Invalidate cache, with writeback.
INVLPG: Invalidate TLB entry.
HLT: Halt processor.
RDMSR: Read Model-Specific Registers.
WRMSR: Write Model-Specific Registers.
RDPMC: Read Performance-Monitoring Counter.
RDTSC: Read Time-Stamp Counter.
Some of the privileged instructions are available only in the more recent families of Intel Architecture
processors (refer to Section 18.7., "New Instructions In the Pentium® and Later Intel
Architecture Processors", in Chapter 18, Intel Architecture Compatibility).
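The PCE and TSD flags in register CR4 (bits 8 and 2, respectively) control whether the RDPMC
and RDTSC instructions can be executed at any CPL: RDPMC is available at any CPL when the
PCE flag is set, and RDTSC is available at any CPL when the TSD flag is clear.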
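The rule for privileged instructions can be expressed as a small predicate. The sketch below is illustrative: the function name and the mnemonic strings (such as `MOV_CR`) are hypothetical, and the bit positions used for the CR4 flags (TSD is bit 2, PCE is bit 8) follow the CR4 register definition.

```python
CR4_TSD = 1 << 2  # time-stamp disable: when set, RDTSC requires CPL 0
CR4_PCE = 1 << 8  # performance-counter enable: when set, RDPMC at any CPL

PRIVILEGED = {
    "LGDT", "LLDT", "LTR", "LIDT", "MOV_CR", "LMSW", "CLTS", "MOV_DR",
    "INVD", "WBINVD", "INVLPG", "HLT", "RDMSR", "WRMSR",
}


def may_execute(mnemonic, cpl, cr4=0):
    """Return True if the instruction runs without a #GP at this CPL."""
    if cpl == 0:
        return True
    if mnemonic == "RDPMC":
        return bool(cr4 & CR4_PCE)
    if mnemonic == "RDTSC":
        return not (cr4 & CR4_TSD)
    return mnemonic not in PRIVILEGED
```

For instance, `may_execute("HLT", 3)` is false, while `may_execute("RDTSC", 3)` is true unless the operating system has set the TSD flag.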
4.10. POINTER VALIDATION
When operating in protected mode, the processor validates all pointers to enforce protection
between segments and maintain isolation between privilege levels. Pointer validation consists
of the following checks:
1. Checking access rights to determine if the segment type is compatible with its use.
2. Checking read/write rights.
3. Checking if the pointer offset exceeds the segment limit.
4. Checking if the supplier of the pointer is allowed to access the segment.
5. Checking the offset alignment.
The processor automatically performs the first, second, and third checks during instruction execution.
Software must explicitly request the fourth check by issuing an ARPL instruction. The fifth
check (offset alignment) is performed automatically at privilege level 3 if alignment checking is
turned on. Offset alignment does not affect isolation of privilege levels.
4.10.1. Checking Access Rights (LAR Instruction)
When the processor accesses a segment using a far pointer, it performs an access rights check
on the segment descriptor pointed to by the far pointer. This check is performed to determine if
type and privilege level (DPL) of the segment descriptor are compatible with the operation to be
performed. For example, when making a far call in protected mode, the segment-descriptor type
must be for a conforming or nonconforming code segment, a call gate, a task gate, or a TSS.
Then, if the call is to a nonconforming code segment, the DPL of the code segment must be equal
to the CPL, and the RPL of the code segment's segment selector must be less than or equal to
the DPL. If type or privilege level are found to be incompatible, the appropriate exception is
generated.
To prevent type incompatibility exceptions from being generated, software can check the access
rights of a segment descriptor using the LAR (load access rights) instruction. The LAR instruction
specifies the segment selector for the segment descriptor whose access rights are to be
checked and a destination register. The instruction then performs the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within the descriptor
table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, call gate, task gate, or TSS
segment-descriptor type.
4. If the segment is not a conforming code segment, checks if the segment descriptor is
visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or
equal to the DPL).
5. If the privilege level and type checks pass, loads the second doubleword of the segment
descriptor into the destination register (masked by the value 00FXFF00H, where X
indicates that the corresponding 4 bits are undefined) and sets the ZF flag in the EFLAGS
register. If the segment selector is not visible at the current privilege level or is an invalid
type for the LAR instruction, the instruction does not modify the destination register and
clears the ZF flag.
Once the access rights are loaded in the destination register, software can perform additional
checks on them.
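As an illustration of such additional checks, the sketch below decodes the value that LAR leaves in the destination register. The function name is hypothetical, and the field positions assume the standard layout of the second doubleword of a segment descriptor. The undefined nibble (the X in 00FXFF00H) is masked off.

```python
LAR_MASK = 0x00F0FF00  # 00FXFF00H with the undefined X nibble cleared


def decode_access_rights(lar_value):
    """Split a LAR result into its descriptor fields."""
    v = lar_value & LAR_MASK
    return {
        "type": (v >> 8) & 0xF,           # segment or gate type
        "s": bool(v & (1 << 12)),         # True: code/data, False: system
        "dpl": (v >> 13) & 0x3,           # descriptor privilege level
        "present": bool(v & (1 << 15)),   # P flag
        "avl": bool(v & (1 << 20)),       # available for system software
        "db": bool(v & (1 << 22)),        # default operation size
        "granularity": bool(v & (1 << 23)),
    }
```

Software could, for example, reject a selector whose decoded `s` field is false before using it as a data pointer.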
4.10.2. Checking Read/Write Rights (VERR and VERW Instructions)
When the processor accesses any code or data segment it checks the read/write privileges
assigned to the segment to verify that the intended read or write operation is allowed. Software
can check read/write rights using the VERR (verify for reading) and VERW (verify for writing)
instructions. Both these instructions specify the segment selector for the segment being checked.
The instructions then perform the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within the descriptor
table limit (GDT or LDT).
3. Checks that the segment descriptor is a code or data-segment descriptor type.
4. If the segment is not a conforming code segment, checks if the segment descriptor is
visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or
equal to the DPL).
5. Checks that the segment is readable (for the VERR instruction) or writable (for the
VERW instruction).
The VERR instruction sets the ZF flag in the EFLAGS register if the segment is visible at the
CPL and readable; the VERW sets the ZF flag if the segment is visible and writable. (Code
segments are never writable.) The ZF flag is cleared if any of these checks fail.
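The sequence of checks can be modeled as a function that returns the resulting ZF value. This is a sketch under stated assumptions: the function name and the tuple form of the descriptor are hypothetical, and the type-field encoding (bit 3 marks a code segment, bit 2 marks a conforming code segment, and bit 1 marks a readable code segment or a writable data segment) is the standard one, which this section does not spell out.

```python
def verr_verw_zf(selector, descriptor, cpl, for_write):
    """Return the ZF value set by VERR (for_write=False) or VERW (True).

    descriptor is None when the selector lies outside the descriptor
    table limit; otherwise it is a tuple (s_flag, seg_type, dpl).
    """
    if (selector & 0xFFFC) == 0:          # check 1: null selector
        return False
    if descriptor is None:                # check 2: outside table limit
        return False
    s_flag, seg_type, dpl = descriptor
    if not s_flag:                        # check 3: not code or data
        return False
    is_code = bool(seg_type & 0x8)
    conforming = is_code and bool(seg_type & 0x4)
    rpl = selector & 0x3
    if not conforming and (cpl > dpl or rpl > dpl):   # check 4: visibility
        return False
    if for_write:                         # check 5: VERW
        return (not is_code) and bool(seg_type & 0x2)
    return (not is_code) or bool(seg_type & 0x2)      # check 5: VERR
```

A readable conforming code segment at DPL 0 passes VERR from CPL 3 but never passes VERW, matching the rule that code segments are never writable.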
4.10.3. Checking That the Pointer Offset Is Within Limits (LSL Instruction)
When the processor accesses any segment it performs a limit check to ensure that the offset is
within the limit of the segment. Software can perform this limit check using the LSL (load
segment limit) instruction. Like the LAR instruction, the LSL instruction specifies the segment
selector for the segment descriptor whose limit is to be checked and a destination register. The
instruction then performs the following operations:
1. Checks that the segment selector is not null.
2. Checks that the segment selector points to a segment descriptor that is within the descriptor
table limit (GDT or LDT).
3. Checks that the segment descriptor is a code, data, LDT, or TSS segment-descriptor type.
4. If the segment is not a conforming code segment, checks if the segment descriptor is
visible at the CPL (that is, if the CPL and the RPL of the segment selector are less than or
equal to the DPL).
5. If the privilege level and type checks pass, loads the unscrambled limit (the limit scaled
according to the setting of the G flag in the segment descriptor) into the destination register
and sets the ZF flag in the EFLAGS register. If the segment selector is not visible at the
current privilege level or is an invalid type for the LSL instruction, the instruction does not
modify the destination register and clears the ZF flag.
Once the limit is loaded in the destination register, software can compare it with the offset
of a pointer.
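The scaling applied by LSL and the comparison that follows can be sketched as below. The function names are hypothetical, and the offset test assumes an expand-up segment. Expand-down data segments interpret the limit differently and are not covered.

```python
def scaled_limit(raw_limit, g_flag):
    """Return the byte-granular limit that LSL loads.

    raw_limit is the 20-bit limit field of the descriptor. With the G
    flag set, the limit counts 4-KByte units, so the low 12 bits of
    the scaled value are all ones.
    """
    if not 0 <= raw_limit <= 0xFFFFF:
        raise ValueError("limit field is 20 bits wide")
    return (raw_limit << 12) | 0xFFF if g_flag else raw_limit


def offset_in_segment(offset, size, raw_limit, g_flag):
    """True if bytes offset..offset+size-1 lie within the limit."""
    return offset + size - 1 <= scaled_limit(raw_limit, g_flag)
```

With a byte-granular limit of FFFFH, a doubleword access at offset FFFCH fits, while one at offset FFFDH would cross the limit.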
4.10.4. Checking Caller Access Privileges (ARPL Instruction)
The requestor's privilege level (RPL) field of a segment selector is intended to carry the privilege
level of a calling procedure (the calling procedure's CPL) to a called procedure. The called
procedure then uses the RPL to determine if access to a segment is allowed. The RPL is said to
"weaken" the privilege level of the called procedure to that of the RPL.
Operating-system procedures typically use the RPL to prevent less privileged application
programs from accessing data located in more privileged segments. When an operating-system
procedure (the called procedure) receives a segment selector from an application program (the
calling procedure), it sets the segment selector's RPL to the privilege level of the calling procedure.
Then, when the operating system uses the segment selector to access its associated
segment, the processor performs privilege checks using the calling procedure's privilege level
(stored in the RPL) rather than the numerically lower privilege level (the CPL) of the
operating-system procedure. The RPL thus ensures that the operating system does not access a segment on
behalf of an application program unless that program itself has access to the segment.
Figure 4-12 shows an example of how the processor uses the RPL field. In this example, an
application program (located in code segment A) possesses a segment selector (segment selector
D1) that points to a privileged data structure (that is, a data structure located in a data segment
D at privilege level 0). The application program cannot access data segment D, because it does
not have sufficient privilege, but the operating system (located in code segment C) can. So, in
an attempt to access data segment D, the application program executes a call to the operating
system and passes segment selector D1 to the operating system as a parameter on the stack.
Before passing the segment selector, the (well behaved) application program sets the RPL of the
segment selector to its current privilege level (which in this example is 3). If the operating
system attempts to access data segment D using segment selector D1, the processor compares
the CPL (which is now 0 following the call), the RPL of segment selector D1, and the DPL of
data segment D (which is 0). Since the RPL is greater than the DPL, access to data segment D
is denied. The processor's protection mechanism thus protects data segment D from access by
the operating system, because the application program's privilege level (represented by the RPL of
segment selector D1) is greater than the DPL of data segment D.
Figure 4-12. Use of RPL to Weaken Privilege Level of Called Procedure
Now assume that instead of setting the RPL of the segment selector to 3, the application program
sets the RPL to 0 (segment selector D2). The operating system can now access data segment D,
because its CPL and the RPL of segment selector D2 are both equal to the DPL of data segment
D. Because the application program is able to change the RPL of a segment selector to any value,
it can potentially use a procedure operating at a numerically lower privilege level to access a
protected data structure. This ability to lower the RPL of a segment selector breaches the
processor's protection mechanism.
Because a called procedure cannot rely on the calling procedure to set the RPL correctly,
operating-system procedures (executing at numerically lower privilege levels) that receive segment
selectors from numerically higher privilege-level procedures need to test the RPL of the segment
selector to determine if it is at the appropriate level. The ARPL (adjust requested privilege level)
instruction is provided for this purpose. This instruction adjusts the RPL of one segment selector
to match that of another segment selector.
The example in Figure 4-12 demonstrates how the ARPL instruction is intended to be used.
When the operating system receives segment selector D2 from the application program, it uses
the ARPL instruction to compare the RPL of the segment selector with the privilege level of the
application program (represented by the code-segment selector pushed onto the stack). If the
RPL is less than the application program's privilege level, the ARPL instruction changes the RPL
of the segment selector to match the privilege level of the application program (segment
selector D1). Using this instruction thus prevents a procedure running at a numerically higher
privilege level from accessing numerically lower privilege-level (more privileged) segments by
lowering the RPL of a segment selector.
Note that the privilege level of the application program can be determined by reading the RPL
field of the segment selector for the application-program's code segment. This segment selector
is stored on the stack as part of the call to the operating system. The operating system can copy
the segment selector from the stack into a register for use as an operand for the ARPL
instruction.
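The adjustment that ARPL performs can be modeled in a few lines. The function name and the tuple return value are illustrative. On the processor, the result is written to the destination operand and the ZF flag.

```python
def arpl(dest_selector, src_selector):
    """Model ARPL: return (adjusted_selector, zf).

    dest_selector is the selector received from the caller; src_selector
    is the caller's code-segment selector, whose RPL field carries the
    caller's privilege level.
    """
    dest_rpl = dest_selector & 0x3
    src_rpl = src_selector & 0x3
    if dest_rpl < src_rpl:
        # The caller claimed more privilege than it has: raise the RPL.
        return (dest_selector & ~0x3) | src_rpl, True
    return dest_selector, False
```

If an application running at privilege level 3 (code-segment selector 001BH) passes a selector 0010H with the RPL forged to 0, `arpl(0x0010, 0x001B)` yields 0013H with ZF set, so the subsequent access is checked against privilege level 3.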
4.10.5. Checking Alignment
When the CPL is 3, alignment of memory references can be checked by setting the AM flag in
the CR0 register and the AC flag in the EFLAGS register. Unaligned memory references
generate alignment exceptions (#AC). The processor does not generate alignment exceptions
when operating at privilege level 0, 1, or 2. Refer to Table 5-7 in Chapter 5, Interrupt and Exception
Handling for a description of the alignment requirements when alignment checking is
enabled.
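The enabling condition can be written as a predicate. This is a sketch: the function name is hypothetical, the bit positions (AM is bit 18 of CR0, AC is bit 18 of EFLAGS) come from the register definitions, and the required alignment is passed in by the caller because the per-type requirements are given in Table 5-7, which is not reproduced here.

```python
CR0_AM = 1 << 18      # alignment mask flag in CR0
EFLAGS_AC = 1 << 18   # alignment check flag in EFLAGS


def raises_alignment_check(cr0, eflags, cpl, address, alignment):
    """Return True if the memory reference would generate #AC."""
    enabled = bool(cr0 & CR0_AM) and bool(eflags & EFLAGS_AC) and cpl == 3
    return enabled and address % alignment != 0
```

An access at address 1001H with a 4-byte alignment requirement faults only when both flags are set and the CPL is 3.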
4.11. PAGE-LEVEL PROTECTION
Page-level protection can be used alone or applied to segments. When page-level protection is
used with the flat memory model, it allows supervisor code and data (the operating system or
executive) to be protected from user code and data (application programs). It also allows pages
containing code to be write protected. When the segment- and page-level protection are
combined, page-level read/write protection allows more protection granularity within segments.
With page-level protection (as with segment-level protection) each memory reference is
checked to verify that protection checks are satisfied. All checks are made before the memory
cycle is started, and any violation prevents the cycle from starting and results in a page-fault
exception being generated. Because checks are performed in parallel with address translation,
there is no performance penalty.
The processor performs two page-level protection checks:
Restriction of addressable domain (supervisor and user modes).
Page type (read only or read/write).
A violation of either of these checks results in a page-fault exception being generated. Refer to
Chapter 5, Interrupt and Exception Handling for an explanation of the page-fault exception
mechanism. This chapter describes the protection violations which lead to page-fault exceptions.
4.11.1. Page-Protection Flags
Protection information for pages is contained in two flags in a page-directory or page-table entry
(refer to Figure 3-14 in Chapter 3, Protected-Mode Memory Management): the read/write flag
(bit 1) and the user/supervisor flag (bit 2). The protection checks are applied to both first- and
second-level page tables (that is, page directories and page tables).
4.11.2. Restricting Addressable Domain
The page-level protection mechanism allows restricting access to pages based on two privilege
levels:
Supervisor mode (U/S flag is 0): (Most privileged) For the operating system or executive,
other system software (such as device drivers), and protected system data (such as page
tables).
User mode (U/S flag is 1): (Least privileged) For application code and data.
The segment privilege levels map to the page privilege levels as follows. If the processor is
currently operating at a CPL of 0, 1, or 2, it is in supervisor mode; if it is operating at a CPL of
3, it is in user mode. When the processor is in supervisor mode, it can access all pages; when in
user mode, it can access only user-level pages. (Note that the WP flag in control register CR0
modifies the supervisor permissions, as described in Section 4.11.3., "Page Type".)
Note that to use the page-level protection mechanism, code and data segments must be set up
for at least two segment-based privilege levels: level 0 for supervisor code and data segments
and level 3 for user code and data segments. (In this model, the stacks are placed in the data
segments.) To minimize the use of segments, a flat memory model can be used (refer to Section
3.2.1., "Basic Flat Model" in Chapter 3, "Protected-Mode Memory Management"). Here, the
user and supervisor code and data segments all begin at address zero in the linear address space
and overlay each other. With this arrangement, operating-system code (running at the supervisor
level) and application code (running at the user level) can execute as if there are no segments.
Protection between operating-system and application code and data is provided by the
processor's page-level protection mechanism.
4.11.3. Page Type
The page-level protection mechanism recognizes two page types:
Read-only access (R/W flag is 0).
Read/write access (R/W flag is 1).
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its state
following reset initialization), all pages are both readable and writable (write-protection is
ignored). When the processor is in user mode, it can write only to user-mode pages that are
read/write accessible. User-mode pages which are read/write or read-only are readable; supervisor-
mode pages are neither readable nor writable from user mode. A page-fault exception is
generated on any attempt to violate the protection rules.
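The rules above can be collected into a single predicate. The sketch below is not a description of the hardware implementation; the names (access_t, page_access_ok) are hypothetical. The cr0_wp parameter models the WP flag discussed in the next paragraph, and the code assumes that with WP set a supervisor write is permitted only when the page's R/W flag is set.

```c
#include <stdbool.h>

typedef enum { ACCESS_READ, ACCESS_WRITE } access_t;

/* Returns true if the access is allowed, false if it would raise a
 * page-fault exception.
 *   supervisor : processor is in supervisor mode (CPL 0, 1, or 2)
 *   cr0_wp     : state of the WP flag in CR0
 *   page_user  : U/S flag of the page (true = user page)
 *   page_rw    : R/W flag of the page (true = read/write) */
static bool page_access_ok(bool supervisor, bool cr0_wp,
                           bool page_user, bool page_rw, access_t access)
{
    if (supervisor) {
        if (access == ACCESS_READ)
            return true;            /* all pages are readable */
        /* WP clear: write protection is ignored.
         * WP set (assumption): the R/W flag is honored. */
        return !cr0_wp || page_rw;
    }
    if (!page_user)
        return false;               /* supervisor pages are off limits */
    return access == ACCESS_READ || page_rw;
}
```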
The P6 family, Pentium(R), and Intel486(TM) processors allow user-mode pages to be write-protected
against supervisor-mode access. Setting the WP flag in register CR0 to 1 enables
supervisor-mode sensitivity to user-mode, write-protected pages. This supervisor write-protect
feature is useful for implementing a "copy-on-write" strategy used by some operating systems,
such as UNIX*, for task creation (also called forking or spawning). When a new task is created,
it is possible to copy the entire address space of the parent task. This gives the child task a
complete, duplicate set of the parent's segments and pages. An alternative copy-on-write
strategy saves memory space and time by mapping the child's segments and pages to the same
segments and pages used by the parent task. A private copy of a page gets created only when
one of the tasks writes to the page. By using the WP flag and marking the shared pages as read-only,
the supervisor can detect an attempt to write to a user-level page, and can copy the page at
that time.
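The copy-on-write strategy can be modeled in ordinary C, without touching real page tables. The sketch below is a simulation under stated assumptions: frame_t, mapping_t, cow_share, and cow_write_fault are hypothetical names, a reference count stands in for whatever bookkeeping an operating system actually uses, and the writable field stands in for the R/W flag of a page-table entry.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A simulated physical page frame with a count of mappings sharing it. */
typedef struct frame {
    unsigned char data[PAGE_SIZE];
    int refs;
} frame_t;

/* A simulated page mapping; writable models the R/W flag. */
typedef struct {
    frame_t *frame;
    bool writable;
} mapping_t;

/* At task creation: map the child to the parent's frame and mark
 * both mappings read-only so that the first write faults. */
static void cow_share(mapping_t *parent, mapping_t *child)
{
    parent->writable = false;
    child->frame = parent->frame;
    child->writable = false;
    parent->frame->refs++;
}

/* On a write fault: give the writer a private copy if the frame is
 * still shared, then make the mapping writable.
 * Returns 0 on success, -1 if no memory is available for the copy. */
static int cow_write_fault(mapping_t *m)
{
    if (m->frame->refs > 1) {
        frame_t *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, m->frame->data, PAGE_SIZE);
        copy->refs = 1;
        m->frame->refs--;
        m->frame = copy;
    }
    m->writable = true;
    return 0;
}
```

In this model the last remaining sharer of a frame does not copy it; its mapping is simply made writable again.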
4.11.4. Combining Protection of Both Levels of Page Tables
For any one page, the protection attributes of its page-directory entry (first-level page table) may
differ from those of its page-table entry (second-level page table). The processor checks the
protection for a page in both its page-directory and the page-table entries. Table 4-2 shows the
protection provided by the possible combinations of protection attributes when the WP flag is
clear.
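One way to read the combination rule is that the more restrictive of the two entries wins. The sketch below rests on that assumption and on the behavior described in Section 4.11.3 for the WP flag; the names (prot_t, combine_protection, PG_RW, PG_US) are hypothetical, and the code is not a substitute for Table 4-2.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_RW (UINT32_C(1) << 1) /* R/W flag, bit 1 */
#define PG_US (UINT32_C(1) << 2) /* U/S flag, bit 2 */

/* Effective protection of a page after combining both levels. */
typedef struct {
    bool user;     /* true = user page, false = supervisor page */
    bool writable; /* true = read/write, false = read-only      */
} prot_t;

static prot_t combine_protection(uint32_t pde, uint32_t pte, bool cr0_wp)
{
    prot_t p;
    bool both_rw = (pde & PG_RW) != 0 && (pte & PG_RW) != 0;

    /* The page is a user page only if both entries say so. */
    p.user = (pde & PG_US) != 0 && (pte & PG_US) != 0;

    if (p.user)
        p.writable = both_rw;
    else
        /* Supervisor page: with WP clear, write protection is ignored;
         * with WP set, the R/W flags of both entries decide. */
        p.writable = cr0_wp ? both_rw : true;
    return p;
}
```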
4.11.5. Overrides to Page Protection
The following types of memory accesses are checked as if they are privilege-level 0 accesses,
regardless of the CPL at which the processor is currently operating:
Access to segment descriptors in the GDT, LDT, or IDT.
Access to an inner-privilege-level stack during an inter-privilege-level call or a call to an
exception or interrupt handler, when a change of privilege level occurs.
4.12. COMBINING PAGE AND SEGMENT PROTECTION
When paging is enabled, the processor evaluates segment protection first, then evaluates page
protection. If the processor detects a protection violation at either the segment level or the page
level, the memory access is not carried out and an exception is generated. If an exception is
generated by segmentation, no paging exception is generated.
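The ordering of the two checks can be summarized in a few lines. This is a sketch only: fault_t and check_access are hypothetical names, and FAULT_SEGMENT stands for whichever exception the segment-level check would raise.

```c
#include <stdbool.h>

typedef enum { FAULT_NONE, FAULT_SEGMENT, FAULT_PAGE } fault_t;

/* segment_ok and page_ok are the outcomes of the segment-level and
 * page-level protection checks for one memory access. */
static fault_t check_access(bool paging_enabled, bool segment_ok, bool page_ok)
{
    /* Segment protection is evaluated first; if it fails, no paging
     * exception is generated for this access. */
    if (!segment_ok)
        return FAULT_SEGMENT;

    /* Page protection is evaluated only when paging is enabled. */
    if (paging_enabled && !page_ok)
        return FAULT_PAGE;

    return FAULT_NONE;
}
```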
Page-level protections cannot be used to override segment-level protection. For example, a code
segment is by definition not writable. If a code segment is paged, setting the R/W flag for the
pages to read-write does not make the pages writable. Attempts to write into the pages will be
blocked by segment-level protection checks.
Page-level protection can be used to enhance segment-level protection. For example, if a large
read-write data segment is paged, the page-protection mechanism can be used to write-protect
individual pages.
NOTE (Table 4-2):
* If the WP flag of CR0 is set, the access type is determined by the R/W flags of the page-directory and
page-table entries.