Serg Iakovlev

iakovlev.org


NASM info

3.1. Layout of a NASM Source Line

As in other assemblers, each line of NASM source consists of four main parts:

 
      label:    instruction operands        ; comment

As usual, most of these fields are optional; the presence or absence of any combination of a label, an instruction and a comment is allowed. Of course, the operand field is either required or forbidden by the presence and nature of the instruction field.

NASM uses backslash (\) as the line continuation character; if a line ends with backslash, the next line is considered to be a part of the backslash-ended line.

NASM places no restrictions on white space within a line: labels may have white space before them, or instructions may have no space before them, or anything. The colon after a label is also optional. (Note that this means that if you intend to code `lodsb' alone on a line, and type `lodab' by accident, then that's still a valid source line which does nothing but define a label. Running NASM with the command-line option `-w+orphan-labels' will cause it to warn you if you define a label alone on a line without a trailing colon.)
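The layout rules above can be sketched in a few lines. This is only an illustration, assuming flat binary output (`nasm -f bin`); every label name in it is made up.

```nasm
; Sketch only: assumes `nasm -f bin`; all label names are illustrative.
        bits 16

start:  mov     ax, 1           ; label, instruction, operands, comment
        inc     ax              ; instruction and comment, no label
mov bx, ax                      ; no leading white space is required
done                            ; label alone on a line, no colon;
                                ; -w+orphan-labels warns about this form
table:  db      1, 2, \
                3, 4            ; the backslash joins these two lines
```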

Valid characters in labels are letters, numbers, `_', `$', `#', `@', `~', `.', and `?'. The only characters which may be used as the _first_ character of an identifier are letters, `.' (with special meaning: see *Note Section 3.9::), `_' and `?'. An identifier may also be prefixed with a `$' to indicate that it is intended to be read as an identifier and not a reserved word; thus, if some other module you are linking with defines a symbol called `eax', you can refer to `$eax' in NASM code to distinguish the symbol from the register.

The instruction field may contain any machine instruction: Pentium and P6 instructions, FPU instructions, MMX instructions and even undocumented instructions are all supported. The instruction may be prefixed by `LOCK', `REP', `REPE'/`REPZ' or `REPNE'/`REPNZ', in the usual way. Explicit address-size and operand-size prefixes `A16', `A32', `O16' and `O32' are provided - one example of their use is given in *Note Chapter 9::. You can also use the name of a segment register as an instruction prefix: coding `es mov [bx],ax' is equivalent to coding `mov [es:bx],ax'. We recommend the latter syntax, since it is consistent with other syntactic features of the language, but for instructions such as `LODSB', which has no operands and yet can require a segment override, there is no clean syntactic way to proceed apart from `es lodsb'.
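As a rough illustration of the prefix forms just described (16-bit code, arbitrary register choices, not taken from the original text):

```nasm
; Sketch only: register and operand choices are arbitrary.
        bits 16

        mov     [es:bx], ax     ; recommended segment-override syntax
        es mov  [bx], ax        ; prefix form, described above as equivalent
        es lodsb                ; no operand to carry the override,
                                ; so the prefix form is the only option
        rep movsb               ; repeat prefix on a string instruction
        lock inc word [bx]      ; LOCK prefix on a read-modify-write instruction
```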

A prefix does not have to be attached to an instruction: prefixes such as `CS', `A32', `LOCK' or `REPE' can appear on a line by themselves, and NASM will just generate the prefix bytes.

In addition to actual machine instructions, NASM also supports a number of pseudo-instructions, described in *Note Section 3.2::. Instruction operands may take a number of forms: they can be registers, described simply by the register name (e.g. `ax', `bp', `ebx', `cr0': NASM does not use the `gas'-style syntax in which register names must be prefixed by a `%' sign), or they can be effective addresses (see *Note Section 3.3::), constants (*Note Section 3.4::) or expressions (*Note Section 3.5::).

For floating-point instructions, NASM accepts a wide range of syntaxes: you can use two-operand forms like MASM supports, or you can use NASM's native single-operand forms in most cases. Details of all forms of each supported instruction are given in *Note Appendix B::. For example, you can code:

                                                                                        
              fadd    st1             ; this sets st0 := st0 + st1
              fadd    st0,st1         ; so does this
                                                                                        
              fadd    st1,st0         ; this sets st1 := st1 + st0
              fadd    to st1          ; so does this

Almost any floating-point instruction that references memory must use one of the prefixes `DWORD', `QWORD' or `TWORD' to indicate what size of memory operand it refers to.
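A minimal sketch of the size prefixes on floating-point memory operands, assuming flat binary output and 32-bit code; the variable names are hypothetical.

```nasm
; Sketch only: assumes `nasm -f bin`; variable names are illustrative.
        bits 32

        section .text
        fld     dword [single_val]      ; load a 32-bit float onto the FPU stack
        fadd    qword [double_val]      ; add a 64-bit float to st0
        fstp    tword [ext_val]         ; store st0 as an 80-bit float and pop

        section .data
single_val:     dd      1.5
double_val:     dq      2.5
ext_val:        dt      0.0
```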

3.2. Pseudo-Instructions

Pseudo-instructions are things which, though not real x86 machine instructions, are used in the instruction field anyway because that's the most convenient place to put them. The current pseudo-instructions are `DB', `DW', `DD', `DQ' and `DT', their uninitialised counterparts `RESB', `RESW', `RESD', `RESQ' and `REST', the `INCBIN' command, the `EQU' command, and the `TIMES' prefix.

3.2.1. `DB' and friends: Declaring Initialised Data

`DB', `DW', `DD', `DQ' and `DT' are used, much as in MASM, to declare initialised data in the output file. They can be invoked in a wide range of ways:

                                                                                        
            db    0x55                ; just the byte 0x55
            db    0x55,0x56,0x57      ; three bytes in succession
            db    'a',0x55            ; character constants are OK
            db    'hello',13,10,'$'   ; so are string constants
            dw    0x1234              ; 0x34 0x12
            dw    'a'                 ; 0x61 0x00 (it's just a number)
            dw    'ab'                ; 0x61 0x62 (character constant)
            dw    'abc'               ; 0x61 0x62 0x63 0x00 (string)
            dd    0x12345678          ; 0x78 0x56 0x34 0x12
            dd    1.234567e20         ; floating-point constant
            dq    1.234567e20         ; double-precision float
            dt    1.234567e20         ; extended-precision float

`DQ' and `DT' do not accept numeric constants or string constants as operands.

3.2.2. `RESB' and friends: Declaring Uninitialised Data

`RESB', `RESW', `RESD', `RESQ' and `REST' are designed to be used in the BSS section of a module: they declare _uninitialised_ storage space. Each takes a single operand, which is the number of bytes, words, doublewords or whatever to reserve. As stated in *Note Section 2.2.7::, NASM does not support the MASM/TASM syntax of reserving uninitialised space by writing `DW ?' or similar things: this is what it does instead. The operand to a `RESB'-type pseudo-instruction is a _critical expression_: see *Note Section 3.8::.

For example:

                                                                                        
      buffer:         resb    64              ; reserve 64 bytes
      wordvar:        resw    1               ; reserve a word
      realarray       resq    10              ; array of ten reals
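Because the operand counts units rather than bytes, the space reserved depends on which member of the family is used. A small sketch, with illustrative names and sizes:

```nasm
; Sketch only: names and sizes are illustrative.
BUFSIZE equ     256             ; defined before use, since the RESB operand
                                ; is a critical expression

        section .bss
inbuf:  resb    BUFSIZE         ; 256 bytes
counts: resd    16              ; 16 doublewords = 64 bytes
total:  resq    1               ; one quadword = 8 bytes
```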

3.2.3. `INCBIN': Including External Binary Files

`INCBIN' is borrowed from the old Amiga assembler DevPac: it includes a binary file verbatim into the output file. This can be handy for (for example) including graphics and sound data directly into a game executable file. It can be called in one of these three ways:

                                                                                        
          incbin  "file.dat"             ; include the whole file
          incbin  "file.dat",1024        ; skip the first 1024 bytes
          incbin  "file.dat",1024,512    ; skip the first 1024, and
                                         ; actually include at most 512
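One plausible way to use `INCBIN' is to bracket it with a label and an `EQU' so that the code knows where the data starts and how long it is. The file name below is hypothetical and the file must exist when the source is assembled.

```nasm
; Sketch only: "sprite.raw" is a hypothetical file in the current directory.
sprite:         incbin  "sprite.raw"    ; whole file, included verbatim
sprite_len      equ     $ - sprite      ; size of the included data in bytes
```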

3.2.4. `EQU': Defining Constants

`EQU' defines a symbol to a given constant value: when `EQU' is used, the source line must contain a label. The action of `EQU' is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. So, for example,

                                                                                        
      message         db      'hello, world'
      msglen          equ     $-message

defines `msglen' to be the constant 12. `msglen' may not then be redefined later. This is not a preprocessor definition either: the value of `msglen' is evaluated _once_, using the value of `$' (see *Note Section 3.5:: for an explanation of `$') at the point of definition, rather than being evaluated wherever it is referenced and using the value of `$' at the point of reference. Note that the operand to an `EQU' is also a critical expression (*Note Section 3.8::).
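The "evaluated once" behaviour can be seen in a short sketch (assuming 16-bit flat binary output): `msglen' keeps the value it had at the point of definition, even though `$' has moved on by the time it is used.

```nasm
; Sketch only: assumes `nasm -f bin`.
        bits 16

message:        db      'hello, world'
msglen          equ     $ - message     ; fixed at 12 here
                db      0, 0, 0, 0      ; `$' is now 4 bytes further along
                mov     cx, msglen      ; still assembles as mov cx,12
```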

3.2.5. `TIMES': Repeating Instructions or Data

The `TIMES' prefix causes the instruction to be assembled multiple times. This is partly present as NASM's equivalent of the `DUP' syntax supported by MASM-compatible assemblers, in that you can code

      zerobuf:        times 64 db 0

or similar things; but `TIMES' is more versatile than that. The argument to `TIMES' is not just a numeric constant, but a numeric _expression_, so you can do things like

                                                                                       
      buffer: db      'hello, world'
              times 64-$+buffer db ' '

which will store exactly enough spaces to make the total length of `buffer' up to 64. Finally, `TIMES' can be applied to ordinary instructions, so you can code trivial unrolled loops in it:

              times 100 movsb

Note that there is no effective difference between `times 100 resb 1' and `resb 100', except that the latter will be assembled about 100 times faster due to the internal structure of the assembler.

The operand to `TIMES', like that of `EQU' and those of `RESB' and friends, is a critical expression (*Note Section 3.8::).

3.3. Effective Addresses

An effective address is any operand to an instruction which references memory. Effective addresses, in NASM, have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in square brackets. For example:

                                                                                       
      wordvar dw      123
              mov     ax,[wordvar]
              mov     ax,[wordvar+1]
              mov     ax,[es:wordvar+bx]

Anything not conforming to this simple system is not a valid memory reference in NASM, for example `es:wordvar[bx]'.

More complicated effective addresses, such as those involving more than one register, work in exactly the same way:

                                                                                       
              mov     eax,[ebx*2+ecx+offset]
              mov     ax,[bp+di+8]

NASM is capable of doing algebra on these effective addresses, so that things which don't necessarily _look_ legal are perfectly all right:

                                                                                        
          mov     eax,[ebx*5]             ; assembles as [ebx*4+ebx]
          mov     eax,[label1*2-label2]   ; ie [label1+(label1-label2)]

Some forms of effective address have more than one assembled form; in most such cases NASM will generate the smallest form it can. For example, there are distinct assembled forms for the 32-bit effective addresses `[eax*2+0]' and `[eax+eax]', and NASM will generally generate the latter on the grounds that the former requires four bytes to store a zero offset.

NASM has a hinting mechanism which will cause `[eax+ebx]' and `[ebx+eax]' to generate different opcodes; this is occasionally useful because `[esi+ebp]' and `[ebp+esi]' have different default segment registers.

However, you can force NASM to generate an effective address in a particular form by the use of the keywords `BYTE', `WORD', `DWORD' and `NOSPLIT'. If you need `[eax+3]' to be assembled using a double-word offset field instead of the one byte NASM will normally generate, you can code `[dword eax+3]'. Similarly, you can force NASM to use a byte offset for a small value which it hasn't seen on the first pass (see *Note Section 3.8:: for an example of such a code fragment) by using `[byte eax+offset]'. As special cases, `[byte eax]' will code `[eax+0]' with a byte offset of zero, and `[dword eax]' will code it with a double-word offset of zero. The normal form, `[eax]', will be coded with no offset field.

The form described in the previous paragraph is also useful if you are trying to access data in a 32-bit segment from within 16-bit code. For more information on this see the section on mixed-size addressing (*Note Section 9.2::). In particular, if you need to access data with a known offset that is larger than will fit in a 16-bit value and you don't specify that it is a dword offset, NASM will cause the high word of the offset to be lost.

Similarly, NASM will split `[eax*2]' into `[eax+eax]' because that allows the offset field to be absent and space to be saved; in fact, it will also split `[eax*2+offset]' into `[eax+eax+offset]'. You can combat this behaviour by the use of the `NOSPLIT' keyword: `[nosplit eax*2]' will force `[eax*2+0]' to be generated literally.
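The size and splitting keywords discussed in this section can be put side by side. This is a sketch in 32-bit code with arbitrary registers; the comments restate what the text above says each form produces.

```nasm
; Sketch only: register choices are arbitrary.
        bits 32

        mov     eax, [ebx+ecx*4+8]      ; base + scaled index + displacement
        mov     eax, [ebx+3]            ; NASM picks the one-byte offset itself
        mov     eax, [dword ebx+3]      ; force a four-byte offset field
        mov     eax, [byte ebx]         ; [ebx+0] with a one-byte zero offset
        mov     eax, [eax*2]            ; may be split into [eax+eax]
        mov     eax, [nosplit eax*2]    ; keep the scaled form [eax*2+0]
```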

3.4. Constants

NASM understands four different types of constant: numeric, character, string and floating-point.

3.4.1. Numeric Constants

A numeric constant is simply a number. NASM allows you to specify numbers in a variety of number bases, in a variety of ways: you can suffix `H', `Q' or `O', and `B' for hex, octal and binary, or you can prefix `0x' for hex in the style of C, or you can prefix `$' for hex in the style of Borland Pascal. Note, though, that the `$' prefix does double duty as a prefix on identifiers (see *Note Section 3.1::), so a hex number prefixed with a `$' sign must have a digit after the `$' rather than a letter.

Some examples:

                                                                                        
              mov     ax,100          ; decimal
              mov     ax,0a2h         ; hex
              mov     ax,$0a2         ; hex again: the 0 is required
              mov     ax,0xa2         ; hex yet again
              mov     ax,777q         ; octal
              mov     ax,777o         ; octal again
              mov     ax,10010011b    ; binary 
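To make the equivalence of the notations concrete, the following sketch writes the same value, 200, in each supported form; every line should emit the single byte 0xC8.

```nasm
; Sketch only: each line emits the byte 0xC8 (decimal 200).
        db      200             ; decimal
        db      0c8h            ; hex with H suffix; the leading 0 keeps it
                                ; from being read as an identifier
        db      0xc8            ; hex, C style
        db      $0c8            ; hex, Borland Pascal style: digit after `$'
        db      310q            ; octal, Q suffix
        db      310o            ; octal, O suffix
        db      11001000b       ; binary
```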

3.4.2. Character Constants
 


                                                                                        
    A character constant consists of up to four characters enclosed in
 either single or double quotes. The type of quote makes no difference
 to NASM, except of course that surrounding the constant with single
 quotes allows double quotes to appear within it and vice versa.
 

                                                                                       
    A character constant with more than one character will be arranged
 with little-endian order in mind: if you code

                mov eax,'abcd'

    then the constant generated is not `0x61626364', but `0x64636261',
 so that if you were then to store the value into memory, it would read
 `abcd' rather than `dcba'. This is also the sense of character
 constants understood by the Pentium's `CPUID' instruction (see *Note
 Section B.4.34::).
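The `CPUID' case can be sketched as follows. With EAX = 0, `CPUID' returns the vendor string in EBX, EDX and ECX, four characters per register, so little-endian character constants compare directly against it. The routine and label names are hypothetical, and the routine clobbers EBX, ECX and EDX.

```nasm
; Sketch only: label names are illustrative; returns EAX = 1 for
; "GenuineIntel", 0 otherwise. Clobbers EBX, ECX and EDX.
        bits 32

check_intel:
        xor     eax, eax        ; CPUID leaf 0: vendor string in EBX, EDX, ECX
        cpuid
        cmp     ebx, 'Genu'
        jne     not_intel
        cmp     edx, 'ineI'
        jne     not_intel
        cmp     ecx, 'ntel'
        jne     not_intel
        mov     eax, 1
        ret
not_intel:
        xor     eax, eax
        ret
```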
 


 3.4.3. String Constants
 


                                                                                        
    String constants are only acceptable to some pseudo-instructions,
 namely the `DB' family and `INCBIN'.
 

                                                                                       
    A string constant looks like a character constant, only longer. It is
 treated as a concatenation of maximum-size character constants for the
 pseudo-instruction in question. So the following are equivalent:

            db    'hello'               ; string constant
            db    'h','e','l','l','o'   ; equivalent character constants

    And the following are also equivalent:

            dd    'ninechars'           ; doubleword string constant
            dd    'nine','char','s'     ; becomes three doublewords
            db    'ninechars',0,0,0     ; and really looks like this
                                                                                        
    Note that when used as an operand to `db', a constant like `'ab'' is
 treated as a string constant despite being short enough to be a
 character constant, because otherwise `db 'ab'' would have the same
 effect as `db 'a'', which would be silly. Similarly, three-character or
 four-character constants are treated as strings when they are operands
 to `dw'.
 


 3.4.4. Floating-Point Constants
 


                                                                                        
    Floating-point constants are acceptable only as arguments to `DD',
 `DQ' and `DT'. They are expressed in the traditional form: digits, then
 a period, then optionally more digits, then optionally an `E' followed
 by an exponent. The period is mandatory, so that NASM can distinguish
 between `dd 1', which declares an integer constant, and `dd 1.0' which
 declares a floating-point constant.
 

                                                                                       
    Some examples:

            dd    1.2                     ; an easy one
            dq    1.e10                   ; 10,000,000,000
            dq    1.e+10                  ; synonymous with 1.e10
            dq    1.e-10                  ; 0.000 000 000 1
            dt    3.141592653589793238462 ; pi
                                                                                        
    NASM cannot do compile-time arithmetic on floating-point constants.
 This is because NASM is designed to be portable - although it always
 generates code to run on x86 processors, the assembler itself can run
 on any system with an ANSI C compiler. Therefore, the assembler cannot
 guarantee the presence of a floating-point unit capable of handling the
 Intel number formats, and so for NASM to be able to do floating
 arithmetic it would have to include its own complete set of
 floating-point routines, which would significantly increase the size of
 the assembler for very little benefit.
 


 3.5. Expressions
 


                                                                                        
    Expressions in NASM are similar in syntax to those in C.
 

                                                                                       
    NASM does not guarantee the size of the integers used to evaluate
 expressions at compile time: since NASM can compile and run on 64-bit
 systems quite happily, don't assume that expressions are evaluated in
 32-bit registers and so try to make deliberate use of integer
 overflow. It might not always work. The only thing NASM will guarantee
 is what's guaranteed by ANSI C: you always have _at least_ 32 bits to
 work in.
 

                                                                                       
    NASM supports two special tokens in expressions, allowing
 calculations to involve the current assembly position: the `$' and `$$'
 tokens.  `$' evaluates to the assembly position at the beginning of the
 line containing the expression; so you can code an infinite loop using
 `JMP $'. `$$' evaluates to the beginning of the current section; so you
 can tell how far into the section you are by using `($-$$)'.
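A common use of `($-$$)' is padding a section out to a fixed size. The sketch below is not from the original text: it assumes flat binary output and a BIOS-style boot sector loaded at 0x7C00, and fills the sector to 512 bytes.

```nasm
; Sketch only: assumes `nasm -f bin` and a BIOS that loads the sector at 0x7C00.
        bits 16
        org     0x7c00

start:  jmp     $                       ; `$' is this line's own address: loop forever

        times 510-($-$$) db 0           ; ($-$$) = bytes emitted so far in the section
        dw      0xaa55                  ; boot signature in the last two bytes
```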
 

                                                                                       
    The arithmetic operators provided by NASM are listed here, in
 increasing order of precedence.
 


 3.5.1. `|': Bitwise OR Operator
 


                                                                                        
    The `|' operator gives a bitwise OR, exactly as performed by the
 `OR' machine instruction. Bitwise OR is the lowest-priority arithmetic
 operator supported by NASM.
 


 3.5.2. `^': Bitwise XOR Operator
 


                                                                                        
    `^' provides the bitwise XOR operation.
 


 3.5.3. `&': Bitwise AND Operator
 


                                                                                        
    `&' provides the bitwise AND operation.
 


 3.5.4. `<<' and `>>': Bit Shift Operators
 


                                                                                        
    `<<' gives a bit-shift to the left, just as it does in C. So `5<<3'
 evaluates to 5 times 8, or 40. `>>' gives a bit-shift to the right; in
 NASM, such a shift is _always_ unsigned, so that the bits shifted in
 from the left-hand end are filled with zero rather than a
 sign-extension of the previous highest bit.
 

                                                                                       
 3.5.6. `*', `/', `//', `%' and `%%': Multiplication and Division
 


                                                                                        
    `*' is the multiplication operator. `/' and `//' are both division
 operators: `/' is unsigned division and `//' is signed division.
 Similarly, `%' and `%%' provide unsigned and signed modulo operators
 respectively.
 

                                                                                       
    NASM, like ANSI C, provides no guarantees about the sensible
 operation of the signed modulo operator.
 

                                                                                       
    Since the `%' character is used extensively by the macro
 preprocessor, you should ensure that both the signed and unsigned
 modulo operators are followed by white space wherever they appear.
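A few worked values for the operators listed so far, as a sketch; `PAGE_SIZE' is an illustrative name and the comments give the value each expression evaluates to.

```nasm
; Sketch only: PAGE_SIZE is an illustrative name.
PAGE_SIZE equ   1 << 12         ; 4096

        dd      PAGE_SIZE | 3   ; 0x1003
        dd      0x1234 & 0xff   ; 0x34
        dd      0xf0 ^ 0xff     ; 0x0f
        dd      0x80000000 >> 4 ; 0x08000000: zeros are shifted in
        dd      1000 / 16       ; 62 (unsigned integer division)
        dd      1000 % 16       ; 8 (note the white space after %)
        dd      1 | 2 & 3       ; 3: & binds more tightly than |
```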
 


 3.6. `SEG' and `WRT'
 


                                                                                        
    When writing large 16-bit programs, which must be split into multiple
 segments, it is often necessary to be able to refer to the segment part
 of the address of a symbol. NASM supports the `SEG' operator to perform
 this function.
 

                                                                                       
    The `SEG' operator returns the _preferred_ segment base of a symbol,
 defined as the segment base relative to which the offset of the symbol
 makes sense. So the code

              mov     ax,seg symbol
              mov     es,ax
              mov     bx,symbol
                                                                                        
    will load `ES:BX' with a valid pointer to the symbol `symbol'.
 

                                                                                       
    Things can be more complex than this: since 16-bit segments and
 groups may overlap, you might occasionally want to refer to some symbol
 using a different segment base from the preferred one. NASM lets you do
 this, by the use of the `WRT' (With Reference To) keyword. So you can
 do things like

              mov     ax,weird_seg        ; weird_seg is a segment base
              mov     es,ax
              mov     bx,symbol wrt weird_seg
                                                                                        
    to load `ES:BX' with a different, but functionally equivalent,
 pointer to the symbol `symbol'.

    NASM supports far (inter-segment) calls and jumps by means of the
 syntax `call segment:offset', where `segment' and `offset' both
 represent immediate values. So to call a far procedure, you could code
 either of

              call    (seg procedure):procedure
              call    weird_seg:(procedure wrt weird_seg)
                                                                                        
    (The parentheses are included for clarity, to show the intended
 parsing of the above instructions. They are not necessary in practice.)
 

                                                                                       
    NASM supports the syntax `call far procedure' as a synonym for the
 first of the above usages. `JMP' works identically to `CALL' in these
 examples.
 

                                                                                       
    To declare a far pointer to a data item in a data segment, you must
 code
 

                                                                                       
              dw      symbol, seg symbol
 

                                                                                       
    NASM supports no convenient synonym for this, though you can always
 invent one using the macro processor.
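One possible shape for such a macro is sketched below. It assumes the 16-bit OBJ output format (`nasm -f obj`); the macro name `far_ptr', the symbol names and the segment names are all hypothetical.

```nasm
; Sketch only: assumes `nasm -f obj`; far_ptr, counter, counter_ptr and the
; segment names are hypothetical.
%macro  far_ptr 1
        dw      %1, seg %1      ; offset first, then segment
%endmacro

        segment data
counter:
        dw      0
counter_ptr:
        far_ptr counter         ; expands to: dw counter, seg counter

        segment code
..start:
        mov     ax, data
        mov     ds, ax
        les     bx, [counter_ptr]       ; ES:BX = far pointer to counter
        inc     word [es:bx]
```

Storing the offset before the segment matches the layout that `LES' and `LDS' expect in memory.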
 


 3.7. `STRICT': Inhibiting Optimization
 


                                                                                        
    When assembling with the optimizer set to level 2 or higher (see
 *Note Section 2.1.16::), NASM will use size specifiers (`BYTE', `WORD',
 `DWORD', `QWORD', or `TWORD'), but will give them the smallest possible
 size. The keyword `STRICT' can be used to inhibit optimization and
 force a particular operand to be emitted in the specified size. For
 example, with the optimizer on, and in `BITS 16' mode,
 

                                                                                       
              push dword 33
 

                                                                                       
    is encoded in three bytes `66 6A 21', whereas
 

                                                                                       
              push strict dword 33
 

                                                                                       
    is encoded in six bytes, with a full dword immediate operand `66 68
 21 00 00 00'.
 

                                                                                       
    With the optimizer off, the same code (six bytes) is generated
 whether the `STRICT' keyword was used or not.
 


 3.8. Critical Expressions
 


                                                                                        
    A limitation of NASM is that it is a two-pass assembler; unlike TASM
 and others, it will always do exactly two assembly passes. Therefore it
 is unable to cope with source files that are complex enough to require
 three or more passes.
 

                                                                                       
    The first pass is used to determine the size of all the assembled
 code and data, so that the second pass, when generating all the code,
 knows all the symbol addresses the code refers to. So one thing NASM
 can't handle is code whose size depends on the value of a symbol
 declared after the code in question. For example,
                                                                                        
              times (label-$) db 0
      label:  db      'Where am I?'
                                                                                        
    The argument to `TIMES' in this case could equally legally evaluate
 to anything at all; NASM will reject this example because it cannot
 tell the size of the `TIMES' line when it first sees it. It will just
 as firmly reject the slightly paradoxical code
                                                                                        
              times (label-$+1) db 0
      label:  db      'NOW where am I?'
                                                                                        
    in which _any_ value for the `TIMES' argument is by definition wrong!

    NASM rejects these examples by means of a concept called a _critical
 expression_, which is defined to be an expression whose value is
 required to be computable in the first pass, and which must therefore
 depend only on symbols defined before it. The argument to the `TIMES'
 prefix is a critical expression; for the same reason, the arguments to
 the `RESB' family of pseudo-instructions are also critical expressions.
 

                                                                                       
    Critical expressions can crop up in other contexts as well: consider
 the following code.
                                                                                        
                      mov     ax,symbol1
      symbol1         equ     symbol2
      symbol2:
                                                                                        
    On the first pass, NASM cannot determine the value of `symbol1',
 because `symbol1' is defined to be equal to `symbol2' which NASM hasn't
 seen yet. On the second pass, therefore, when it encounters the line
 `mov ax,symbol1', it is unable to generate the code for it because it
 still doesn't know the value of `symbol1'. On the next line, it would
 see the `EQU' again and be able to determine the value of `symbol1',
 but by then it would be too late.
 

                                                                                       
    NASM avoids this problem by defining the right-hand side of an `EQU'
 statement to be a critical expression, so the definition of `symbol1'
 would be rejected in the first pass.
 

                                                                                       
    There is a related issue involving forward references: consider this
 code fragment.

              mov     eax,[ebx+offset]
      offset  equ     10
                                                                                        
    NASM, on pass one, must calculate the size of the instruction `mov
 eax,[ebx+offset]' without knowing the value of `offset'. It has no way
 of knowing that `offset' is small enough to fit into a one-byte offset
 field and that it could therefore get away with generating a shorter
 form of the effective-address encoding; for all it knows, in pass one,
 `offset' could be a symbol in the code segment, and it might need the
 full four-byte form. So it is forced to compute the size of the
 instruction to accommodate a four-byte address part. In pass two, having
 made this decision, it is now forced to honour it and keep the
 instruction large, so the code generated in this case is not as small
 as it could have been. This problem can be solved by defining `offset'
 before using it, or by forcing byte size in the effective address by
 coding `[byte ebx+offset]'.
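Both remedies can be sketched together. The names `early' and `late' are illustrative, and the comments describe the two-pass behaviour explained above; an assembler build with a multi-pass optimizer may size the first forward reference differently.

```nasm
; Sketch only: `early' and `late' are illustrative names.
        bits 32

early   equ     10                      ; defined before use
        mov     eax, [ebx+early]        ; value known: short form can be chosen

        mov     eax, [ebx+late]         ; forward reference: sized for a
                                        ; four-byte offset under two-pass rules
        mov     eax, [byte ebx+late]    ; forward reference, one-byte offset forced
late    equ     10
```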
 


 3.9. Local Labels
 


                                                                                        
    NASM gives special treatment to symbols beginning with a period. A
 label beginning with a single period is treated as a _local_ label,
 which means that it is associated with the previous non-local label.
 So, for example:

      label1  ; some code
                                                                                        
      .loop
              ; some more code
                                                                                        
              jne     .loop
              ret
                                                                                        
      label2  ; some code
                                                                                        
      .loop
              ; some more code
                                                                                        
              jne     .loop
              ret
                                                                                        
    In the above code fragment, each `JNE' instruction jumps to the line
 immediately before it, because the two definitions of `.loop' are kept
 separate by virtue of each being associated with the previous non-local
 label.
 


    This form of local label handling is borrowed from the old Amiga
 assembler DevPac; however, NASM goes one step further, in allowing
 access to local labels from other parts of the code. This is achieved
 by means of _defining_ a local label in terms of the previous non-local
 label: the first definition of `.loop' above is really defining a
 symbol called `label1.loop', and the second defines a symbol called
 `label2.loop'. So, if you really needed to, you could write
                                                                                        
      label3  ; some more code
              ; and some more
                                                                                        
              jmp label1.loop
                                                                                        
    Sometimes it is useful - in a macro, for instance - to be able to
 define a label which can be referenced from anywhere but which doesn't
 interfere with the normal local-label mechanism. Such a label can't be
 non-local because it would interfere with subsequent definitions of,
 and references to, local labels; and it can't be local because the
 macro that defined it wouldn't know the label's full name. NASM
 therefore introduces a third type of label, which is probably only
 useful in macro definitions: if a label begins with the special prefix
 `..@', then it does nothing to the local label mechanism. So you could
 code                                                                                        
      label1:                         ; a non-local label
      .local:                         ; this is really label1.local
      ..@foo:                         ; this is a special symbol
      label2:                         ; another non-local label
      .local:                         ; this is really label2.local
                                                                                        
              jmp     ..@foo          ; this will jump three lines up
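
    As a minimal sketch of the macro case, assuming a hypothetical
 macro `mark_resume' that is invoked only once (a second invocation
 would define `..@resume' twice):

```nasm
%macro mark_resume 0
..@resume:                      ; special symbol: local-label scope is unchanged
%endmacro

start:
.retry:                         ; really start.retry
        dec     cx
        mark_resume             ; defines ..@resume between .retry and its use
        jnz     .retry          ; still refers to start.retry

finish:
        jmp     ..@resume       ; reachable by its full name from anywhere
```

    Had the macro defined an ordinary non-local label instead, the
 reference to `.retry' after the macro call would have been looked up
 relative to that new label rather than to `start'.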
 
 Chapter 4: The NASM Preprocessor
 


  
                                                                                       
    NASM contains a powerful macro processor, which supports conditional
 assembly, multi-level file inclusion, two forms of macro (single-line
 and multi-line), and a `context stack' mechanism for extra macro power.
 Preprocessor directives all begin with a `%' sign.
 

                                                                                       
    The preprocessor collapses all lines which end with a backslash (\)
 character into a single line. Thus:
                                                                                        
      %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \
              THIS_VALUE
                                                                                        
    will work like a single-line macro without the backslash-newline
 sequence.
 


 4.1.1. The Normal Way: `%define'
 


                                                                                        
    Single-line macros are defined using the `%define' preprocessor
 directive. The definitions work in a similar way to C; so you can do
 things like
                                                                                        
      %define ctrl    0x1F &
      %define param(a,b) ((a)+(a)*(b))
                                                                                        
              mov     byte [param(2,ebx)], ctrl 'D'
                                                                                        
    which will expand to
                                                                                        
              mov     byte [(2)+(2)*(ebx)], 0x1F & 'D'
                                                                                        
    When the expansion of a single-line macro contains tokens which
 invoke another macro, the expansion is performed at invocation time,
 not at definition time. Thus the code
                                                                                        
      %define a(x)    1+b(x)
      %define b(x)    2*x
                                                                                        
              mov     ax,a(8)
                                                                                        
    will evaluate in the expected way to `mov ax,1+2*8', even though the
 macro `b' wasn't defined at the time of definition of `a'.

    Macros defined with `%define' are case sensitive: after `%define foo
 bar', only `foo' will expand to `bar': `Foo' or `FOO' will not. By
 using `%idefine' instead of `%define' (the `i' stands for
 `insensitive') you can define all the case variants of a macro at once,
 so that `%idefine foo bar' would cause `foo', `Foo', `FOO', `fOO' and
 so on all to expand to `bar'.
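
    A short illustration of the difference, using the made-up macro
 names `stdout_handle' and `stderr_handle':

```nasm
%idefine stdout_handle 1        ; every case variant of the name expands to 1
%define  stderr_handle 2        ; case sensitive: only this exact spelling expands

        mov     bx,STDOUT_HANDLE        ; becomes mov bx,1
        mov     bx,Stdout_Handle        ; becomes mov bx,1
        mov     bx,stderr_handle        ; becomes mov bx,2
```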
                                                                                        
    There is a mechanism which detects when a macro call has occurred as
 a result of a previous expansion of the same macro, to guard against
 circular references and infinite loops. If this happens, the
 preprocessor will only expand the first occurrence of the macro. Hence,
 if you code
                                                                                        
      %define a(x)    1+a(x)
                                                                                        
              mov     ax,a(3)
                                                                                        
    the macro `a(3)' will expand once, becoming `1+a(3)', and will then
 expand no further. This behaviour can be useful: see *Note Section 8.1::
 for an example of its use.
                                                                                        
    You can overload single-line macros: if you write
                                                                                        
      %define foo(x)   1+x
      %define foo(x,y) 1+x*y
                                                                                        
    the preprocessor will be able to handle both types of macro call, by
 counting the parameters you pass; so `foo(3)' will become `1+3' whereas
 `foo(ebx,2)' will become `1+ebx*2'. However, if you define

      %define foo bar
                                                                                        
    then no other definition of `foo' will be accepted: a macro with no
 parameters prohibits the definition of the same name as a macro _with_
 parameters, and vice versa.
                                                                                        
    This doesn't prevent single-line macros being _redefined_: you can
 perfectly well define a macro with
                                                                                        
      %define foo bar
                                                                                        
    and then re-define it later in the same source file with
                                                                                        
      %define foo baz
                                                                                        
    Then everywhere the macro `foo' is invoked, it will be expanded
 according to the most recent definition. This is particularly useful
 when defining single-line macros with `%assign' (see *Note Section
 4.1.5::).
 
 4.4. Conditional Assembly
 =========================
                                                                                        
    Similarly to the C preprocessor, NASM allows sections of a source
 file to be assembled only if certain conditions are met. The general
 syntax of this feature looks like this:
                                                                                        
      %if<condition>
          ; some code which only appears if <condition> is met
      %elif<condition2>
          ; only appears if <condition> is not met but <condition2> is
      %else
          ; this appears if neither <condition> nor <condition2> was met
      %endif
                                                                                        
    The `%else' clause is optional, as is the `%elif' clause. You can
 have more than one `%elif' clause as well.
 
 4.4.1. `%ifdef': Testing Single-Line Macro Existence
 ----------------------------------------------------
                                                                                        
    Beginning a conditional-assembly block with the line `%ifdef MACRO'
 will assemble the subsequent code if, and only if, a single-line macro
 called `MACRO' is defined. If not, then the `%elif' and `%else' blocks
 (if any) will be processed instead.
                                                                                        
    For example, when debugging a program, you might want to write code
 such as
                                                                                        
                ; perform some function
      %ifdef DEBUG
                writefile 2,"Function performed successfully",13,10
      %endif
                ; go and do something else
                                                                                        
    Then you could use the command-line option `-dDEBUG' to create a
 version of the program which produced debugging messages, and remove the
 option to generate the final release version of the program.
                                                                                        
    You can test for a macro _not_ being defined by using `%ifndef'
 instead of `%ifdef'. You can also test for macro definitions in `%elif'
 blocks by using `%elifdef' and `%elifndef'.
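
    The following sketch combines these directives. The macro names
 `BUFFER_SIZE', `QUIET' and `LOG_LEVEL' are hypothetical; any of the
 tested names could be supplied on the command line with `-d', for
 instance `-dBUFFER_SIZE=512'.

```nasm
%ifndef BUFFER_SIZE
    %define BUFFER_SIZE 256     ; fallback when no value was given with -d
%endif

%ifdef DEBUG
    %define LOG_LEVEL 2
%elifdef QUIET
    %define LOG_LEVEL 0
%else
    %define LOG_LEVEL 1
%endif

buffer_len: dw BUFFER_SIZE
log_level:  db LOG_LEVEL
```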
 
 4.4.2. `%ifmacro': Testing Multi-Line Macro Existence
 -----------------------------------------------------
                                                                                        
    The `%ifmacro' directive operates in the same way as the `%ifdef'
 directive, except that it checks for the existence of a multi-line
 macro.
                                                                                        
    For example, you may be working with a large project and not have
 control over the macros in a library. You may want to create a macro
 with one name if it doesn't already exist, and another name if one with
 that name does exist.
                                                                                        
    The `%ifmacro' is considered true if defining a macro with the given
 name and number of arguments would cause a definitions conflict. For
 example:
                                                                                        
      %ifmacro MyMacro 1-3
                                                                                        
           %error "MyMacro 1-3" causes a conflict with an existing macro.
                                                                                        
      %else
                                                                                        
           %macro MyMacro 1-3
                                                                                        
                   ; insert code to define the macro
                                                                                        
           %endmacro
                                                                                        
      %endif
 
 4.4.4. `%if': Testing Arbitrary Numeric Expressions
 ---------------------------------------------------
                                                                                        
    The conditional-assembly construct `%if expr' will cause the
 subsequent code to be assembled if and only if the value of the numeric
 expression `expr' is non-zero. An example of the use of this feature is
 in deciding when to break out of a `%rep' preprocessor loop: see *Note
 Section 4.5:: for a detailed example.
                                                                                        
    The expression given to `%if', and its counterpart `%elif', is a
 critical expression (see *Note Section 3.8::).
                                                                                        
    `%if' extends the normal NASM expression syntax, by providing a set
 of relational operators which are not normally available in
 expressions. The operators `=', `<', `>', `<=', `>=' and `<>' test
 equality, less-than, greater-than, less-or-equal, greater-or-equal and
 not-equal respectively. The C-like forms `==' and `!=' are supported as
 alternative forms of `=' and `<>'. In addition, low-priority logical
 operators `&&', `^^' and `||' are provided, supplying logical AND,
 logical XOR and logical OR. These work like the C logical operators
 (although C has no logical XOR), in that they always return either 0 or
 1, and treat any non-zero input as 1 (so that `^^', for example,
 returns 1 if exactly one of its inputs is zero, and 0 otherwise). The
 relational operators also return 1 for true and 0 for false.
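
    A small sketch of these operators in use. The macro names
 `ALIGNMENT', `USE_TABLE_A' and `USE_TABLE_B' are assumptions made up
 for the example.

```nasm
%define ALIGNMENT   16          ; hypothetical settings
%define USE_TABLE_A 1
%define USE_TABLE_B 0

%if (ALIGNMENT < 2) || ((ALIGNMENT & (ALIGNMENT-1)) <> 0)
    %error "ALIGNMENT must be a power of two, at least 2"
%endif

%if USE_TABLE_A ^^ USE_TABLE_B
selected: db 1                  ; exactly one of the two options is non-zero
%else
selected: db 0                  ; both or neither were selected
%endif
```

    The parentheses are not strictly required, since the logical
 operators bind less tightly than the relational ones, but they make
 the grouping explicit.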
 
 4.5. Preprocessor Loops: `%rep'
 ===============================
                                                                                        
    NASM's `TIMES' prefix, though useful, cannot be used to invoke a
 multi-line macro multiple times, because it is processed by NASM after
 macros have already been expanded. Therefore NASM provides another form
 of loop, this time at the preprocessor level: `%rep'.
                                                                                        
    The directives `%rep' and `%endrep' (`%rep' takes a numeric
 argument, which can be an expression; `%endrep' takes no arguments) can
 be used to enclose a chunk of code, which is then replicated as many
 times as specified by the preprocessor:
                                                                                        
      %assign i 0
      %rep    64
              inc     word [table+2*i]
      %assign i i+1
      %endrep
                                                                                        
    This will generate a sequence of 64 `INC' instructions, incrementing
 every word of memory from `[table]' to `[table+126]'.
                                                                                        
    For more complex termination conditions, or to break out of a repeat
 loop part way along, you can use the `%exitrep' directive to terminate
 the loop, like this:
                                                                                        
      fibonacci:
      %assign i 0
      %assign j 1
      %rep 100
      %if j > 65535
          %exitrep
      %endif
              dw j
      %assign k j+i
      %assign i j
      %assign j k
      %endrep
                                                                                        
      fib_number equ ($-fibonacci)/2
                                                                                        
    This produces a list of all the Fibonacci numbers that will fit in
 16 bits.  Note that a maximum repeat count must still be given to
 `%rep'. This is to prevent the possibility of NASM getting into an
 infinite loop in the preprocessor, which (on multitasking or multi-user
 systems) would typically cause all the system memory to be gradually
 used up and other applications to start crashing.
 
 4.6. Including Other Files
 ==========================
                                                                                        
    Using, once again, a very similar syntax to the C preprocessor,
 NASM's preprocessor lets you include other source files into your code.
 This is done by the use of the `%include' directive:
                                                                                        
      %include "macros.mac"
                                                                                        
    will include the contents of the file `macros.mac' into the source
 file containing the `%include' directive.
                                                                                        
    Include files are searched for in the current directory (the
 directory you're in when you run NASM, as opposed to the location of
 the NASM executable or the location of the source file), plus any
 directories specified on the NASM command line using the `-i' option.
                                                                                        
    The standard C idiom for preventing a file being included more than
 once is just as applicable in NASM: if the file `macros.mac' has the
 form
                                                                                        
      %ifndef MACROS_MAC
          %define MACROS_MAC
          ; now define some macros
      %endif
                                                                                        
    then including the file more than once will not cause errors,
 because the second time the file is included nothing will happen
 because the macro `MACROS_MAC' will already be defined.
 
 4.7. The Context Stack
 ======================
                                                                                        
    Having labels that are local to a macro definition is sometimes not
 quite powerful enough: sometimes you want to be able to share labels
 between several macro calls. An example might be a `REPEAT' ... `UNTIL'
 loop, in which the expansion of the `REPEAT' macro would need to be
 able to refer to a label which the `UNTIL' macro had defined. However,
 for such a macro you would also want to be able to nest these loops.
                                                                                        
    NASM provides this level of power by means of a _context stack_. The
 preprocessor maintains a stack of _contexts_, each of which is
 characterised by a name. You add a new context to the stack using the
 `%push' directive, and remove one using `%pop'. You can define labels
 that are local to a particular context on the stack.
 
 4.7.1. `%push' and `%pop': Creating and Removing Contexts
 ---------------------------------------------------------
                                                                                        
    The `%push' directive is used to create a new context and place it on
 the top of the context stack. `%push' requires one argument, which is
 the name of the context. For example:
                                                                                        
      %push    foobar
                                                                                        
    This pushes a new context called `foobar' on the stack. You can have
 several contexts on the stack with the same name: they can still be
 distinguished.
                                                                                        
    The directive `%pop', requiring no arguments, removes the top context
 from the context stack and destroys it, along with any labels associated
 with it.
 
 4.7.2. Context-Local Labels
 ---------------------------
                                                                                        
    Just as the usage `%%foo' defines a label which is local to the
 particular macro call in which it is used, the usage `%$foo' is used to
 define a label which is local to the context on the top of the context
 stack. So the `REPEAT' and `UNTIL' example given above could be
 implemented by means of:
                                                                                        
      %macro repeat 0
                                                                                        
          %push   repeat
          %$begin:
                                                                                        
      %endmacro
                                                                                        
      %macro until 1
                                                                                        
              j%-1    %$begin
          %pop
                                                                                        
      %endmacro
                                                                                        
    and invoked by means of, for example,

              mov     di,string
              repeat
              add     di,3
              scasb
              until   e
                                                                                        
    which would scan every fourth byte of a string in search of the byte
 in `AL'.
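
    Because each `repeat' pushes its own context, the loops can be
 nested. The sketch below repeats the two macro definitions so that it
 stands alone; the counter values are arbitrary.

```nasm
%macro repeat 0
    %push   repeat
    %$begin:
%endmacro

%macro until 1
        j%-1    %$begin
    %pop
%endmacro

        mov     cx,3            ; outer counter
        repeat                  ; outer context, with its own %$begin
        mov     bx,5            ; inner counter
        repeat                  ; inner context, with a separate %$begin
        dec     bx
        until   z               ; jnz to the inner %$begin, then %pop
        dec     cx
        until   z               ; jnz to the outer %$begin, then %pop
```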
                                                                                        
    If you need to define, or access, labels local to the context _below_
 the top one on the stack, you can use `%$$foo', or `%$$$foo' for the
 context below that, and so on.
                                                                                        
 4.8. Standard Macros
 ====================
                                                                                        
    NASM defines a set of standard macros, which are already defined
 when it starts to process any source file. If you really need a program
 to be assembled with no pre-defined macros, you can use the `%clear'
 directive to empty the preprocessor of everything.
                                                                                        
    Most user-level assembler directives (see *Note Chapter 5::) are
 implemented as macros which invoke primitive directives; these are
 described in *Note Chapter 5::. The rest of the standard macro set is
 described here.
 
 4.8.1. `__NASM_MAJOR__', `__NASM_MINOR__', `__NASM_SUBMINOR__' and `__NASM_PATCHLEVEL__'
 ----------------------------------------------------------------------------------------

    The single-line macros `__NASM_MAJOR__', `__NASM_MINOR__',
 `__NASM_SUBMINOR__' and `__NASM_PATCHLEVEL__' expand to the major,
 minor, subminor and patch level parts of the version number of NASM
 being used. So, under NASM 0.98.32p1 for example, `__NASM_MAJOR__'
 would be defined to be 0, `__NASM_MINOR__' would be defined as 98,
 `__NASM_SUBMINOR__' would be defined to be 32, and `__NASM_PATCHLEVEL__'
 would be defined as 1.
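
    One possible use is to refuse to assemble under a version that is
 too old, or to record the version in the output. The version
 threshold and the label `nasm_version' below are illustrative
 assumptions only.

```nasm
%if (__NASM_MAJOR__ = 0) && (__NASM_MINOR__ < 98)
    %error "this file expects NASM 0.98 or later"
%endif

nasm_version:                   ; hypothetical label
        db __NASM_MAJOR__, __NASM_MINOR__, __NASM_SUBMINOR__
```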
 
 4.8.7. `ALIGN' and `ALIGNB': Data Alignment
 -------------------------------------------
                                                                                        
    The `ALIGN' and `ALIGNB' macros provide a convenient way to align
 code or data on a word, longword, paragraph or other boundary. (Some
 assemblers call this directive `EVEN'.) The syntax of the `ALIGN' and
 `ALIGNB' macros is
                                                                                        
              align   4               ; align on 4-byte boundary
              align   16              ; align on 16-byte boundary
              align   8,db 0          ; pad with 0s rather than NOPs
              align   4,resb 1        ; align to 4 in the BSS
              alignb  4               ; equivalent to previous line
                                                                                        
    Both macros require their first argument to be a power of two; they
 both compute the number of additional bytes required to bring the
 length of the current section up to a multiple of that power of two,
 and then apply the `TIMES' prefix to their second argument to perform
 the alignment.
                                                                                        
    If the second argument is not specified, the default for `ALIGN' is
 `NOP', and the default for `ALIGNB' is `RESB 1'. So if the second
 argument is specified, the two macros are equivalent. Normally, you can
 just use `ALIGN' in code and data sections and `ALIGNB' in BSS
 sections, and never need the second argument except for special
 purposes.
 
    `ALIGN' and `ALIGNB', being simple macros, perform no error
 checking: they cannot warn you if their first argument fails to be a
 power of two, or if their second argument generates more than one byte
 of code.  In each of these cases they will silently do the wrong thing.
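
    The padding computation can be written out by hand. The sketch
 below is a hand-written equivalent of `align 4,db 0' under the
 assumption that the boundary is a power of two; it is not claimed to
 be the literal body of the standard macro.

```nasm
section .data

header: db 1,2,3                ; the section is now 3 bytes long

        times ($$-$) & 3 db 0   ; (-3) & 3 = 1, so one zero byte is emitted

value:  dd 0x12345678           ; starts at offset 4 within the section
```

    Masking with the boundary minus one only yields the distance to
 the next multiple when the boundary is a power of two, which is why a
 first argument such as 3 or 12 produces the wrong amount of padding
 without any warning.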
                                                                                        
    `ALIGNB' (or `ALIGN' with a second argument of `RESB 1') can be used
 within structure definitions:
                                                                                        
      struc mytype2
                                                                                        
        mt_byte:
              resb 1
              alignb 2
        mt_word:
              resw 1
              alignb 4
        mt_long:
              resd 1
        mt_str:
              resb 32
                                                                                        
      endstruc
                                                                                        
    This will ensure that the structure members are sensibly aligned
 relative to the base of the structure.
 
    A final caveat: `ALIGN' and `ALIGNB' work relative to the beginning
 of the _section_, not the beginning of the address space in the final
 executable. Aligning to a 16-byte boundary when the section you're in
 is only guaranteed to be aligned to a 4-byte boundary, for example, is
 a waste of effort. Again, NASM does not check that the section's
 alignment characteristics are sensible for the use of `ALIGN' or
 `ALIGNB'.
 
 4.10. Other Preprocessor Directives
 ===================================
                                                                                        
    NASM also has preprocessor directives which allow access to
 information from external sources. Currently they include:

    * `%line' enables NASM to correctly handle the output of the cpp C
      language preprocessor (see *Note Section 4.10.1::).
                                                                                        
    * `%!' enables NASM to read in the value of an environment variable,
      which can then be used in your program (see *Note Section
      4.10.2::).
 
 Chapter 5: Assembler Directives
 *******************************
                                                                                        
    NASM, though it attempts to avoid the bureaucracy of assemblers like
 MASM and TASM, is nevertheless forced to support a _few_ directives.
 These are described in this chapter.
                                                                                        
    NASM's directives come in two types: _user-level_ directives and
 _primitive_ directives. Typically, each directive has a user-level form
 and a primitive form. In almost all cases, we recommend that users use
 the user-level forms of the directives, which are implemented as macros
 which call the primitive forms.
                                                                                        
    Primitive directives are enclosed in square brackets; user-level
 directives are not.
                                                                                        
    In addition to the universal directives described in this chapter,
 each object file format can optionally supply extra directives in order
 to control particular features of that file format. These
 _format-specific_ directives are documented along with the formats that
 implement them, in *Note Chapter 6::.
 
 5.1. `BITS': Specifying Target Processor Mode
 =============================================
                                                                                        
    The `BITS' directive specifies whether NASM should generate code
 designed to run on a processor operating in 16-bit mode, or code
 designed to run on a processor operating in 32-bit mode. The syntax is
 `BITS 16' or `BITS 32'.
                                                                                        
    In most cases, you should not need to use `BITS' explicitly. The
 `aout', `coff', `elf' and `win32' object formats, which are designed
 for use in 32-bit operating systems, all cause NASM to select 32-bit
 mode by default. The `obj' object format allows you to specify each
 segment you define as either `USE16' or `USE32', and NASM will set its
 operating mode accordingly, so the use of the `BITS' directive is once
 again unnecessary.
                                                                                        
    The most likely reason for using the `BITS' directive is to write
 32-bit code in a flat binary file; this is because the `bin' output format
 defaults to 16-bit mode in anticipation of it being used most
 frequently to write DOS `.COM' programs, DOS `.SYS' device drivers and
 boot loader software.
                                                                                        
    You do _not_ need to specify `BITS 32' merely in order to use
 32-bit instructions in a 16-bit DOS program; if you do, the assembler will
 generate incorrect code because it will be writing code targeted at a
 32-bit platform, to be run on a 16-bit one.

    When NASM is in `BITS 16' state, instructions which use 32-bit data
 are prefixed with an 0x66 byte, and those referring to 32-bit addresses
 have an 0x67 prefix. In `BITS 32' state, the reverse is true: 32-bit
 instructions require no prefixes, whereas instructions using 16-bit data
 need an 0x66 and those working on 16-bit addresses need an 0x67.
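
    The prefix rules can be seen by assembling the same kinds of
 instruction in both states. This is only a sketch; the comments name
 the prefix each line needs, not the complete encoding.

```nasm
        bits 16
        mov     ax,bx           ; 16-bit data in 16-bit mode: no prefix
        mov     eax,ebx         ; 32-bit data: 0x66 prefix
        mov     ax,[ebx]        ; 32-bit address: 0x67 prefix

        bits 32
        mov     eax,ebx         ; 32-bit data in 32-bit mode: no prefix
        mov     ax,bx           ; 16-bit data: 0x66 prefix
        mov     eax,[bx]        ; 16-bit address: 0x67 prefix
```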
                                                                                        
    The `BITS' directive has an exactly equivalent primitive form,
 `[BITS 16]' and `[BITS 32]'. The user-level form is a macro which has
 no function other than to call the primitive form.
                                                                                        
    Note that the space is necessary: `BITS32' will _not_ work!
 
 5.2. `SECTION' or `SEGMENT': Changing and Defining Sections
 ===========================================================
                                                                                        
    The `SECTION' directive (`SEGMENT' is an exactly equivalent synonym)
 changes which section of the output file the code you write will be
 assembled into. In some object file formats, the number and names of
 sections are fixed; in others, the user may make up as many as they
 wish.  Hence `SECTION' may sometimes give an error message, or may
 define a new section, if you try to switch to a section that does not
 (yet) exist.
                                                                                        
    The Unix object formats, and the `bin' object format (but see *Note
 Section 6.1.3::), all support the standardised section names `.text',
 `.data' and `.bss' for the code, data and uninitialised-data sections.
 The `obj' format, by contrast, does not recognise these section names
 as being special, and indeed will strip off the leading period of any
 section name that has one.
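
    A minimal sketch of switching between the three standard sections
 in one source file follows; the label names are invented for the
 example.

      section .data
      counter:        dd      1               ; initialised data

      section .bss
      buffer:         resb    64              ; reserved, uninitialised

      section .text
      bump:           inc     dword [counter] ; code
                      ret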
                                                                                        
 5.4. `EXTERN': Importing Symbols from Other Modules
 ===================================================
                                                                                        
    `EXTERN' is similar to the MASM directive `EXTRN' and the C keyword
 `extern': it is used to declare a symbol which is not defined anywhere
 in the module being assembled, but is assumed to be defined in some
 other module and needs to be referred to by this one. Not every
 object-file format can support external variables: the `bin' format
 cannot.
                                                                                        
    The `EXTERN' directive takes as many arguments as you like. Each
 argument is the name of a symbol:
                                                                                        
      extern  _printf
      extern  _sscanf,_fscanf
                                                                                        
    Some object-file formats provide extra features to the `EXTERN'
 directive. In all cases, the extra features are used by suffixing a
 colon to the symbol name followed by object-format specific text. For
 example, the `obj' format allows you to declare that the default
 segment base of an external should be the group `dgroup' by means of
 the directive
                                                                                        
      extern  _variable:wrt dgroup
                                                                                        
    The primitive form of `EXTERN' differs from the user-level form only
 in that it can take only one argument at a time: the support for
 multiple arguments is implemented at the preprocessor level.
 
 5.5. `GLOBAL': Exporting Symbols to Other Modules
 =================================================
                                                                                        
    `GLOBAL' is the other end of `EXTERN': if one module declares a
 symbol as `EXTERN' and refers to it, then in order to prevent linker
 errors, some other module must actually _define_ the symbol and declare
 it as `GLOBAL'. Some assemblers use the name `PUBLIC' for this purpose.
                                                                                        
    The `GLOBAL' directive applying to a symbol must appear _before_ the
 definition of the symbol.
                                                                                        
    `GLOBAL' uses the same syntax as `EXTERN', except that it must refer
 to symbols which _are_ defined in the same module as the `GLOBAL'
 directive. For example:
                                                                                        
      global _main
      _main:
              ; some code
                                                                                        
    `GLOBAL', like `EXTERN', allows object formats to define private
 extensions by means of a colon. The `elf' object format, for example,
 lets you specify whether global data items are functions or data:
                                                                                        
      global  hashlookup:function, hashtable:data
                                                                                        
    Like `EXTERN', the primitive form of `GLOBAL' differs from the
 user-level form only in that it can take only one argument at a time.
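
    Putting `EXTERN' and `GLOBAL' together, a module that exports
 `_main' and calls the C library might look like the sketch below. It
 assumes a 32-bit object format and a C compiler that prefixes symbols
 with an underscore, as the examples above do; `fmt' is a name made up
 for the illustration.

      extern  _printf
      global  _main

      section .data
      fmt:    db      'hello, world',10,0

      section .text
      _main:  push    dword fmt       ; argument for printf
              call    _printf
              add     esp,4           ; caller removes the argument
              xor     eax,eax         ; return 0
              ret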
 
 5.6. `COMMON': Defining Common Data Areas
 =========================================
                                                                                        
    The `COMMON' directive is used to declare _common variables_. A
 common variable is much like a global variable declared in the
 uninitialised data section, so that
                                                                                        
      common  intvar  4
                                                                                        
    is similar in function to
                                                                                        
      global  intvar
      section .bss
                                                                                        
      intvar  resd    1
                                                                                        
    The difference is that if more than one module defines the same
 common variable, then at link time those variables will be _merged_, and
 references to `intvar' in all modules will point at the same piece of
 memory.
                                                                                        
    Like `GLOBAL' and `EXTERN', `COMMON' supports object-format specific
 extensions. For example, the `obj' format allows common variables to be
 NEAR or FAR, and the `elf' format allows you to specify the alignment
 requirements of a common variable:
                                                                                        
      common  commvar  4:near  ; works in OBJ
      common  intarray 100:4   ; works in ELF: 4 byte aligned
 
 Chapter 6: Output Formats
 *************************
                                                                                        
    NASM is a portable assembler, designed to be able to compile on any
 ANSI C-supporting platform and produce output to run on a variety of
 Intel x86 operating systems. For this reason, it has a large number of
 available output formats, selected using the `-f' option on the NASM
 command line. Each of these formats, along with its extensions to the
 base NASM syntax, is detailed in this chapter.
                                                                                        
    As stated in *Note Section 2.1.1::, NASM chooses a default name for
 your output file based on the input file name and the chosen output
 format. This will be generated by removing the extension (`.asm', `.s',
 or whatever you like to use) from the input file name, and substituting
 an extension defined by the output format. The extensions are given
 with each format below.
 
 6.1. `bin': Flat-Form Binary Output
 ===================================
                                                                                        
    The `bin' format does not produce object files: it generates nothing
 in the output file except the code you wrote. Such `pure binary' files
 are used by MS-DOS: `.COM' executables and `.SYS' device drivers are
 pure binary files. Pure binary output is also useful for operating
 system and boot loader development.
                                                                                        
    The `bin' format supports multiple section names. For details of how
 NASM handles sections in the `bin' format, see *Note Section 6.1.3::.
                                                                                        
    Using the `bin' format puts NASM by default into 16-bit mode (see
 *Note Section 5.1::). In order to use `bin' to write 32-bit code such as
 an OS kernel, you need to explicitly issue the `BITS 32' directive.
                                                                                        
    `bin' has no default output file name extension: instead, it leaves
 your file name as it is once the original extension has been removed.
 Thus, the default is for NASM to assemble `binprog.asm' into a binary
 file called `binprog'.
 
 6.1.1. `ORG': Binary File Program Origin
 ----------------------------------------
                                                                                        
    The `bin' format provides an additional directive to the list given
 in *Note Chapter 5::: `ORG'. The function of the `ORG' directive is to
 specify the origin address which NASM will assume the program begins at
 when it is loaded into memory.
                                                                                        
    For example, the following code will generate the longword
 `0x00000104':
                                                                                        
              org     0x100
              dd      label
      label:
                                                                                        
    Unlike the `ORG' directive provided by MASM-compatible assemblers,
 which allows you to jump around in the object file and overwrite code
 you have already generated, NASM's `ORG' does exactly what the directive
 says: _origin_. Its sole function is to specify one offset which is
 added to all internal address references within the section; it does not
 permit any of the trickery that MASM's version does. See *Note Section
 10.1.3:: for further comments.
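
    For instance, a DOS `.COM' program is loaded at offset 0x100 of its
 segment, so a sketch of one begins with `ORG 0x100'. The example
 assumes the usual DOS `INT 0x21' services; `msg' and the file names
 are invented for the illustration.

      ; assemble with: nasm -f bin hello.asm -o hello.com
              org     0x100

              mov     dx,msg          ; offset of msg, origin included
              mov     ah,9            ; DOS: print '$'-terminated string
              int     0x21
              mov     ax,0x4C00       ; DOS: exit with return code 0
              int     0x21

      msg:    db      'hello, world',13,10,'$'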
 
 6.5. `elf': Executable and Linkable Format Object Files
 =======================================================
                                                                                        
    The `elf' output format generates `ELF32' (Executable and Linkable
 Format) object files, as used by Linux as well as Unix System V,
 including Solaris x86, UnixWare and SCO Unix. `elf' provides a default
 output file-name extension of `.o'.
 
 6.5.3. `elf' Extensions to the `GLOBAL' Directive
 -------------------------------------------------
                                                                                        
    `ELF' object files can contain more information about a global symbol
 than just its address: they can contain the size of the symbol and its
 type as well. These are not merely debugger conveniences, but are
 actually necessary when the program being written is a shared library.
 NASM therefore supports some extensions to the `GLOBAL' directive,
 allowing you to specify these features.
                                                                                        
    You can specify whether a global variable is a function or a data
 object by suffixing the name with a colon and the word `function' or
 `data'. (`object' is a synonym for `data'.) For example:
                                                                                        
      global   hashlookup:function, hashtable:data
                                                                                        
    exports the global symbol `hashlookup' as a function and `hashtable'
 as a data object.
                                                                                        
    You can also specify the size of the data associated with the
 symbol, as a numeric expression (which may involve labels, and even
 forward references) after the type specifier. Like this:
                                                                                        
      global  hashtable:data (hashtable.end - hashtable)
                                                                                        
      hashtable:
              db this,that,theother  ; some data here
      .end:
 
 6.8. `as86': Minix/Linux `as86' Object Files
 ============================================
                                                                                        
    The Minix/Linux 16-bit assembler `as86' has its own non-standard
 object file format. Although its companion linker `ld86' produces
 something close to ordinary `a.out' binaries as output, the object file
 format used to communicate between `as86' and `ld86' is not itself
 `a.out'.
                                                                                        
    NASM supports this format, just in case it is useful, as `as86'.
 `as86' provides a default output file-name extension of `.o'.
                                                                                        
    `as86' is a very simple object format (from the NASM user's point of
 view). It supports no special directives, no special symbols, no use of
 `SEG' or `WRT', and no extensions to any standard directives. It
 supports only the three standard section names `.text', `.data' and
 `.bss'.
 
 8.2. Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries
 ==================================================================
                                                                                        
    `ELF' replaced the older `a.out' object file format under Linux
 because it contains support for position-independent code (PIC), which
 makes writing shared libraries much easier. NASM supports the `ELF'
 position-independent code features, so you can write Linux `ELF' shared
 libraries in NASM.
                                                                                        
    NetBSD, and its close cousins FreeBSD and OpenBSD, take a different
 approach by hacking PIC support into the `a.out' format. NASM supports
 this as the `aoutb' output format, so you can write BSD shared
 libraries in NASM too.
                                                                                        
    The operating system loads a PIC shared library by memory-mapping the
 library file at an arbitrarily chosen point in the address space of the
 running process. The contents of the library's code section must
 therefore not depend on where it is loaded in memory.
                                                                                        
    Therefore, you cannot get at your variables by writing code like
 this:
                                                                                        
              mov     eax,[myvar]             ; WRONG

    Instead, the linker provides an area of memory called the _global
 offset table_, or GOT; the GOT is situated at a constant distance from
 your library's code, so if you can find out where your library is
 loaded (which is typically done using a `CALL' and `POP' combination),
 you can obtain the address of the GOT, and you can then load the
 addresses of your variables out of linker-generated entries in the GOT.
                                                                                        
    The _data_ section of a PIC shared library does not have these
 restrictions: since the data section is writable, it has to be copied
 into memory anyway rather than just paged in from the library file, so
 as long as it's being copied it can be relocated too. So you can put
 ordinary types of relocation in the data section without too much worry
 (but see *Note Section 8.2.4:: for a caveat).
 
 8.2.1. Obtaining the Address of the GOT
 ---------------------------------------
                                                                                        
    Each code module in your shared library should define the GOT as an
 external symbol:
                                                                                        
      extern  _GLOBAL_OFFSET_TABLE_   ; in ELF
      extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out
                                                                                        
    At the beginning of any function in your shared library which plans
 to access your data or BSS sections, you must first calculate the
 address of the GOT. This is typically done by writing the function in
 this form:
                                                                                        
      func:   push    ebp
              mov     ebp,esp
              push    ebx
              call    .get_GOT
      .get_GOT:
              pop     ebx
              add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc
                                                                                        
              ; the function body comes here
                                                                                        
              mov     ebx,[ebp-4]
              mov     esp,ebp
              pop     ebp
              ret
                                                                                        
    (For BSD, again, the symbol `_GLOBAL_OFFSET_TABLE_' requires a second
 leading underscore.)
                                                                                        
    The first two lines of this function are simply the standard C
 prologue to set up a stack frame, and the last three lines are standard
 C function epilogue. The third line, and the fourth to last line, save
 and restore the `EBX' register, because PIC shared libraries use this
 register to store the address of the GOT.
                                                                                        
    The interesting bit is the `CALL' instruction and the following two
 lines. The `CALL' and `POP' combination obtains the address of the
 label `.get_GOT', without having to know in advance where the program
 was loaded (since the `CALL' instruction is encoded relative to the
 current position). The `ADD' instruction makes use of one of the
 special PIC relocation types: GOTPC relocation. With the `WRT ..gotpc'
 qualifier specified, the symbol referenced (here
 `_GLOBAL_OFFSET_TABLE_', the special symbol assigned to the GOT) is
 given as an offset from the beginning of the section. (Actually, `ELF'
 encodes it as the offset from the operand field of the `ADD'
 instruction, but NASM simplifies this deliberately, so you do things the
 same way for both `ELF' and `BSD'.) So the instruction then _adds_ the
 beginning of the section, to get the real address of the GOT, and
 subtracts the value of `.get_GOT' which it knows is in `EBX'.
 Therefore, by the time that instruction has finished, `EBX' contains
 the address of the GOT.

    If you didn't follow that, don't worry: it's never necessary to
 obtain the address of the GOT by any other means, so you can put those
 three instructions into a macro and safely ignore them:
                                                                                        
      %macro  get_GOT 0
                                                                                        
              call    %%getgot
        %%getgot:
              pop     ebx
              add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
                                                                                        
      %endmacro
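
    A sketch of the macro in use is given below. The `WRT ..gotoff'
 form used to reach `myvar' is NASM's way of addressing a local data
 item relative to the GOT; it is not described in this excerpt, and
 `myvar' is a name invented for the example.

      extern  _GLOBAL_OFFSET_TABLE_

      section .data
      myvar:  dd      0

      section .text
      func:   push    ebp
              mov     ebp,esp
              push    ebx
              get_GOT                         ; EBX = address of GOT

              mov     eax,[ebx+myvar wrt ..gotoff]

              mov     ebx,[ebp-4]
              mov     esp,ebp
              pop     ebp
              ret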
 
 8.2.4. Exporting Symbols to the Library User
 --------------------------------------------
                                                                                        
    If you want to export symbols to the user of the library, you have to
 declare whether they are functions or data, and if they are data, you
 have to give the size of the data item. This is because the dynamic
 linker has to build procedure linkage table entries for any exported
 functions, and also moves exported data items away from the library's
 data section in which they were declared.
                                                                                        
    So to export a function to users of the library, you must use
                                                                                        
      global  func:function           ; declare it as a function
                                                                                        
      func:   push    ebp
                                                                                        
              ; etc.
                                                                                        
    And to export a data item such as an array, you would have to code
                                                                                        
      global  array:data array.end-array      ; give the size too
                                                                                        
      array:  resd    128
      .end:

    Be careful: if you export a variable to the library user, by
 declaring it as `GLOBAL' and supplying a size, the variable will end up
 living in the data section of the main program, rather than in your
 library's data section, where you declared it. So you will have to
 access your own global variable with the `..got' mechanism rather than
 `..gotoff', as if it were external (which, effectively, it has become).
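
    As a sketch of what that means in code, assuming `EBX' already
 holds the address of the GOT and `array' is the exported item from the
 example above:

              mov     eax,[ebx+array wrt ..got]  ; EAX = address of array
              mov     ecx,[eax]                  ; ECX = first element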
                                                                                        
    Equally, if you need to store the address of an exported global in
 one of your data sections, you can't do it by means of the standard
 sort of code:
                                                                                        
      dataptr:        dd      global_data_item        ; WRONG
                                                                                        
    NASM will interpret this code as an ordinary relocation, in which
 `global_data_item' is merely an offset from the beginning of the
 `.data' section (or whatever); so this reference will end up pointing
 at your data section instead of at the exported global which resides
 elsewhere.
                                                                                        
    Instead of the above code, then, you must write
                                                                                        
      dataptr:        dd      global_data_item wrt ..sym
                                                                                        
    which makes use of the special `WRT' type `..sym' to instruct NASM
 to search the symbol table for a particular symbol at that address,
 rather than just relocating by section base.
                                                                                        
    Either method will work for functions: referring to one of your
 functions by means of

      funcptr:        dd      my_function
                                                                                        
    will give the user the address of the code you wrote, whereas
                                                                                        
      funcptr:        dd      my_function wrt ..sym
                                                                                        
    will give the address of the procedure linkage table for the
 function, which is where the calling program will _believe_ the
 function lives.  Either address is a valid way to call the function.
 
 10.1.1. NASM Generates Inefficient Code
 ---------------------------------------
                                                                                        
    We sometimes get `bug' reports about NASM generating inefficient, or
 even `wrong', code on instructions such as `ADD ESP,8'. This is a
 deliberate design feature, connected to predictability of output: NASM,
 on seeing `ADD ESP,8', will generate the form of the instruction which
 leaves room for a 32-bit offset. You need to code `ADD ESP,BYTE 8' if
 you want the space-efficient form of the instruction. This isn't a bug,
 it's user error: if you prefer to have NASM produce the more efficient
 code automatically, enable optimization with the `-On' option (see *Note
 Section 2.1.16::).
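
    Both forms can be requested explicitly. In the sketch below the
 byte values in the comments were worked out by hand from the Intel
 opcode tables, and the second one assumes optimization is off, as the
 text above does.

              bits    32
              add     esp,byte 8      ; 83 C4 08            3 bytes
              add     esp,dword 8     ; 81 C4 08 00 00 00   6 bytes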
 
 10.1.2. My Jumps are Out of Range
 ---------------------------------
                                                                                        
    Similarly, people complain that when they issue conditional jumps
 (which are `SHORT' by default) that try to jump too far, NASM reports
 `short jump out of range' instead of making the jumps longer.
                                                                                        
    This, again, is partly a predictability issue, but in fact has a more
 practical reason as well. NASM has no means of being told what type of
 processor the code it is generating will be run on; so it cannot decide
 for itself that it should generate `Jcc NEAR' type instructions, because
 it doesn't know that it's working for a 386 or above. Alternatively, it
 could replace the out-of-range short `JNE' instruction with a very
 short `JE' instruction that jumps over a `JMP NEAR'; this is a sensible
 solution for processors below a 386, but hardly efficient on processors
 which have good branch prediction _and_ could have used `JNE NEAR'
 instead. So, once again, it's up to the user, not the assembler, to
 decide what instructions should be generated. See *Note Section
 2.1.16::.
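
    The two alternatives look like this in source form. The labels
 `check', `check2' and `.skip' are invented for the sketch, and
 `far_away' is assumed to be defined elsewhere in the same section.

      ; 386 and above: one near conditional jump
      check:  cmp     ax,bx
              jne     near far_away

      ; below a 386: invert the condition and hop over a `JMP NEAR'
      check2: cmp     ax,bx
              je      .skip
              jmp     near far_away
      .skip: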
 
 10.1.3. `ORG' Doesn't Work
 --------------------------
                                                                                        
    People writing boot sector programs in the `bin' format often
 complain that `ORG' doesn't work the way they'd like: in order to place
 the `0xAA55' signature word at the end of a 512-byte boot sector, people
 who are used to MASM tend to code
                                                                                        
              ORG 0
                                                                                        
              ; some boot sector code
                                                                                        
              ORG 510
              DW 0xAA55
                                                                                        
    This is not the intended use of the `ORG' directive in NASM, and will
 not work. The correct way to solve this problem in NASM is to use the
 `TIMES' directive, like this:
                                                                                        
              ORG 0
                                                                                        
              ; some boot sector code
                                                                                        
              TIMES 510-($-$$) DB 0
              DW 0xAA55
                                                                                        
    The `TIMES' directive will insert exactly enough zero bytes into the
 output to move the assembly point up to 510. This method also has the
 advantage that if you accidentally fill your boot sector too full, NASM
 will catch the problem at assembly time and report it, so you won't end
 up with a boot sector that you have to disassemble to find out what's
 wrong with it.
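
    A complete, if useless, boot sector built this way might look like
 the following sketch. The load address 0x7C00 is the conventional PC
 BIOS one and is an assumption here, as are the label `hang' and the
 file names.

      ; assemble with: nasm -f bin boot.asm -o boot.img
              bits    16
              org     0x7C00

      hang:   cli                     ; nothing to do: stop here
              hlt
              jmp     hang

              times   510-($-$$) db 0 ; pad with zeros up to offset 510
              dw      0xAA55          ; boot signature in the last word

    The padding plus the signature word bring the output to exactly 512
 bytes.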
 
  The complete NASM info archive can be downloaded here (180 KB).
 
 
 
 
 
 
 	