I/O Access using Inline Assembly
by Chase
System Requirements
Minimum requirements:
* | Intel compatible 386 or greater cpu |
* | DOS (for some of the examples) |
Introduction
This guide shows how to use inline assembly to access hardware I/O ports from C.
The examples use AT&T syntax.
The Basics
An example of basic I/O access:
// --------------------------------------------------------------------------
// From NOPE-OS
// --------------------------------------------------------------------------
// arminb@aundb-online.de
// --------------------------------------------------------------------------
/* Input a byte from a port */
static inline unsigned char inportb(unsigned int port)
{
    unsigned char ret;
    asm volatile ("inb %%dx,%%al":"=a" (ret):"d" (port));
    return ret;
}
/* Output a byte to a port */
/* July 6, 2001 added space between :: to make code compatible with gpp */
static inline void outportb(unsigned int port,unsigned char value)
{
    asm volatile ("outb %%al,%%dx": :"d" (port), "a" (value));
}
Notes
The code above reads and writes bytes. To transfer words or double words,
change the register size (al, ax, eax) along with the argument and return types.
The device should be disabled while these functions are accessing it.
Example
The following example shows how to read the extended memory count from the CMOS.
The example runs under DOS.
#include <stdio.h>
/* Input a byte from a port */
static inline unsigned char inportb(unsigned int port)
{
    unsigned char ret;
    asm volatile ("inb %%dx,%%al":"=a" (ret):"d" (port));
    return ret;
}
/* Output a byte to a port */
/* July 6, 2001 added space between :: to make code compatible with gpp */
static inline void outportb(unsigned int port,unsigned char value)
{
    asm volatile ("outb %%al,%%dx": :"d" (port), "a" (value));
}
/* Stop Interrupts */
static inline void stopints(void)
{
    asm ("cli");
}
unsigned char highmem, lowmem;
unsigned int mem;
int main(void)
{
    /* need to stop ints before accessing the CMOS chip */
    stopints();
    /* write to port 0x70 with the CMOS register we want to read */
    /* 0x30 is the CMOS reg that holds the low byte of the mem count */
    outportb(0x70,0x30);
    /* read CMOS values from port 0x71 */
    lowmem = inportb(0x71);
    /* write to port 0x70 with the CMOS register we want to read */
    /* 0x31 is the CMOS reg that holds the high byte of the mem count */
    outportb(0x70,0x31);
    /* read CMOS values from port 0x71 */
    highmem = inportb(0x71);
    /* combine the low and high bytes into one value */
    mem = highmem;
    mem = mem<<8;
    mem += lowmem;
    printf("\nOld style CMOS extended memory count is %uk.\n", mem);
    return 0;
}
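The step that joins the two CMOS bytes is plain arithmetic and can be checked without touching any hardware. The helper below is a small sketch of it. The name cmos_mem_kb is an assumption made for this illustration and does not appear in the original program.

```c
/* Combine the high and low CMOS bytes into one 16-bit count.
   cmos_mem_kb is a hypothetical helper name. */
unsigned int cmos_mem_kb(unsigned char highmem, unsigned char lowmem)
{
    /* the high byte supplies bits 8..15, the low byte bits 0..7 */
    return ((unsigned int)highmem << 8) | lowmem;
}
```

For instance, a high byte of 0x3C and a low byte of 0x00 give 0x3C00, which is 15360.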
Mixing Assembly and C
by Chase
About This Guide
This guide explains how to use C and assembly together
to build operating systems for the x86.
System Requirements
* | Intel compatible 386 or greater cpu |
* | DOS compatible boot disk |
GCC/DJGPP
To begin, create a simple text file:
int main(void)
{
repeat:
    goto repeat;
}
Save it as "kernel32.c". Compile it:
gcc -ffreestanding -c -o kernel32.o kernel32.c
Then link it:
ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o
You should get the warning "warning: cannot find entry symbol start; defaulting to 00100000".
gcc compiles the source into an object file. ld links the object file into a flat binary that
is meant to be loaded at address 0x100000 (the 1-megabyte mark, where the kernel will be placed).
Note that we could have used a single gcc command without ld, since the compiler would
invoke the linker implicitly, but we run ld ourselves because we are going to
link several object files later. The gcc parameters:
* | -ffreestanding -- generate code that does not depend on an operating system (i.e. kernel code) |
* | -c -- compile but do not link (produce an object file) |
* | -o -- specify the name of the object file |
The ld parameters:
* | --oformat -- the output format; here, binary |
* | -Ttext -- the address at which the code will be loaded |
* | -o -- the name of the file that is created |
For complete descriptions of all command options you can look at the online manuals for gcc and ld.
NASM
NASM, which stands for Netwide Assembler, is "a free portable assembler
for the Intel 80x86 microprocessor". You can find out where to
download NASM at the NASM website.
If you're running a Unix-like system, look in the package collection of your
distribution for a copy of NASM. NASM, just like DJGPP, works via the
command line. This means that you write your code in any program that
can produce text files and then assemble it from a prompt. If you
downloaded the Windows version you may want to rename "nasmw.exe" to
"nasm.exe". The first thing we'll do is make an assembly code
kernel that does the same thing as our C kernel: nothing.
Create a text file that contains the following code:
[BITS 32]
repeat: jmp repeat
Save as "kernel32.asm". From the same location as the file, enter the command
nasm -f coff -o kernel32.o kernel32.asm
If you received no errors, enter the command
ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o
You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000".
Copy the new kernel32.bin over to the same location as the old
kernel32.bin. Try loading your new kernel just like the old one. When
you run the loader your system should hang just like the C kernel did.
Testing our kernels will be a lot easier if
we can just return to DOS after running them. After all, having to hit
reset after testing each kernel gets old very fast. To accomplish this
you need to know a little about how the loader works. The loader reads
"kernel32.bin" into memory and places it at the first megabyte of
memory. Then the loader sets up all selectors to access the first four
megabytes of memory and executes a far call to the first instruction at
0x100000. So to return to the loader from the kernel, all we have to do
is execute a far return. The loader then reenables interrupts, frees any
memory it used, and returns to DOS.
Create a text file that contains the following code:
[BITS 32]
retf
Save as "kernel32.asm". From the same location as the file, enter the command
nasm -f coff -o kernel32.o kernel32.asm
If you received no errors, enter the command
ld -Ttext 0x100000 --oformat binary -o kernel32.bin kernel32.o
You should get a warning that says "warning: cannot find entry symbol start; defaulting to 00100000".
Try loading your new kernel just like the old one. When you run the
loader you should be returned to a DOS prompt. Be sure not to mess
up the stack in your kernels, otherwise the far return won't work and
anything could happen. Here is what each of the parameters
we'll be using for nasm means:
* | -f -- specify the output format, we'll be using coff (coff is a type of object file) |
* | -o -- specify the name of the file that is created |
To display a "Hi" message just like the sample kernel, make a kernel that contains the following code:
[BITS 32]
mov byte [es:0xb8f9c],'H'
mov byte [es:0xb8f9e],'i'
retf
Since es points to a selector whose base address is zero and the color
text area starts at 0xb8000, the letters H and i are displayed near the
end of a standard 80x25 text display. We'll discuss display adapters in
a video article (hopefully); for now all you need to know is that to
write a character to the display you just copy its ASCII value to
0xb8000 to get it to show up in the upper left corner. To display a
character in any other location, just add 2 to 0xb8000 for every place to
the right. Text wraps down to the start of the next row when you reach
the end of a row.
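The addressing rule above can be sketched in C. This is only an illustration: the names TEXT_BASE, TEXT_COLS, text_cell_address and put_char are assumptions, and put_char takes the base of the text area as a parameter so that it can be tried on an ordinary array instead of real video memory.

```c
#include <stddef.h>

#define TEXT_BASE 0xb8000UL  /* start of the color text area */
#define TEXT_COLS 80         /* columns in a standard 80x25 display */

/* Address of the character byte for a given row and column.
   Each place on screen takes 2 bytes, so moving one place right adds 2. */
unsigned long text_cell_address(unsigned int row, unsigned int col)
{
    return TEXT_BASE + 2UL * (row * TEXT_COLS + col);
}

/* Store a character at (row, col) in a text area that starts at base. */
void put_char(unsigned char *base, unsigned int row, unsigned int col, char ch)
{
    base[2 * ((size_t)row * TEXT_COLS + col)] = (unsigned char)ch;
}
```

With these definitions, text_cell_address(24, 78) is 0xb8f9c and text_cell_address(24, 79) is 0xb8f9e, the two addresses used for 'H' and 'i' in the kernel above.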
Mixing C and Assembly
Most of the following text is taken directly from the nasm docs
External Symbol Names
Most 32-bit C compilers share the convention used by 16-bit compilers,
that the names of all global symbols (functions or data) they define
are formed by prefixing an underscore to the name as it appears in the
C program. However, not all of them do: the ELF specification states
that C symbols do not have a leading underscore on their
assembly-language names.
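One way to check which convention a given toolchain follows is the predefined macro __USER_LABEL_PREFIX__, which GCC expands to the prefix it puts in front of C symbol names. The sketch below is an illustration only. The helper names c_symbol_prefix and asm_symbol_name are assumptions and do not come from the NASM documentation.

```c
#include <stddef.h>
#include <stdio.h>

/* Two-step stringification so the macro is expanded before quoting */
#define STR2(x) #x
#define STR(x) STR2(x)

/* Prefix the compiler adds to C symbol names: "_" on targets that
   prefix an underscore, "" on ELF targets. */
const char *c_symbol_prefix(void)
{
    return STR(__USER_LABEL_PREFIX__);
}

/* Write the assembly-level name of a C symbol into out (hypothetical helper).
   Returns the length of the full name. */
int asm_symbol_name(char *out, size_t size, const char *c_name)
{
    return snprintf(out, size, "%s%s", c_symbol_prefix(), c_name);
}
```

If the prefix comes back empty, the assembly files should declare the symbols without the leading underscore.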
Function Definitions and Function Calls
The C calling convention in 32-bit programs is as follows. In the
following description, the words caller and callee are used to denote
the function doing the calling and the function which gets called.
* | The caller pushes the function's parameters on the stack, one after another, in
reverse order (right to left, so that the first argument specified to
the function is pushed last). |
* | The caller then executes a near CALL instruction to pass control to the callee. |
* | The callee receives control, and typically (although this is not actually
necessary, in functions which do not need to access their parameters)
starts by saving the value of ESP in EBP so as to be able to use EBP as
a base pointer to find its parameters on the stack. However, the caller
was probably doing this too, so part of the calling convention states
that EBP must be preserved by any C function. Hence the callee, if it
is going to set up EBP as a frame pointer, must push the previous value
first. |
* | Update: 1-21-2001 GCC based compilers also expect EBX, EDI and ESI to be preserved by any function. |
* | The callee may then access its parameters relative to EBP. The doubleword at [EBP]
holds the previous value of EBP as it was pushed; the next doubleword,
at [EBP+4], holds the return address, pushed implicitly by CALL. The
parameters start after that, at [EBP+8]. The leftmost parameter of the
function, since it was pushed last, is accessible at this offset from
EBP; the others follow, at successively greater offsets. Thus, in a
function such as printf() which takes a variable number of parameters,
the pushing of the parameters in reverse order means that the function
knows where to find its first parameter, which tells it the number and
type of the remaining ones. |
* | The callee may also wish to decrease ESP further, so as to allocate space on the stack
for local variables, which will then be accessible at negative offsets
from EBP. |
* | The callee, if it wishes to return a value to the caller, should leave the value in AL,
AX or EAX depending on the size of the value. Floating-point results
are typically returned in ST0. |
* | Once the callee has finished processing, it restores ESP from EBP if it had allocated
local stack space, then pops the previous value of EBP, and returns
via RET (equivalently, RETN). |
* | When the caller regains control from the callee, the function parameters are still on
the stack, so it typically adds an immediate constant to ESP to remove
them (instead of executing a number of slow POP instructions). Thus, if
a function is accidentally called with the wrong number of parameters
due to a prototype mismatch, the stack will still be returned to a
sensible state since the caller, which knows how many parameters it
pushed, does the removing. |
Thus, you would define a function in C style in the following way:
global _myfunc
_myfunc: push ebp
         mov ebp,esp
         sub esp,0x40       ; 64 bytes of local stack space
         mov ebx,[ebp+8]    ; first parameter to function (save and restore
                            ; EBX if your C compiler expects it preserved)
         ; some more code
         leave              ; mov esp,ebp / pop ebp
         ret
At the other end of the process, to call a C function from your assembly code, you would do something like this:
extern _printf
; and then, further down...
push dword [myint] ; one of my integer variables
push dword mystring ; pointer into my data segment
call _printf
add esp,byte 8 ; `byte' saves space
; then those data items...
segment _DATA
myint dd 1234
mystring db 'This number -> %d <- should be 1234',10,0
This piece of code is the assembly equivalent of the C code
int myint = 1234;
printf("This number -> %d <- should be 1234\n", myint);
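The stack arithmetic in the calling convention above can be sketched as two small C functions. This is an illustration under the assumption that every parameter fills exactly one 4-byte stack slot (true for int and pointers, not for double). The names SLOT_SIZE, param_offset and caller_cleanup_bytes are hypothetical.

```c
#define SLOT_SIZE 4  /* assumed: each parameter fills one 4-byte stack slot */

/* Offset from EBP of parameter number index (0 = leftmost).
   [EBP] holds the saved EBP and [EBP+4] the return address,
   so the parameters start at [EBP+8]. */
unsigned int param_offset(unsigned int index)
{
    return 8 + SLOT_SIZE * index;
}

/* Bytes the caller adds to ESP after the call to remove its parameters. */
unsigned int caller_cleanup_bytes(unsigned int param_count)
{
    return SLOT_SIZE * param_count;
}
```

For the printf call above, two parameters are pushed, so caller_cleanup_bytes(2) is 8, which matches the add esp,byte 8 that follows the call.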
Accessing Data Items
To get at the contents of C variables, or to declare variables which C
can access, you need only declare the names as GLOBAL or EXTERN.
(Again, the names require leading underscores.) Thus, a C variable
declared as int i can be accessed from assembler as
extern _i
mov eax,[_i]
And to declare your own integer variable which C programs can access as
extern int j, you do this (making sure you are assembling in the _DATA
segment, if necessary):
global _j
_j dd 0
To access a C array, you need to know the size of the components of the
array. For example, int variables are four bytes long, so if a C
program declares an array as int a[10], you can access a[3] by coding
mov eax,[_a+12]. (The byte offset 12 is obtained by multiplying the
desired array index, 3, by the size of the array element, 4.) The sizes
of the C base types in 32-bit compilers are: 1 for char, 2 for short,
4 for int, long and float, and 8 for double. Pointers, being 32-bit
addresses, are also 4 bytes long.
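The offset calculation can be checked from the C side. The sketch below is an illustration; element_offset and load_int32_at are hypothetical helper names, and int32_t is used so that the element size is 4 bytes regardless of the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element index in an array whose elements are elem_size bytes */
size_t element_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* Read a 32-bit element by byte offset, the way [_a+12] does in assembly */
int32_t load_int32_at(const void *array, size_t byte_offset)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)array + byte_offset, sizeof value);
    return value;
}
```

Calling load_int32_at(a, element_offset(3, sizeof a[0])) on an int32_t array a reads the same element as a[3].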
To access a C data structure, you need to know the offset from the base
of the structure to the field you are interested in. You can either do
this by converting the C structure definition into a NASM structure
definition (using STRUC), or by calculating the one offset and using
just that.
To do either of these, you should read your C compiler's manual to find
out how it organises data structures. NASM gives no special alignment
to structure members in its own STRUC macro, so you have to specify
alignment yourself if the C compiler generates it. Typically, you might
find that a structure like
struct {
    char c;
    int i;
} foo;
might be eight bytes long rather than five, since the int field would
be aligned to a four-byte boundary. However, this sort of feature is
sometimes a configurable option in the C compiler, either using
command-line options or #pragma lines, so you have to find out how your
own compiler does it.
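Rather than guessing, you can ask the compiler for the layout with sizeof and offsetof. The sketch below assumes a compiler where int is 4 bytes and aligned to a four-byte boundary. The tag names foo and foo_packed and the four helper functions are hypothetical, and __attribute__((packed)) is a GCC extension rather than standard C.

```c
#include <stddef.h>

/* The structure from the text, with the compiler's default alignment */
struct foo {
    char c;
    int i;
};

/* The same fields with padding removed (GCC extension) */
struct foo_packed {
    char c;
    int i;
} __attribute__((packed));

/* Offset of field i and total size, for both layouts */
size_t foo_i_offset(void)        { return offsetof(struct foo, i); }
size_t foo_size(void)            { return sizeof(struct foo); }
size_t foo_packed_i_offset(void) { return offsetof(struct foo_packed, i); }
size_t foo_packed_size(void)     { return sizeof(struct foo_packed); }
```

Under the stated assumptions the default layout puts i at offset 4 in an 8-byte structure, while the packed layout puts it at offset 1 in a 5-byte structure. The offset you get is the one to use in the NASM STRUC definition.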
Helper Macros for the 32-bit C Interface
If you find the underscores inconvenient, you can define macros to replace the GLOBAL and EXTERN directives as follows:
%macro cglobal 1
global _%1
%define %1 _%1
%endmacro
%macro cextern 1
extern _%1
%define %1 _%1
%endmacro
(These forms of the macros only take one argument at a time; a %rep construct could solve this.)
If you then declare an external like this:
cextern printf
then the macro will expand it as
extern _printf
%define printf _printf
Thereafter, you can reference printf as if it was a symbol, and the
preprocessor will put the leading underscore on where necessary.
The cglobal macro works similarly. You must use cglobal before defining
the symbol in question, but you would have had to do that anyway if you
used GLOBAL.
Included in the NASM archives, in the misc directory, is a file c32.mac
of macros. It defines three macros: proc, arg and endproc. These are
intended to be used for C-style procedure definitions, and they
automate a lot of the work involved in keeping track of the calling
convention.
An example of an assembly function using the macro set is given here:
proc _proc32
%$i arg
%$j arg
        mov eax,[ebp + %$i]
        mov ebx,[ebp + %$j]
        add eax,[ebx]
endproc
This defines _proc32 to be a procedure taking two arguments, the first
(i) an integer and the second (j) a pointer to an integer. It returns
i + *j.
Note that the arg macro has an EQU as the first line of its expansion,
and since the label before the macro call gets prepended to the first
line of the expanded macro, the EQU works, defining %$i to be an offset
from EBP. A context-local variable is used, local to the context pushed
by the proc macro and popped by the endproc macro, so that the same
argument name can be used in later procedures. Of course, you don't
have to do that.
arg can take an optional parameter, giving the size of the argument. If
no size is given, 4 is assumed, since it is likely that many function
parameters will be of type int or pointers.
Our first mixed kernel
Create a text file that contains the following code:
extern void sayhi(void);
extern void quit(void);
int main(void)
{
    sayhi();
    quit();
}
Save as "mix_c.c" .
Create another text file that contains the following code:
[BITS 32]
GLOBAL _sayhi
GLOBAL _quit
SECTION .text
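_sayhi: mov byte [es:0xb8f9c],'H'
        mov byte [es:0xb8f9e],'i'
        ret                 ; near return to main
_quit:  mov esp,ebp         ; drop main's frame (assumes main uses EBP as frame pointer)
        pop ebp             ; restore the EBP that main saved
        retf                ; far return to the loader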
Save as "mix_asm.asm" .
From the same location as the files enter the commands
gcc -ffreestanding -c -o mix_c.o mix_c.c
nasm -f coff -o mix_asm.o mix_asm.asm
If you received no errors, enter the command
ld -Ttext 0x100000 --oformat binary -o kernel32.bin mix_c.o mix_asm.o
You should get a warning that says "warning: cannot find entry symbol
start; defaulting to 00100000". Copy the new kernel32.bin over to the
same location as the old kernel32.bin. Try loading your new kernel
just like the old one. When you run the loader your system should
display "Hi" in the bottom right corner of your screen and you should
be returned to the prompt.
Additional Information
When linking your object files, your code will appear inside your
output file in the order of the input files. Also, when you use constants
in your C code, such as myfunc("Hello");, gcc based compilers will put
the constants in the code segment before the beginning of the function
in which they are used. Since the loader starts executing at the first
byte of the binary, a constant placed ahead of the first function would be
executed as if it were code. When jumping to or calling binary outputted C code
you have three options to avoid this problem. You can create a function
at the beginning of your C code, without constants, that calls or jumps to
the next function. You can link another file (assembly or C) before
your C code that is just there to call your C code. Your last
option is to use the gcc option -fwritable-strings to move your
constants into the data segment (this option was removed in GCC 4.0, so it
only applies to older compilers).
There is a problem with ld on Linux. The ld that
comes with Linux distros lists support for the coff object format, but
apparently you have to rebuild binutils from http://www.gnu.org
to get it working. I found two possible solutions: recompile ld, or,
under Linux, edit your assembly files and remove all the leading underscores,
then assemble with nasm using the -f aout option instead of coff.
I've tested the second method briefly and it works.
About The Loader
The loader in this lesson makes a small GDT with selectors for the
first 4 megabytes of memory and puts them in the segment registers
before calling the kernel. It also leaves all interrupts disabled while
the kernel runs. Don't try to enable interrupts in your kernel with this
loader, because a protected mode IDT is never set up. Different lessons
will be using different loaders, so don't assume that you don't need to
download the loader for whatever lesson you're on. If you want to take
a look, the source for the loader is here.
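To give an idea of what a GDT entry for the first 4 megabytes looks like, here is a sketch of how an 8-byte x86 segment descriptor is packed. It is not taken from the loader's source, which is not reproduced here. The function name make_descriptor and the particular access and flag values in the note below are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Pack base, limit, access byte and flags into an 8-byte x86 segment
   descriptor. make_descriptor is a hypothetical helper. */
uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);               /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;        /* base bits 0..23   */
    d |= (uint64_t)access << 40;                   /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;    /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xF) << 52;            /* granularity, size */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;    /* base bits 24..31  */
    return d;
}
```

A flat 4-megabyte code segment could use base 0, a limit of 0x3FF with 4 KB granularity (1024 pages), access byte 0x9A and flags 0xC; a matching data segment would differ only in its access byte, 0x92.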