Search     or:     and:
 LINUX 
 Language 
 Kernel 
 Package 
 Book 
 Test 
 OS 
 Forum 
iakovlev.org

Введение в реверс-инжениринг

Книга представляет из себя вводный курс по reverse engineering как под Linux , так и под Microsoft Windows. Мы рассмотрим основные подходы , которые позволяют программисту раскрывать содержимое черного ящика. Закрытые проприетарные системы - отличный разражитель для open-source программистов.

Эта книга активно разрабатывается и все еще ищет своего издателя. Данный текст неполон , в нем много пропусков. Она не столько о том , как из бинарника получить исходный код , сколько о научном дедуктивном методе анализа , выделения и модификации различной функциональности закрытых программ. В основном разговор идет на высокоуровневом языке программирования , но при необходимости речь переходит на ассемблер.

Книга написана с таким расчетом , чтобы каждый , кто хотя бы прослушал вводный курс по программированию или на худой конец прочитал Teach Yourself in 21 days, мог бы в состоянии понять материал.

Знание таких вещей ,как AVL trees, Hash Tables, Graphs, priority queues, (см. Design Patterns) также будет весьма полезным для понимания.

Что такое reverse engineering?

Reverse engineering - технология , позволяющая обходиться без исходников для того , чтобы модифицировать какой-то софт или понять , как он работает.

Вообще говоря, reverse engineering - это довольно тяжкий труд , и часто требуется команда инженеров , для того чтобы взломать какую-то систему. Но как бы там ни было , мы узнаем , что есть инструментальные средства , которые позволяют найти нужную информацию и хакнуть софт , к которому нет исходников

Зачем это нужно ?

Ответ : потому что это возможно.

Если у нас есть исходники , у нас есть ключ к пониманию идеологии системы и возможности управлять ее поведением . Но так бывает не всегда.

Закрытый софт всегда представляет особый интерес. Иногда нужно понять , как работает защитная функция, или взломать защиту .

Это делает вас лучше как программиста

В этой книге есть много чего о том , как работает компьютер на нижнем уровне , да и вообще как эффективно писать программы.

To Learn Assembly Language.

Если вы незнакомы с ассемблером , в конце книги есть ссылки . которые помогут вам . Вы узнаете , как Си генерирует машинный код .

How to use this book

Learn the General Approach

В книге сделан обзор Reverse Engineering как под UNIX (GNU/Linux) , так и под Microsoft Windows.

Не пытайтесь сфокусировать свое внимание на какой-то одной платформе. Пытайтесь понять сам процесс выделения информации , и как устроены самые различные средства в достижении этой цели.

Use the Scientific Method.

Изучая отдельную программу , вы изучаете систему. Научный подход можно разбить на несколько шагов :

  1. Выделение и описание какой-то функциональности в программе

  2. Формулировка , обоснование такого поведения

  3. На основе гипотез проникновение внутрь программы и замена кода

    Изменение функционала программы вслепую может привести к непредсказуемым результатам.

Большую роль здесь играет последовательность в отладке и тестировании.

Figure 1.1. Exploring a Hypothesis Space

Exploring a Hypothesis Space

Эта древовидная диаграмма показывает схему отладки под винду, в конкретных случаях последовательность графов в дереве может значительно отличаться от приведенного. Например , чтобы узнать , что программа читает из конфигов , можно использовать программу filemon. Чтобы узнать , как программа инициализирует полученные данные , можно использовать программу procexp. Чтобы узнать , какая информация берется из регистров , можно узнать с помощью regmon.

[Tip]NOTE

Если у вас нет стратегии , вы можете потратить большое количество времени на бесполезный обзор ассемблерного кода.

Start with an Aerial View, then Zoom In

Правильный подход заключается в правильной организации шагов . Но иногда приложение бывает настолько сложным, что непонятно, с чего начать. В этом случае вам могут помочь : Data Flow Diagrams и Activity Diagrams.

Они помогают представить систему в виде нескольких абстрактных уровней. И вы уже сами можете выбрать нужный уровень.

Data Flow Diagrams моделирует поток данных между исходником, процессом и хранилищем данных.

Chapter 2. The Compilation Process

Table of Contents

Revision History
Revision $Revision: 1.4 $$Date: 2004/02/12 21:43:34 $ 

Intro

Процесс компиляции можно разбить на 5 стадий : Preprocessing, Parsing, Translation, Assembling, Linking.

Figure 2.1. The compilation Process

The compilation Process

В данной схеме мы рассматриваем юниксовый компилятор - gcc (или g++). Последовательность такова : gcc -> gcc -E -> gcc -S -> as -> ld.

Под виндой процесс несколько более другой , но в рамках MSVC++ в принципе все то же самое. Под винду также есть версия гну-шного компилятора, смотрите MinGW project или Cygwin Project. Cygwin генерит POSIX-совместимый код . MinGW позволяет строить нативные виндовые приложения.

The Compiler

Генерация бинарного исполняемого файла выполняется компилятором. Это справедливо и для юникса , и для винды.

gcc

gcc - C-ный компилятор для UNIX. Для вывода протокола пошаговых действий , выполняемых gcc , нужно набрать опцию gcc -v

cl.exe

cl.exe выполняет аналогичную функцию в случае с MSVC++ под виндой. Попробуйте набрать cl -? .

Если вы запускаете cl.exe не из среды MSVC++ , нужно позаботиться о путях и системных переменных компиляции - в этом вам поможет системный батник vsvars32.bat , который обычно можно найти в подкаталоге CommonX/Tools.

The C Preprocessor

Препроцессор - это все , что стоит после # и управляет логикой компиляции на С .

gcc -E

gcc -E выполняет лишь одну стадию препроцесса. Будут подключены хидеры и макросы переведены в нормальный код. Добавление опции -o file укажет редирект на обьектный файл.

cl -E

- все то же самое.

Parsing And Translation Stages

Парсинг и трансляция - наиболее полезные для нас стадии компиляции. Юниксовый и виндовый синтаксисы ассемблеров , к сожалению , различаются. Юниксовый ассемблер в свою очередь , как известно , имеет интеловский диалект , а также AT-шный. Переключение между ними не должно составлять большого труда.

gcc -S

Опция gcc -S генерит из .c файлов ассемблерные файлы .s с синтаксисом AT&T. Если вы хотите получить интеловский синтаксис , добавьте опцию -masm=intel. Для генерации переменных стека используйте опцию -fverbose-asm.

gcc имеет опцию оптимизации. Можно задать опцию оптимизации -ON, где 0 <= N <= 6. 0 - нет оптимизации , 6 - максимальная оптимизация.

Есть также несколько опций , задаваемых с помощью флага -f. Некоторые из них: -funroll-loops, -finline-functions, -fomit-frame-pointer. Loop unrolling - создаются n копий для n итераций цикла и полностью отсутствуют директивы jmp. На новейших процах эта опция практически не дает эффекта. Инлайн-опция означает конвертацию функций в макросы. Omit-опция освобождает регистры для их эффективного использования, что дает выигрыш при большом количестве локальных переменных.

[Tip]NOTE

У всех этих опций есть соответствующие анти-опции : например опция -fno-inline-functions имеет обратное действие для опции -finline-functions.

С помощью опции -fverbose-asm опции компиляции будут распечатаны в заголовке ассемблерного файла. Недостатком gcc-3.x является то . что он все равно делает оптимизацию , даже если мы включили опцию уровня -O0. Отключение оптимизации делается с помощью опции -fno-.

cl -S

Аналогично , cl.exe имеет опцию -S для генерации ассемблера,а также опции оптимизации. К сожалению , cl не позволяет это делать в той же степени , что и gcc. Похожие опции для cl.exe :


-Ob<n> - inline functions (-finline-functions)
-Oy - enable frame pointer omission (-fomit-frame-pointer)

Assembly Stage

Это стадия перевода ассемблерного кода в машинные инструкции. Тут могут происходить небольшие сюрпризы , такие , как смена порядка команд.

GNU as

as - GNU-шный assembler. Он может компилировать как синтаксис AT&T , так и Intel syntax, генерируя при этом обьектные файлы .o

MASM

MASM - Microsoft assembler.

Linking Stage

Винда и юникс имеют схожие линковочные процедуры. Обе системы поддерживают 3 стиля линковки.

Static Linking

В этом случае код всех функций находится внутри самого исполняемого файла.

Dynamic Linking

При динамической линковке где-то отдельно лежит библиотека функций , и ее виртуальное пространство будет прикручено операционной системой к адресному пространству исполняемого файла. Вызовы таких библиотечных функций будут сгенерированы в момент компиляции программы и помещены в Procedure Linkage Table, или PLT.

Runtime Linking

Runtime-линковка происходит на лету , в момент выполнения уже откомпилированной программы. В юниксе для этого используется системный вызов dlopen(), под виндой это LoadLibrary().

ld/collect2

ld - GNU-шный линкер. Он непосредственно генерит исполняемый файл.

link.exe

Это MSVC++ linker. Для его вызова нужно задать опцию cl -link . Для генерации .dll вы должны иметь предварительно соответсвующие библиотеки .lib (или .def).

Java Compilation Process

Java , как известно , интерпертатор , и отличается от C/C++. Программы тут выполняются на т.н. Java Virtual Machine ( JVM), и реализовано это с помощью интерпретатора. Java-код не компилируется в исполняемый код , вместо этого происходит компиляция т.н. промежуточного байт-кода, который и выполняется на виртуальной маштне.

Figure 2.2. The Java Compile/Execute Path

The Java Compile/Execute Path

Java-класс размещается в отдельном исходном файле, имеющем имя самого класса и расширение .java. При компиляции каждый класс оказывается размещенным в собственном бинарном .class-файле. 2 уникальные фазы - class loader и bytecode verifier - отличаю Java от C/C++.

class loader загружает class' bytecode. Разработчик может написать свой собственный class loader.

При загрузке класса виртуальной машиной вызывается метод loadClass(String name, boolean resolve);. Далее класс передается в память через метод defineClass. Если класс не найден , может быть загружен родительский класс либо вызван метод findSystemClass.

Природа Lava-bytecode позволяет сравнительно легко декомпилировать байткод в исходник.

Chapter 3. Gathering Info

Table of Contents

Revision History
Revision $Revision: 1.4 $$Date: 2004/01/31 20:56:53 $ 

Первый шаг для понимания программы - получение информации о ней с помошью других программ.

System Wide Process Information

Как в винде , так и в линуксе есть приложения , которые дают информацию о выполняющихся процессах.

/proc

Файловая линуксовая система /proc вмещает в себя различную информацию , начиная от зависимых библиотек и кончая сокетами. /proc имеет каталог для каждого выполняемого процесса. Если у процесса pid=1337, то соответсвенно имеется каталог /proc/1337/. Вы можете наблюдать только те процессы , которые запустили именно вы.

Интересные параметры в этом каталоге : cmdline -- список командных параметров процесса cwd -- линк на рабочую директорию процесса environ -- список переменных окружения процесса exe -- линк на выполняемый файл fd -- список файловых дескрипторов , используемых процессом maps -- список memory locations процесса, которые можно напрямую посмотреть с помощью gdb/

Sysinternals Process Explorer

Sysinternals - эта известная марка предоставляет набор утилит под винду. Process Explorer показывает зависимые dll-ки, функции и соответсвующие им адреса , security attributes, открытые файлы, тип управляемых обьектов и т.д. Эта утилита позволяет модифицировать процесс. Можно управлять хэндлами , пермишинами , дебажить и т.д.

Figure 3.1. Process Explorer

Process Explorer

Obtaining Linking information

Для понимания работы программы нужно собрать информацию об используемых библиотеках.

ldd

Юниксовый ldd - это основная утилита для сбора информации о прилинкованных библиотеках. Она дает информацию о маппированных адресах внутри адресного пространства программы.

depends

depends - утилита , идущая в поставке Microsoft SDK, либо MS Visual Studio. Она показывает не только список dll-ек , но и функции внутри них, характер их импорта.

Figure 3.2. Depends

Depends

Когда вы кликаете на dll-ке , вы получаете список импортируемых функций (зеленая окраска). Вы также получаете список экспортируемых функций. Используемые покрашены в синий цвет , неиспользуемые - в серый. Иногда функция может быть "bound", в этом случае ее модификация затруднительна.

Obtaining Function Information

Следующий шаг - идентификация функциональных блоков программы.

nm

Юниксовый nm выводит список функций , глобальных переменных , и их адреса внутри бинарника.

dumpbin.exe

В винде есть аналог - dumpbin.exe,возможности которого поменьше. Он выводит список импортируемых/экспортируемых функций. К тому же эти функции должны иметь префикс __declspec( dllexport )

Viewing Filesystem Activity

lsof

Юниксовый lsof показывает список всех открытых файлов , используемых процессом. Это может быть обычный файл , каталог, block file, character special file, ссылка, библиотека, поток или сетевой файл(Internet socket, NFS file или UNIX domain socket). lsof как правило не инсталлируется по умолчанию на систему. В некоторых дистрибутивах он лежит в /usr/sbin. Пример вывода:

COMMAND     PID  USER   FD   TYPE     DEVICE     SIZE       NODE NAME
  bash        101 nasko  cwd    DIR        3,2     4096    1172699 /home/nasko
  bash        101 nasko  rtd    DIR        3,2     4096          2 /
  bash        101 nasko  txt    REG        3,2   518140    1204132 /bin/bash
  bash        101 nasko  mem    REG        3,2   432647     748736 /lib/ld-2.2.3.so
  bash        101 nasko  mem    REG        3,2    14831    1399832 /lib/libtermcap.so.2.0.8
  bash        101 nasko  mem    REG        3,2    72701     748743 /lib/libdl-2.2.3.so
  bash        101 nasko  mem    REG        3,2  4783716     748741 /lib/libc-2.2.3.so
  bash        101 nasko  mem    REG        3,2   249120     748742 /lib/libnss_compat-2.2.3.so
  bash        101 nasko  mem    REG        3,2   357644     748746 /lib/libnsl-2.2.3.so
  bash        101 nasko    0u   CHR        4,5              260596 /dev/tty5
  bash        101 nasko    1u   CHR        4,5              260596 /dev/tty5
  bash        101 nasko    2u   CHR        4,5              260596 /dev/tty5
  bash        101 nasko  255u   CHR        4,5              260596 /dev/tty5
  screen      379 nasko  cwd    DIR        3,2     4096    1172699 /home/nasko
  screen      379 nasko  rtd    DIR        3,2     4096          2 /
  screen      379 nasko  txt    REG        3,2   250336     358394 /usr/bin/screen-3.9.9
  screen      379 nasko  mem    REG        3,2   432647     748736 /lib/ld-2.2.3.so
  screen      379 nasko  mem    REG        3,2   357644     748746 /lib/libnsl-2.2.3.so
  screen      379 nasko    0r   CHR        1,3              260468 /dev/null
  screen      379 nasko    1w   CHR        1,3              260468 /dev/null
  screen      379 nasko    2w   CHR        1,3              260468 /dev/null
  screen      379 nasko    3r  FIFO        3,2             1334324 /home/nasko/.screen/379.pts-6.slack
  startx      729 nasko  cwd    DIR        3,2     4096    1172699 /home/nasko
  startx      729 nasko  rtd    DIR        3,2     4096          2 /
  startx      729 nasko  txt    REG        3,2   518140    1204132 /bin/bash
  ksmserver   794 nasko    3u  unix 0xc8d36580              346900 socket
  ksmserver   794 nasko    4r  FIFO        0,6              346902 pipe
  ksmserver   794 nasko    5w  FIFO        0,6              346902 pipe
  ksmserver   794 nasko    6u  unix 0xd4c83200              346903 socket
  ksmserver   794 nasko    7u  unix 0xd4c83540              346905 /tmp/.ICE-unix/794
  mozilla-b  5594 nasko  144u  sock        0,0              639105 can't identify protocol
  mozilla-b  5594 nasko  146u  unix 0xd18ec3e0              639134 socket
  mozilla-b  5594 nasko  147u  sock        0,0              639135 can't identify protocol
  mozilla-b  5594 nasko  150u  unix 0xd18ed420              639151 socket
         

Краткий комментарий вывода :

   cwd  current working directory
     mem  memory-mapped file
     pd   parent directory
     rtd  root directory
     txt  program text (code and data)
     CHR  for a character special file
     sock for a socket of unknown domain
     unix for a UNIX domain socket
     DIR  for a directory
     FIFO for a FIFO special file
      

[Tip]fuser

Схожий функционал имеет утилита fuser. У нее в качестве параметра можно задать имя файла или сокета и получить pid процесса , который имеет доступ к этому файлу.

Sysinternals Filemon

Аналогом lsof в винде является утилита Sysinternals Filemon. Она показывает открытые файлы , а также запросы на чтение/запись.

Sysinternals Regmon

Реестр в винде - ключ к понимаю системы , он хранит в себе массу секретов. Regmon мониторит все активно используемые разделы реестра.

Viewing Open Network Connections

В данном случае имя утилиты совпадает и для винды , и для юникса - речь идет о netstat.

netstat

Эта утилита есть чуть ли ни во всех операционных системах. ее можно использовать для мониторинга сетевых коннектов , роутинг-таблиц и т.д.

Если исследуемая программа использует сетевую коммуникацию, мы можем определить внешние хосты , куда идет коннект, причем речь идет не только о TCP. Пример вывода :

Figure 3.3. Netstat output

Active Internet connections (w/o servers)
  Proto Recv-Q Send-Q Local Address           Foreign Address         State
  tcp        0      0 slack.localnet:58705    egon:ssh                ESTABLISHED
  tcp        0      0 slack.localnet:51766    gw.localnet:ssh         ESTABLISHED
  tcp        0      0 slack.localnet:51765    gw.localnet:ssh         ESTABLISHED
  tcp        0      0 slack.localnet:38980    clortho:ssh             ESTABLISHED
  tcp        0      0 slack.localnet:58510    students:ssh            ESTABLISHED
  Active UNIX domain sockets (w/o servers)
  Proto RefCnt Flags       Type       State         I-Node Path
  unix  5      [ ]         DGRAM                    68     /dev/log
  unix  3      [ ]         STREAM     CONNECTED     572608 /tmp/.ICE-unix/794
  unix  3      [ ]         STREAM     CONNECTED     572607
  unix  3      [ ]         STREAM     CONNECTED     572604 /tmp/.X11-unix/X0
  unix  3      [ ]         STREAM     CONNECTED     572603
  unix  2      [ ]         STREAM                   572488
         

[Tip]NOTE

Показан юниксовый вывод . Под винду почти все то же самое.

В данном случае вывод разделен на 2 части - Internet connections и UNIX domain sockets. Первый столбец - имя протокола. Следующие два - получаемые и отсылаемые пакеты. Затем - source host/port, destination host/port. Последний столбец - состояние коннекта. В TCP-коннекте присутствует несколько стадий - ESTABLISHED . SYN_SENT, TIME_WAIT, LISTEN.

В зависимости от опций netstat на экран можно выводить еще больше информации. Опция -a показывает все коннекты . Опция -n показывает IP-адреса в числовом виде.

        
 netstat -p as normal user
 (Not all processes could be identified, non-owned process info
  will not be shown, you would have to be root to see it all.)
 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
 tcp        0      0 slack.localnet:58705    egon:ssh                ESTABLISHED -
 tcp        0      0 slack.localnet:58766    winston:www             ESTABLISHED 5587/mozilla-bin
        

        
 netstat -npa as root user
 Active Internet connections (servers and established)
 Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
 tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      390/smbd
 tcp        0      0 0.0.0.0:6000            0.0.0.0:*               LISTEN      737/X
 tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      78/sshd
 tcp        0      0 10.0.0.3:58705          128.174.252.100:22      ESTABLISHED 13761/ssh
 tcp        0      0 10.0.0.3:51766          10.0.0.1:22             ESTABLISHED 897/ssh
 tcp        0      0 10.0.0.3:51765          10.0.0.1:22             ESTABLISHED 896/ssh
 tcp        0      0 10.0.0.3:38980          128.174.252.105:22      ESTABLISHED 8272/ssh
 tcp        0      0 10.0.0.3:58510          128.174.5.39:22         ESTABLISHED 13716/ssh
        

Данный вывод показывает , что мозилла устанавливает коннект с winston ( port - www(80)). SMB daemon, X server, и ssh daemon стоят на прослушивании.

Gathering Network Data

Сбор сетевой информации обычно выполняется с помощью сниферов. ethernet card устанавливается в режим promiscuous mode и анализируется весь траффик , проходящий через нее. Что такое promiscuous mode? Ethernet - это broadcast media, или устройство , которой является промежуточным звеном в передаче сетевого трафика. Каждый сетевой интерфейс (network interface card -NIC) имеет свой уникальный идентификатор - MAC (Media Access Control). Все пакеты , идущие по локальной сети , в режиме promiscuous mode могут читаться сетевым интерфейсом , независимо от того , какой у этих пакетов пункт назначения.

Есть несколько наиболее популярных снифферов , с различным интерфейсом и возможностями. Вот они :

  • ethereal - один из лучших снифферов. Имеет графический GTK-интерфейс. Кроме того что он сниффер , он еще и протокольный анализатор. Он анализирует в том смысле , что например для TCP-протокола он может показать флаги SYN , ACK, или kerberos или NTLM -хидеры. Есть версия как под винду , так и под линукс. Для него требуется библиотека pcap. Доступен www.ethereal.com , также вам понадобится libpcap для Linux или WinPcap для Microsoft Windows.

  • tcpdump - один из первых снифферов. Это консольное приложение. Он встроен в большинство линуксовых дистрибутивов. Версия для Microsoft Windows - WinDump.

  • ettercap - еще один консольный сниффер. Использует ncurses-библиотеку для консольного GUI. Он построен на основе ARP.

Рассмотрим задачу : как происходит аутентификация mail-клиента и как он забирает сообщения с сервера. Протокол - POP3, и мы говорим снифферу ethereal забирать трафик с порта 110. Для входящего и исходящего трафика установим опцию tcp.port == 110. Пример сеанса работы ethereal:

Figure 3.4. Ethereal capture

Ethereal capture

Ethereal разбивает пакеты, показывая версию Internet Protocol 4.

Chapter 4. Determining Program Behavior

Table of Contents

Revision History
Revision $Revision: 1.3 $$Date: 2004/01/31 20:56:53 $ 

There are a couple of tools that allow us to look into program behavior at a more closer level. Lets look at some of these:

Tracing System Calls

This section is really only relevant for to our efforts under UNIX, as Microsoft Windows system calls change regularly from version to version, and have unpredictable entry points.

strace/truss(Solaris)

These programs trace system calls a program makes as it makes them.

Useful options:

  1. -f (follow fork)

  2. -ffo filename (output trace to filename.pid for forking)

  3. -i (Print instruction pointer for each system call)

Tracing Library Calls

Now we're starting to get to the more interesting stuff. Tracing library calls is a very powerful method of system analysis. It can give us a *lot* of information about our target.

ltrace

This utility is extremely useful. It traces ALL library calls made by a program.

Useful options:

  1. -S (display syscalls too)

  2. -f (follow fork)

  3. -o filename (output trace to filename)

  4. -C (demangle C++ function call names)

  5. -n 2 (indent each nested call 2 spaces)

  6. -i (prints instruction pointer of caller)

  7. -p pid (attaches to specified pid)

API Monitor

API Monitor is incredible. It will let you watch .dll calls in real time, filter on type of dll call, view

Chapter 5. Determining Interesting Functions

Table of Contents

Revision History
Revision $Revision: 1.2 $$Date: 2004/01/25 16:57:43 $ 

Clearly without source code, we can't possibly hope to understand all of sections of an entire program. So we have to use various methods and guess work to narrow down our search to a couple of key functions.

Reconstructing function & control information

The problem is that first, we must determine what portions of the code are actually functions. This can be difficult without debugging symbols. Fortunately, there are a couple of utilities that make our lives easier.

objdump

Objdump's most useful purpose is to disassemble a program with the -d switch. Lacking symbols, this output is a bit more cryptic. The -j option is used to specify a segment to disassemble. Most likely we will want .text, which is where all the program code lies.

Note that the leftmost column of objdump contains a hex number. This is in fact the actual address in memory where that instruction is located. Its binary value is given in the next column, followed by its mnemonic.

objdump -T will give us a listing of all library functions this program calls.

disasm.pl

Steve Barker wrote a neat little perl script that makes objdump much more legible in the event that symbols are not included. The script has since been extended and improved by myself and Nasko Oskov. It now makes 3 passes through the output. The first pass builds a symbol table of called and jumped-to locations. The second pass finds areas between two rets, and inserts them into the symbol table as "unused" functions. The third pass prints out the nicely labeled output, and prints out a function call tree. Usage:

./disasm /path/to/binary > binary.asminfo

There are/will be few command line options to the utility. Now --graph is supported. It will generate a file called call_graph that contains definition that can be used with a program called dot to generate visual representation of the call graph.

Note: Unused functions just mean that that function wasn't called DIRECTLY. It is still possible that a function was called through a function pointer (ie, main is called this way)

Consider the objective

Ok, so now we're getting ready to get really down and dirty. The first step to finding what you are looking for is to know what you are looking for. Which functions are 'interesting' is entirely dependent on your point of view. Are you looking for copy protection? How do you suspect it is done. When in the program execution does it show up? Are you looking to do a security audit of the program? Is there any sloppy string usage? Which functions use strcmp, sprintf, etc? Which use malloc? Is there a possibility of improper memory allocation?

Finding key functions

If we can narrow down our search to just a few functions that are relevant to our objective, our lives should be much easier.

Finding main()

Regardless of our objective, it is almost always helpful to know where main() lies. Unfortunately, when debugging symbols are removed, this is not always easy.

In Linux, program execution actually begins at the location defined by the _start symbol, which is provided by gcc in the crt0 libraries (check gcc -v for location). Execution then continues to __libc_start_main(), which calls _init() for each library in the program space. Each _init() then calls any global constructors you may have in that particular library. Global constructors can be created by making global instances of C++ classes with a constructor, or by specifying __attribute__((constructor)) after a function prototype. After this, execution is finally transferred to main through an indirect call off of the base register ebp.

The easiest technique is to try to use our friends ltrace and gdb (FIXME: the debugging chapter has been moved to after this one..) together with our disassembled output. Checking the return address of the first few functions of ltrace -i, and cross referencing that to our assembly output and function call tree should give us a pretty good idea where main is. We may have to try to trick the program into exiting early, or printout out an error message before it gets too deep into its call stack.

Other techniques exist. For example, we can LD_PRELOAD a .c file with a constructor function in it. We can then set a breakpoint to a libc function that it calls that is also in the main executable, and finish and stepi until we are satisfied that we have found main.

Even better, we could just set a breakpoint in the function __libc_start_main (which is a libc function, and thus we will always have a symbol for it), and do the same technique of finishing and stepping until we reach what looks like main to us.

At worst, even without a frame pointer, we should be able to get the address of a function early enough in the execution chain for us to consider it to be main.

Finding other interesting functions

Its probably a good idea to make a list of all functions that call exit. These may be of use to us. Other techniques for tracking down interesting functions include:

  1. Checking for which functions call obscure gui construction widgets used in a dialog box asking for a product serial number

  2. Checking the string references to find out which functions reference strings that we are interested in. For example, if a program outputs the text "Already registered." knowing what function outputs this string is helpful in figuring out the protection this particular program uses.

  3. Running a program in gdb, then hitting control C when it begins to perform some interesting operation. using stepi N should slow things down and allow you to be more accurate. Sometimes this is too slow however. Find a commonly called function, set a breakpoint, and try doing cont N.

  4. Checking which functions call functions in the BSD socket layer

Plotting out program flow

Plot out execution paths into a tree from main, especially to your function(s) of interest. You can use disasm.pl to generate call graphs with the --graph option. Using it enables the script to generate file called call_graph. It contains definition of the call graph in a format used by a popular graphing tool called dot. Feeding this definition file in dot will give you a nice (probably pretty huge) graphics file with visual representation of the call graph. It is pretty amazing. Definitely try it with some small program.

Further analysis will have to hold off until we understand some assembly.

Chapter 6. Understanding Assembly

Table of Contents

Revision History
Revision $Revision: 1.3 $$Date: 2004/02/25 21:04:10 $ 

Since the output of all of these tools is in AT&T syntax, those of you who know Intel/MASM syntax have a bit of re-learning to do.

Assembly language is one step closer to the hardware than high level languages like C and C++. So to understand assembly, you have to understand how the hardware works. Lets start with a set of memory locations known as the CPU registers.

Registers

Registers are like the local variables of the CPU, except there are a fixed number of them. For the ix86 CPU, there are only 4 main registers for doing integer calculations: A, B, C, and D. Each of these 4 registers can be accessed 4 different ways: as a 32 bit value (%eax), as a 16 bit value (%ax), and as a low and a high 8 bit value (%al and %ah). There are five more registers that you will see used occasionally - namely SI, DI, SP and BP. SI and DI are around from the DOS days when people used 64k segmented addressing, and as it turns out, may be used as integer like normal registers now. SP and BP are two special registers used to handle an area of memory called the stack. There is one last register, the instruction pointer IP that you may not modify directly, but is changed through jmps and calls. Its value is the address of the next instruction to execute. (FIXME: Check this)

Note: If gcc was called with the -fomit-frame-pointer, the BP register is freed up to be used as an extra integer register.

The stack

What is A stack?

A stack is what is called a Last In, First Out data structure or LIFO. Think of it as a stack of plates. The most recent (last) plate pushed on top of the stack is the first one to be removed. This allows us to manage the stack with only one register if need be, namely the stack pointer or SP register.

What is THE stack?

The stack is a region of memory that is present throughout the entire lifetime of a program. It is where local variables are stored, and it is also how function call arguments are passed.

On almost all modern computers, the stack is said to grow down, that is, as elements are pushed on to it, the SP register is decremented by the size of the element pushed. From our earlier analogy, its as if the stack of plates where hung from the ceiling, new plates were inserted at the bottom, and the whole stack some sort of catch to stop them all from dumping out. That catch would be the SP register.

So the stack starts from a high memory address, and works down to a lower address. This is because another section of memory called the heap grows up, and its handy to have the two of them grow towards eachother to fill in a single empty hole in the program address space.

Note: It is easy to become confused when dealing with the stack. Remember that while it may grow down, variables are still addressed sequentially upwards. So an array of char b[4] at esp of 80 will have b[0] at 80 (right at the stack pointer), b[1] above that at 81, b[2] at 82, and b[3] at 83, which is where the stack pointer was before the push. The next push will then place the stack pointer at 76.

Working with the stack

There are two instructions that deal with the stack directly: push and pop. Each take a register or value as an argument. Push will place its argument onto the stack, and then decrement the SP by the size of its argument (4 for pushl, 2 for pushw, 1 for pushb). //FIXME (What is pushl and push b) Pop copies the value on the top of the stack into its argument, then increments SP. Pusha and popa push and pop all the registers with one instruction. Because of speed considerations, the value is not touched, just the SP register is changed to point to the next location ot the stack. So SP is always pointing to the top value of the stack and not at invalid memory.

Normal arithmetic expressions can also be used to modify SP to make space for working directly with stack memory with other instructions.

How gcc works with the stack

Right before a function is called, its arguments are pushed onto the stack in reverse order. Then the call instruction pushes the address of the next instruction (ie the value of IP after call) onto the stack, and then the CPU begins executing the address of the call by copying that value into the invisible instruction pointer (IP) register.

The called function then starts with what is known as the function prolog, which pushes the current base pointer onto the stack, and then copies the current stack pointer to the base pointer, and then subtracts from SP enough space to hold all local variables (and then some!). The base pointer is then used to reference variables and parameters during function execution, since its value is not affected by pushes and pops. Thus, parameters all have fixed positive offsets from the BP, where as local variables all have fixed negative offsets from the BP.

At the end of function execution, the base pointer is copied to the stack pointer during ret, and the return address is popped off the stack and placed into the invisible IP register to return to the caller function.

Note: Unless -fomit-frame-pointer is specified, gcc always generates code that references local variables by negative offsets from the BP instead of positive offsets from the SP.

Two's complement

What is it?

Two's complement is specific way signed integers are represented in pretty much all modern computers. This is due to the fact that two's complement form has several advantages:

  1. The same rules for addition apply, no extra work is required to compute the sum of negative integers.

  2. Easy to negate a number.

  3. The most significant bit tells you the sign: 0 is positive, 1 is negative.

It should be noted that when using signed values the ranges of number that can be represented by a specific number of bits is less than the usual. The range is -(2n-1) to +(2n-1)-1

Conversion

There are several ways to convert any unsigned binary number into signed two's complement form. The most intuitive and easy to remember is the following Complement each bit of the number and add one. Let's find how -13 is represented, so we convert it into its binary form:

0000 1101
 
 Then invert all the bits.
 1111 0010
 
 Now add one to it.
 1111 0011
 
 So 1111 0011 is -13 in two's complement.
         

Second method is to complement all the bits to the left of the rightmost 1 bit, but not including it (but not the rightmost bit, for example 0001 0100). It sounds a bit complicated, but is easier once you figure out how it is done. Let's get back to the example of -13.

0000 1101
         ^
 Invert the bits to the left of the rightmost one.
 1111 0011
         

There you go. We get the number without second step of adding one. It can be proven why this method works, but we are not in class. Yet a third method is to subtract the number from 2n. Here is how it works.

 1000 0000
 -
  0000 1101
  ---------
  1111 0011
         

There may be other ways of doing it, but if you master those, you will not need to remember any more. To convert a negative number in two's complement, you apply the exact same procedure as described and you get back the positive value for the number.

From reverse engineering angle

Now that we know what two's complement is let's look at some examples of this type of representation in reverse engineering process. Using one of the tools discussed earlier, objdump and the wrapper disasm.pl, let's look at the ls command binary. If you look at function7 (which starts at address 80495a8), lines like the following appear frequently:

 80495be:       83 c4 f8                add    $0xfffffff8,%esp
         

What does this instruction do? It just adds some constant to the stack pointer register (%esp). There are two ways you can look at this constant. It is either a huge unsigned number or two's complement negative number. Since we just add to the stack pointer, it does not make sense to be big number, so let's find what is the value of this number.

  f    f    f    f    f    f    f    8
 1111 1111 1111 1111 1111 1111 1111 1000
 
 0000 0000 0000 0000 0000 0000 0000 1000
   0    0    0    0    0    0    0    8
         

Now we can see that this is just the negative of 0x00000008 or just plain -8 in decimal. If you think about this, what this line does is decrement the stack pointer by 8 bytes (allocate more space).

Byte Ordering

One common difficulty in working on multiple platforms is that different platforms use different byte orders. Byte ordering refers to the physical layout of integer data in memory. There are two different orderings - little endian and big endian. When a data structure or data type is represented by more than one byte, the ordering of bytes matters. For example if we consider a long (4 bytes) let's label the least significant byte 0 and the most significant one 3. If we are on little endian machine the long will be represented in memory like this (yeah, some machines do not allow addressable bytes, but let's forget about this):

     0x040   0
      0x041   1
      0x042   2
      0x043   3
  

On a big endian machine on the other hand, the long will be layed out like that:

     0x040   3
      0x041   2
      0x042   1
      0x043   0
  

Now let's look at an example. The easiest way to see the difference in byte ordering is to look at how a long is stored in memory on different architectures. Here is an example program that will demonstrate it.

#include <stdio.h>
 
 int main() {
 
     long longval = 123456789;
 
     printf("%s\n", test);
 
 }
 

After compiling it with debugging info, let's run it and see what will be the result. The first run is on Intel-based machine.

bash$ uname -a 
 Linux slack 2.4.20 #5 Tue Dec 31 00:01:00 CST 2002 i686 unknown
 

bash$ gdb ./a.out 
 GNU gdb 5.2.1
 This GDB was configured as "i386-slackware-linux"...
 (gdb) break main
 Breakpoint 1 at 0x8048338: file test.c, line 5.
 (gdb) run
 Breakpoint 1, main () at test.c:5
 5               long longval = 123456789;
 (gdb) stepi
 8               printf("value is %d\n", longval);
 
 Let's get the address of longval in memory
 (gdb) print &longval
 $2 = (long int *) 0xbffff234
 
 Let's print the contents of longval as a word
 (gdb) x/w 0xbffff234
 0xbffff234:     0x075bcd15
 
 Let's print the contents of longval as 4 consecutive bytes
 (gdb) x/4b 0xbffff234
 0xbffff234:     0x15    0xcd    0x5b    0x07
 (gdb) quit
 
 

The second run was on Sparc machine running Solaris.

remsun1> uname -a 
 SunOS remsun1 5.8 Generic_108528-16 sun4u sparc SUNW,Sun-Fire-280R
 

remsun1> gdb ./a.out 
 GNU gdb 4.18
 This GDB was configured as "sparc-sun-solaris2.7"...
 (gdb) break main
 Breakpoint 1 at 0x10564: file test.c, line 5.
 (gdb) run
 Breakpoint 1, main () at test.c:5
 5               long longval = 123456789;
 (gdb) stepi
 0x10568 5               long longval = 123456789;
 
 Let's get the address of longval in memory
 (gdb) print &longval
 $1 = (long int *) 0xffbefaec
 
 Let's print the contents of longval as a word
 (gdb) x/1w 0xffbefaec
 0xffbefaec:     0x075bcd15
 
 Let's print the contents of longval as 4 consecutive bytes
 (gdb) x/4b 0xffbefaec
 0xffbefaec:     0x07    0x5b    0xcd    0x15
 (gdb)
 
 

One can clearly see how on the Sparc machine the individual bytes are in the same order as in the printed word, whereas the Intel machine has it reverse.

This is the difference in byte ordering. In order for different hosts on the same network to be able to communicate and the exchanged data to make sense, they agree on common byte ordering. In modern networking the data is transmitted in big endian byte ordering i.e. most significant byte comes first. On the i80x86 the host byte order is Least Significant Byte first, whereas the network byte order, as used on the Internet, is Most Significant Byte first.

Reading Assembly

Keep track of the stack and registers

The secret to understanding assembly code is to always work with a sheet of paper and a pencil. When you first sit down, draw out a table for all 6 registers A, B, C, D, SI, and DI. Keep track of the high and low portions as well. Each new line of this table should represent a modification of a register, so the last value in each register column is the current value of that register.

Next, draw out a long column for the stack, and leave space on the sides to place the BP and SP registers as they move down. Be sure to write all values into the stack as they are placed there, including ret and the stored BP.

AT&T syntax

In AT&T syntax, all instructions are of the form:

mnemonic src, dest

Standalone numerical constants are prepended with a $. Hexadecimal numbers always start with 0x (as opposed to ending in h). Registers are specified with a % sign, ie %eax.

Dereferencing or pointer representation is of the form disp(%base, %index, scale), where the resulting address is disp + %base + %index*scale. disp and scale are constants (no $), and %base and %index are registers. Any of these 4 may be omitted, leaving either blank space and then a comma, or simply leaving off the argument, and all remaining arguments. For example, 4(%eax) means memory address 4+%eax, where as (,%eax,4) means %eax*4. This compact notation makes array indexing easy.

Intel Instruction Set

From here, it is simply a matter of understanding what each assembly mnemonic does. Most common mnemonics are obvious, but you can find a complete description of all the Intel instructions (in agonizing detail) at Intel's Developer Site. Volume 2 contains the instruction list. Keep in mind that in Intel syntax, operands are in the reverse order of AT&T syntax (ie, mnemonic dest,src).

Know Your Compiler

In order to learn to read assembly effectively, you really have to know what type of code your compiler likes to generate in certain situations. If you learn to recognize what a while loop, a for loop, an if-else statement all look like in assembly, you can learn to get a general feel for code more quickly. There are also a few tricks that GCC performs that may seem unintuitive at first to the neophyte reverse engineer, even if they already know how to forward-engineer in assembly.

Basic Control Structures

In assembly, the only flow control mechanisms are branching and calling. So every control structure is built up from a combination of goto's and conditional branches. Lets look at some specific examples.

Function Calls

So we've mentioned that function calls use the stack to pass arguments. But where does that leave return values? And what about local variables?

Local variables are also on the stack, just below the base pointer instead of above. But if you thought that a return value was a pop off of the stack, you were wrong! GCC places the return value of a particular function into the eax register at the end of that function. Upon calling a function with a return value, it knows to copy the eax register into whatever variable will store that return value.

So lets see some gcc output for function calls. Get your paper ready, we're going to need to draw our stack and register table to follow these. Yeah yeah, it seems like a hassle, and you're sure you can do without it. We know, we know. But humor us. If you at least practice the methodical way a few times, doing things in your head will become easier later.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer To get the most out of these examples, start at main, and trace execution throughout the executable. Do the low optimization first, and then move up to higher levels. The comments assume you are progressing in that order. FIXME: We may want to split these out into several simpler example files, to avoid overwhelming people all at once.

gcc 3.3.2 The same files are also compiled with gcc version 3.3.2 and the corresponding files are functions.c, no optimization, -O3 -fomit-frame-pointer.

The if statement

The if statement is represented in assembly as a test followed by a jump. The thing to notice is that sometimes the body of the if statement is what is jumped to, as opposed to being jumped over as your C code may specify. This means that the condition for the jump will often be the negation of the condition for your if statement.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 The same files are also compiled with gcc version 3.3.2 and the corresponding files are if.c, no optimization, -O3 -fomit-frame-pointer.

The if..else statement

So we've seen that if statements are usually done by doing a single jump over the statement body. If..else statements operate the same way, except with an unconditional jump at the end of the if statement body that diverts execution flow to the end of the else body.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 ifelse.c, no optimization, -O3 -fomit-frame-pointer

If..else..if statements

Adding another if in an else clause works the same way as having an if statement inside an else clause. We just simply jump to another label if it evaluates to false, and if the first if statement evaluates as true, at the bottom of it we simply jump past both the else if and any remaining else clauses.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 ifelseif.c, no optimization, -O3 -fomit-frame-pointer

Complicated if statements

Of course, if statements can get much more complicated than the above examples. They can contain boolean short-circuits, function calls, nested-ifs, etc.

The while loop

Think about the while loop for a second. Think about how it operates. Basically, you could write a while loop with an if and a goto statement inside the if body to the top of the loop. So, since the only branching mechanisms we have in assembly are jumps and calls, while loops are just if statements with a jmp back to the top at the bottom.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 while.c, no optimization, -O3 -fomit-frame-pointer

The for loop

So lets rewrite the above loop as a for loop, to see if our professors were lying to us when they said these loops were equivalent.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 for.c, no optimization, -O3 -fomit-frame-pointer

The do...while loop

Do while loops are a bit different than for and while loops in that they allow execution of the loop body to occur at least once. As such, their comparison instructions take place at the bottom of the loop as opposed to the top. Observe:

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 dowhile.c, no optimization, -O3 -fomit-frame-pointer

Arrays

Arrays on the stack

Arrays on the stack are just memory regions that we access with variations on the disp(%base, %index, scale) idea presented earlier. So lets start with a warm-up consisting of a simple char array where we let libc do all the work.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 array-stack-char.c, no optimization, -O3 -fomit-frame-pointer

So lets do another example where we do all the work. One dimensional arrays are the easiest, as they are simply a chunk of memory that is the number of elements times the size of each element.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 array-stack-int1D.c, no optimization, -O3 -fomit-frame-pointer

Two dimensional arrays are actually just an abstraction that makes working with memory easier in C. A 2D array on the stack is just one long 1D array that the C compiler divides for us to make it manageable. To parameterize things, an array declared as: type array[dim2][dim1]; is really a 1D array of length dim2*dim1*type. The C compiler handles array indexing as follows: array[i][j] is the memory location array + i*dim1*type + j*type. So it divides our 1D array into dim2 sections, each dim1*type long.

FIXME: Graphics to illustrate this.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 array-stack-int2D.c, no optimization, -O3 -fomit-frame-pointer

As I tell my introductory computer science students, the best way to think of higher dimensional arrays is to think of a set of arrays of the next lower dimension. So the best way to think about how a 3D array can be jammed into a 1D array is to think about how a set of 2D arrays would be jammed into a 1D array: one right after another. So for array declared as type array[dim3][dim2][dim1];, array[i][j][k] means array + i*dim2*dim1*type + j*dim1*type + k*type. So this means just by looking at the assembly multiplications of the indexing variables, we should be able to determine n-1 dimensions of any n dimensional array. The remaining dimension can be determined from the total size, or the bounds of some initialization loop.

FIXME: Diagram/graphics to show this

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 array-stack-int3D.c, no optimization, -O3 -fomit-frame-pointer

Arrays through malloc

Structs

Using structs

Structures (structs) are a convenient way of managing related variables without having to write a class to encapsulate all of them. A structure is essentially a class without any member functions. Structures are used VERY often in C in order to avoid passing several variables back and forth between functions. Instead of passing all the variables, a common practice is to encapsulate all of them in a struct and just pass the location of the struct in memory to the function that needs access to those variables. Structures in C++ are declared like this:

      struct a
       {
          int first;
          float second;
          char *third;
       };
      

Don't forget that ; after the last brace. Structs can store any type of variable that you would normally be able to declare anywhere in your program. To access a variable in a struct you use the dot (.) operator. For example, to assign 5 to the variable first in the struct a, do

     a.first = 5;
      

Arrays of structs

Arrays of structs are created just as you would create an array of any other variable. Using the declaration of a above, an array of a structs of size 10 would be declared like this:

       struct a stuctarray[10];
        

Note the use of the struct keyword, followed by the name of the struct declared, followed by the name of the array.

The code above declares a static array of structs. This means that space will be allocated for this array during load time (FIXME: Check this). Struct arrays can also be declared as pointers so that space for individual elements can be allocated at run time as it is needed. (FIXME: Um...how is this done?...time to brush up on C).

Passing structs

Returning structs

GCC handles structs a bit oddly. When you have a function that returns a struct, what gcc does is actually push the address of the struct onto the stack just before calling the function (as if the first argument to the function was a pointer to the struct that will contain the return i value). Then, inside the function, code is generated to modify the struct through this address. At the end of the function, the value of %eax contains a pointer to the struct that was passed on to the stack. So instead of the normal convention of having %eax store the return value, %eax stores a pointer to the return value, and the return value is modified directly inside of the function.

gcc 2.95 Example .c file and gcc output with no optimization, with -O2, and with -O3 -fomit-frame-pointer

gcc 3.3.2 struct.c, no optimization, -O3 -fomit-frame-pointer

Classes (ie C++ code)

C with Classes

Inheritance

Virtual functions

Operator Overloading

Templates

Global variables

Exercises

These examples were all compiled using GCC 2.95.4 under Debian 3.0/Testing. A good exercise would be to go compile some of these examples with GCC 3.0 under high optimizations, changing some things around and viewing the resulting asm to get a feel for that new compiler and how it does things, as code it generates will begin to become more ubiquitous as time goes on. It was still considered rather unstable as of this writing, so we opted for the older GCC for all these examples for that reason.

Chapter 7. Debugging

Table of Contents

Revision History
Revision $Revision: 1.6 $$Date: 2004/01/31 20:56:53 $ 

User-level Debugging

DDD

DDD is the Data Display Debugger, and is a nice GUI front-end to gdb, the GNU debugger. For a long time, the authors believed that the only thing you really needed to debug was gdb at the command line. However, when reverse engineering, the ability to keep multiple windows open with stack contents, register values, and disassembly all on the same workspace is just too valuable to pass up.

Also, DDD provides you with a gdb command line window, and so you really aren't missing anything by using it. Knowing gdb commands is useful for doing things that the UI is too clumsy to do quickly. gdb has a nice built-in help system organized by topic. Typing help will show you the categories. Also, DDD will update the gdb window with commands that you select from the GUI, enabling you to use the GUI to help you learn the gdb command line. The main commands we will be interested in are run, break, cont, stepi, nexti, finish, disassemble, bt, info [registers/frame], and x. Every command in gdb can be followed by a number N, which means repeat N times. For example, stepi 1000 will step over 1000 assembly instructions.

Setting Breakpoints

A breakpoint stops execution at a particular location. Breakpoints are set with the break command, which can take a function name, a filename:line_number, or *0xaddress. For example, to set a breakpoint at the aforementioned __libc_start_main(), simply specify break __libc_start_main. In fact, gdb even has tab completion, which will allow you to tab through all the symbols that start with a particular string (which, if you are dealing with a production binary, sadly won't be many).

Viewing Assembly

Ok, so now that we've got a breakpoint set somewhere, (let's say __libc_start_main). To view the assembly in DDD, go to the View menu and select source window. As soon as we enter a function, the disassembly will be shown in the bottom half of the source window. To change the syntax to the more familar Intel variety, go to Edit->Gdb Settings... under Disassembly flavor. This can also be accomplished with set disassembly-flavor intel from the gdb prompt. But using the DDD menus will save your settings for future sessions.

Figure 7.1. ASM in DDD

ASM in DDD

Viewing Memory and the Stack

In gdb, we can easily view the stack by using the x command. x stands for Examine Memory, and takes the syntax x /<Number><format letter><size letter> <ADDRESS>. Format letters are (octal), x(hex), d(decimal), u(unsigned decimal), t(binary), f(float), a(address), i(instruction), c(char) and s(string). Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes). For example, x /32xw 0x400000 will dump 32 words (32 bit integers) starting at 0x400000. Note that you can also use registers in place of the address, if you prefix them with a $. For example, x /32xw $esp will view the top 32 words on the stack.

DDD has some nice capabilities for viewing arbitrary dumps of memory relating to the registers. Go to View->Data Window... Once the Data Window is open, go to Display (hold down the mouse button as you click), and go to Other.. You can type in any symbol, variable, expression, or gdb command (in backticks) in this window, and it will be updated every time you issue a command to the debugger. A couple good ones to do would be `x /32xw $esp` and `x/16sb $esp. Click the little radio button to add these to the menu, and you can then open the stack from this display and it will be updated in real time as you step through your program.

Figure 7.2. Stack Displays with New Display Window

Stack Displays with New Display Window

Viewing Memory as Specific Data Structures

So DDD has fantastic ability to lay out data structures graphically, also trough the Display window mentioned above. Simply cast a memory address to a pointer of a particular type, and DDD will plot the structure in its graph window. If the data structure contains any pointers, you can click on these and DDD will open up a display for that structure as well.

Oftentimes, programs we're interested in won't have any debugging symbols, and as such, we won't be able to view any structures in an easy to understand form. For seldom used structures, this isn't that big of a deal, as you can just take them apart using the x command. However, if you are dealing with more complicated data structures, you may want to have a set of types available to use again and again. Luckily, through the magic of the ELF format, this is relatively easy to achieve. Simply define whatever structures or classes you suspect are used and include whatever headers you require in a .c file, and then compile it with gcc -shared. This will produce a .so file. Then, from within gdb but before you begin debugging, run the command set env LD_PRELOAD=file.so. From then on, you will be able to use these types in that gdb/DDD session as if they were compiled in to the program itself. (FIXME: Come up with a good example for this).

Using watchpoints

-> Example using gdb to set breakpoints in functions with and without debugging symbols.

-> FIXME: Test watchpoints

WinDbg

WinDbg is part of the standart Debugging Tools for Microsoft Windows that everyone can download for free from. Microsoft offers few different debuggers, which use common commands for most operations and ofcourse there are cases where they differ. Since WinDbg is a GUI program, all operations are supposed to be done using the provided visual components. There is also a command line embeded in the debugger, which lets you type commands just like if you were to use a console debugger like ntsd. The following section briefly mentions what commands are used to do common everyday tasks. For more complete documentation check the Help file that comes with WinDbg. An example debugging session is presented to help clarify the usage of the most common commands.

Breakpoints

Breakpoints can be set, unset, or listed with the GUI by using Edit->Breakpoints or the shortcut keys Alt+F9. From the command line one can set breakpoints using the bp command, list them using bl command, and delete them using bc command. One can set breakpoints both on function names (provided the symbol files are available) or on a memory address. Also if source file is available the debugger will let you set breakpoints on specific lines using the format bX `filename:linenumber`

Viewing Assembly

In WinDbg you can use View->Disassembly option to open a window which will show you the disassembly of the current context. In ntsd you can use the u to view the disassembled code.

Stack operations

There are couple of things one usually does with the stack. One is to view the frames on the stack, so it can be determined which function called which one and what is the current context. This is done using the k command and its variations. The other common operation is to view the elements on the stack that are part of the current stack frame. The easiest way to do so is using db esp ebp, but it has its limitations. It assumes that the %ebp register actually points to the begining of the stack frame. This is not always true, since omission of the frame pointer is common optimization technique. If this is the case, you can always see what the %esp register is pointing to and start examining memory from that address.

The debugger also allows you to "walk" the stack. You can move to any stack frame using .frame X where X is the number of the frame. You can easily get the frame numbers using kn. Keep in mind that the frames are counted starting from 0 at the frame on top of the stack.

Reading and Writing to Memory

Reading memory is accomplished with the d* commands. Depending on how you want to view the data you use a specific variation of this command. For example to see the address to which a pointer is pointing, we can use dp or to view the value of a word, one can use dw. The help file says that one can view memory using ranges, but one can also use lengths to make it easy to display memory. For example if we want to see 0x10 bytes at memory location 0x77f75a58 you can either say db 77f75a58 77f75a58+10 or less typing gives you db 77f75a58 l 10.

Provided that you have symbols/source files, the dt is very helpful. It tries to find the data type of the sybol or memory location and display it accordingly.

Tips and tricks

Knowing your debugger can save you lots of time and pain in debugging either your own programs or when reverse engineering other's. Here are few things we find useful and time saving. This is not a complete list at all. If you know other tricks and want to contribute, let us know.

poi() - this command dereferences a pointer to give you the value that it is pointing to. Using this with user-defined aliases gives you convinient way of viewing data.

Example

FIXME: include better example

Let's set a breakpoint in on the function main
 0:000> bp main
 *** WARNING: Unable to verify checksum for test.exe
 
 Let's set a breakpoint in on the function main
 0:000> g
 Breakpoint 0 hit
 eax=003212e8 ebx=7ffdf000 ecx=00000001 edx=7ffe0304 esi=00000a28 edi=00000000
 eip=00401010 esp=0012fee8 ebp=0012ffc0 iopl=0         nv up ei pl zr na po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000246
 test!main:
 00401010 55               push    ebp
 
 Enable loading of line information if available
 0:000> .lines
 *** ERROR: Symbol file could not be found.  Defaulted to export symbols for ntdll.dll - 
 Line number information will be loaded
 
 Set the stepping to be by source lines
 0:000> l+t
 Source options are 1:
      1/t - Step/trace by source line
 
 Enable displaying of source line
 0:000> l+s
 Source options are 5:
      1/t - Step/trace by source line
      4/s - List source code at prompt
 
 Start stepping through the program
 0:000> p
 *** WARNING: Unable to verify checksum for test.exe
 eax=003212e8 ebx=7ffdf000 ecx=00000001 edx=7ffe0304 esi=00000a28 edi=00000000
 eip=00401016 esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz na po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000206
 >    6:   char array [] = { 'r', 'e', 'v', 'e', 'n', 'g' };
 test!main+6:
 00401016 c645f072         mov    byte ptr [ebp-0x10],0x72 ss:0023:0012fed4=05
 0:000> 
 eax=003212e8 ebx=7ffdf000 ecx=00000001 edx=7ffe0304 esi=00000a28 edi=00000000
 eip=0040102e esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz na po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000206
 >    7:   int intval = 123456;
 test!main+1e:
 0040102e c745fc40e20100 mov dword ptr [ebp-0x4],0x1e240 ss:0023:0012fee0=0012ffc0
 0:000> 
 eax=003212e8 ebx=7ffdf000 ecx=00000001 edx=7ffe0304 esi=00000a28 edi=00000000
 eip=00401035 esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz na po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000206
 >    9:   test = (char*) malloc(strlen("Test")+1);
 test!main+25:
 00401035 6840cb4000       push    0x40cb40
 0:000> 
 eax=00321018 ebx=7ffdf000 ecx=00000000 edx=00000005 esi=00000a28 edi=00000000
 eip=00401051 esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz na po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000206
 >   10:   if (test == NULL) {
 test!main+41:
 00401051 837df800       cmp dword ptr [ebp-0x8],0x0 ss:0023:0012fedc=00321018
 0:000> 
 eax=00321018 ebx=7ffdf000 ecx=00000000 edx=00000005 esi=00000a28 edi=00000000
 eip=00401061 esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz na po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000206
 >   13:   strncpy(test, "Test", strlen("Test"));
 test!main+51:
 00401061 6848cb4000       push    0x40cb48
 0:000> 
 eax=00321018 ebx=7ffdf000 ecx=00000000 edx=74736554 esi=00000a28 edi=00000000
 eip=00401080 esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz ac po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000216
 >   14:   test[4] = 0x00;
 test!main+70:
 00401080 8b4df8           mov     ecx,[ebp-0x8]     ss:0023:0012fedc=00321018
 0:000> 
 eax=00321018 ebx=7ffdf000 ecx=00321018 edx=74736554 esi=00000a28 edi=00000000
 eip=00401087 esp=0012fed4 ebp=0012fee4 iopl=0         nv up ei pl nz ac po nc
 cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000216
 >   16:   printf("Hello RevEng-er, this is %s\n", test);
 test!main+77:
 00401087 8b55f8           mov     edx,[ebp-0x8]     ss:0023:0012fedc=00321018
 
 Display the array as bytes and ascii
 0:000> db array array+5
 0012fed4  72 65 76 65 6e 67                                reveng
 
 View the type and value of intval
 0:000> dt intval
 Local var @ 0x12fee0 Type int
 123456
 
 View the type and value of test
 0:000> dt test
 Local var @ 0x12fedc Type char*
 0x00321018 "Test"
 
 View the memory test points to manually
 0:000> db 00321018 00321018+4
 00321018  54 65 73 74 00                                   Test.
 
 Quit the debugger
 0:000> q
 quit:
 Unloading dbghelp extension DLL
 Unloading exts extension DLL
 Unloading ntsdexts extension DLL
 
 

Chapter 8. Executable formats

Table of Contents

Revision History
Revision $Revision: 1.6 $$Date: 2004/01/31 20:56:53 $ 

User-level Debugging

DDD

So at this point we now know how to write our programs on an extremely low level, and thus produce an executable file that very closely matches what we want. But the question is, how is our program code now actually stored on disk?

We'll begin our investigation into executable formats by looking at the ELF binary specification and related tools, and then move on to examine the PE binary specification, and related tools for dealing with those binaries.

Working with the ELF Program Format

Recall that when a Linux program runs, we start at the _start function, and move on from there to __libc_start_main, and eventually to main, which is our code. So somehow the operating system is gathering together a whole lot of code from various places, and loading it into memory and then running it. How does it know what code goes where?

The answer on Linux and UNIX is the ELF binary specification. ELF specifies a standard format for mapping your code on disk to a complete executable image in memory that consists of your code, a stack, a heap (for malloc), and all the libraries you link against.

So lets provide an overview of the information needed for our purposes here, and refer the user to the ELF spec to fill in the details if they wish. We'll start from the beginning of a typical executable and work our way down.

ELF Layout

There are three header areas in an ELF file: The main ELF file header, the program headers, and then the section headers. The program code lies in between the program headers and the section headers.

TODO: Insert figure here to show a typical ELF layout.

NOTE: ELF is extremely flexible. Many of these sections can be shunk, expanded, removed, etc. In fact, it is not outside the realm of possibility that some programs may deliberately make abnormal, yet valid ELF headers and files to try to make reverse engineering difficult (vmware does this, for example).

The Main ELF File Header

The main elf header basically tells us where everything is located in the file. It comes at the very beginning of the executable, and can be read directly from the first e_ehsize (default: 52) bytes of the file into this structure.

/* ELF File Header */
 typedef struct
 {
   unsigned char e_ident[EI_NIDENT];     /* Magic number and other info */
   Elf32_Half    e_type;                 /* Object file type */
   Elf32_Half    e_machine;              /* Architecture */
   Elf32_Word    e_version;              /* Object file version */
   Elf32_Addr    e_entry;                /* Entry point virtual address */
   Elf32_Off     e_phoff;                /* Program header table file offset */
   Elf32_Off     e_shoff;                /* Section header table file offset */
   Elf32_Word    e_flags;                /* Processor-specific flags */
   Elf32_Half    e_ehsize;               /* ELF header size in bytes */
   Elf32_Half    e_phentsize;            /* Program header table entry size */
   Elf32_Half    e_phnum;                /* Program header table entry count */
   Elf32_Half    e_shentsize;            /* Section header table entry size */
   Elf32_Half    e_shnum;                /* Section header table entry count */
   Elf32_Half    e_shstrndx;             /* Section header string table index */
 } Elf32_Ehdr;
 

The fields of interest to us are e_entry, e_phoff, e_shoff, and the sizes given. e_entry specifies the location of _start, e_phoff shows us where the array of program headers lies in relation to the start of the executable, and e_shoff shows us the same for the section headers.

The Program Headers

The next portion of the program are the ELF program headers. These describe the sections of the program that contain executable program code to get mapped into the program address space as it loads.

/* Program segment header.  */
 
 typedef struct
 {
   Elf32_Word    p_type;                 /* Segment type */
   Elf32_Off     p_offset;               /* Segment file offset */
   Elf32_Addr    p_vaddr;                /* Segment virtual address */
   Elf32_Addr    p_paddr;                /* Segment physical address */
   Elf32_Word    p_filesz;               /* Segment size in file */
   Elf32_Word    p_memsz;                /* Segment size in memory */
   Elf32_Word    p_flags;                /* Segment flags */
   Elf32_Word    p_align;                /* Segment alignment */
 } Elf32_Phdr;
 

Keep in mind that there are going to a few of these (usually 2) end-to-end (ie forming an array of structs) in a typical ELF executable. The interesting fields in this structure are p_offset, p_filesz, and p_memsz, all of which we will need to make use of in the code modification chapter.

The ELF Body

The meat of the ELF file comes next. The actual locations and sizes of portions of the body are described by the program headers above, and contain the executable instructions from our assembly file, as well as string constants and global variable declarations.

ELF Section Headers

The ELF section headers describe various named sections in an executable file. Each section has an entry in the section headers array, which is found at the bottom of the executable and has the following format:

/* Section header.  */
 
 typedef struct
 {
   Elf32_Word    sh_name;                /* Section name (string tbl index) */
   Elf32_Word    sh_type;                /* Section type */
   Elf32_Word    sh_flags;               /* Section flags */
   Elf32_Addr    sh_addr;                /* Section virtual addr at execution */
   Elf32_Off     sh_offset;              /* Section file offset */
   Elf32_Word    sh_size;                /* Section size in bytes */
   Elf32_Word    sh_link;                /* Link to another section */
   Elf32_Word    sh_info;                /* Additional section information */
   Elf32_Word    sh_addralign;           /* Section alignment */
   Elf32_Word    sh_entsize;             /* Entry size if section holds table */
 } Elf32_Shdr;
 
 
 

The section headers are entirely optional, however. A list of common sections can be found on page 20 of the ELF Spec PDF

Editing ELF

Editing ELF is often desired during reverse engineering, especially when we want to insert bodies of code, or if we want to reverse engineer binaries with deliberately corrupted ELF headers.

Now you could edit these headers by hand using the <elf.h> header file and those above structures, but luckily there is already a nice editor called HT Editor that allows you to examine and modify all sections of an ELF program, from ELF header to actual instructions. (TODO: instructions, screenshots of HTE)

Do note that changing the size of various program sections in the ELF headers will most likely break things. We will get into how to edit ELF in more detail when we are talking about actual code insertion, which is the next chapter.

ELF in Memory

FIXME: Describe what ELF looks like in memory on Linux (and maybe xBSD and Solaris). Include memory addresses and diagrams of the process space.

Working with the PE Program Format

Unfortunately, the PE format can be a bit of a maze, so we're only going to discuss those sections relevant to code modification, namely the Import Address Table and code sections. FIXME: what is the equiv of _start for Microsoft Windows. FIXME: How does COM use its directory entry?

So lets start with some basic concepts. There are essentially four ways to refer to a location in a PE image (and all are used by the various data structures in some form or another).

  1. Executable File Offset

    This is simply the offset from the beginning of the executable file.

  2. Relative Virtual Address (RVA)

    This is the offset from the base address of the executable image once it has been mapped into memory. Note that this is not the same as the executable file offset. Various sections within the executable file have to start on page aligned boundaries, and as a result, holes must be present in memory that are not present in the disk image. This is a big source of confusion when initially working with PE's.

  3. Section Offset

    This is the offset from whatever data structure or section you are currently in. Yes, in some cases it is used in lieu of the RVA. (FIXME: Build a semi-complete list of these).

  4. Virtual Address

    This is a full pointer into the address space of the process. Virtual addresses can be obtained by the formula

    VA = RVA + base address

    . For almost all executables, the base address is 0x400000. For DLL's, the base address can vary, as the address requested by the DLL in its header is much more likely to conflict with another DLL. The accompanying source code (FIXME: Clean up and Link in) for function injection contains a function that converts RVA's to VA's or file offsets, depending upon the context.

PE Headers

So there are three main header types to a PE file. It's going to be a lot to keep track of. Luckily, the PEView utility can make things much more easy to understand. We suggest using it on an executable and following along so you don't get lost. These headers and the supporting macros are all defined in WinNT.h in the PlatformSDK include directory.

Figure 8.1. PEView Executable Viewer

PEView Executable Viewer

IMAGE_DOS_HEADER

The first is the IMAGE_DOS_HEADER. It is at file offset and RVA 0.

Figure 8.2. IMAGE_DOS_HEADER

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
     WORD   e_magic;                     // Magic number
     WORD   e_cblp;                      // Bytes on last page of file
     WORD   e_cp;                        // Pages in file
     WORD   e_crlc;                      // Relocations
     WORD   e_cparhdr;                   // Size of header in paragraphs
     WORD   e_minalloc;                  // Minimum extra paragraphs needed
     WORD   e_maxalloc;                  // Maximum extra paragraphs needed
     WORD   e_ss;                        // Initial (relative) SS value
     WORD   e_sp;                        // Initial SP value
     WORD   e_csum;                      // Checksum
     WORD   e_ip;                        // Initial IP value
     WORD   e_cs;                        // Initial (relative) CS value
     WORD   e_lfarlc;                    // File address of relocation table
     WORD   e_ovno;                      // Overlay number
     WORD   e_res[4];                    // Reserved words
     WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
     WORD   e_oeminfo;                   // OEM information; e_oemid specific
     WORD   e_res2[10];                  // Reserved words
     LONG   e_lfanew;                    // File address of new exe header
   } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
     

The IMAGE_DOS_HEADER is a relic from the (you guess it) DOS days. It's only function now the e_lfanew field, which gives the offset to the of the IMAGE_NT_HEADERS structure, which is where the real action starts. Technically this is a file offset, but at this point, allignment isn't an issue, and it can be intrepreted as an RVA as well. One other thing to note: e_magic has the value IMAGE_DOS_HEADER, which is "MZ" or 0x5A4D.

IMAGE_NT_HEADERS

Next up on the chopping block is the IMAGE_NT_HEADERS:

Figure 8.3. IMAGE_NT_HEADERS

typedef struct _IMAGE_NT_HEADERS {          
     DWORD Signature;
     IMAGE_FILE_HEADER FileHeader;
     IMAGE_OPTIONAL_HEADER32 OptionalHeader;
 } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32; 
      

Not really much of interest at this level. That Signature field is just to make sure you're in the right place, and has the value IMAGE_NT_SIGNATURE, which is PE00, or 0x00004550. If we open up that IMAGE_FILE_HEADER:

Figure 8.4. IMAGE_FILE_HEADER

 typedef struct _IMAGE_FILE_HEADER {
     WORD    Machine;         
     WORD    NumberOfSections;
     DWORD   TimeDateStamp;
     DWORD   PointerToSymbolTable;           
     DWORD   NumberOfSymbols; 
     WORD    SizeOfOptionalHeader;
     WORD    Characteristics;
 } IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
      

This essentially is only useful (for our purposes) for knowing the number of sections and the size of the optional header (which isn't very optional). The optional header size is needed because there can be a varying number of directory entries in the header, and this size is used to find the start of the section table (with the IMAGE_FIRST_SECTION() macro).

Figure 8.5. IMAGE_OPTONAL_HEADERS

#define IMAGE_NUMBEROF_DIRECTOR_ENTRIES 16
 
 typedef struct _IMAGE_OPTIONAL_HEADER {
     WORD    Magic;
     BYTE    MajorLinkerVersion;
     BYTE    MinorLinkerVersion;
     DWORD   SizeOfCode;
     DWORD   SizeOfInitializedData;
     DWORD   SizeOfUninitializedData;
     DWORD   AddressOfEntryPoint;
     DWORD   BaseOfCode;
     DWORD   BaseOfData;
     DWORD   ImageBase;
     DWORD   SectionAlignment;
     DWORD   FileAlignment;
     WORD    MajorOperatingSystemVersion;
     WORD    MinorOperatingSystemVersion;
     WORD    MajorImageVersion;
     WORD    MinorImageVersion;
     WORD    MajorSubsystemVersion;
     WORD    MinorSubsystemVersion;
     DWORD   Win32VersionValue;
     DWORD   SizeOfImage;
     DWORD   SizeOfHeaders;
     DWORD   CheckSum;
     WORD    Subsystem;
     WORD    DllCharacteristics;
     DWORD   SizeOfStackReserve;
     DWORD   SizeOfStackCommit;
     DWORD   SizeOfHeapReserve;
     DWORD   SizeOfHeapCommit;
     DWORD   LoaderFlags;
     DWORD   NumberOfRvaAndSizes;
     IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
 } IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
 
      

So the interesting fields here are ImageBase, AddressOfEntryPoint, SectionAlignment, and the CheckSum (FIXME: do they ever use this?). After that comes the DataDirectory, which contains basicaly the root of an entire filesystem heirarchy within the executable. Luckily most executables only use the top level of this filesystem.

Figure 8.6. IMAGE_DATA_DIRECTORY

typedef struct _IMAGE_DATA_DIRECTORY {
     DWORD   VirtualAddress; // RVA, NOT VA!
     DWORD   Size;
 } IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
     

So the array of these is what makes up the end of the optional header. Don't be fooled by that name, it is in fact a relative virtual address. Lets have a look at the data structure for the second array index (ie index 1). Going to this RVA will take us to the import directory, which is an array of DLL's that this PE file is linked against.

Figure 8.7. IMAGE_IMPORT_DIRECTORY

 typedef struct _IMAGE_IMPORT_DESCRIPTOR {
   union { 
       DWORD   Characteristics;    // 0 for terminating null import descriptor
       DWORD   OriginalFirstThunk; // RVA to original unbound IAT (PIMAGE_THUNK_DATA)
   };
   DWORD   TimeDateStamp;          // 0 if not bound,
                                   // -1 if bound, and real date\time stamp
                                   //     in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND)
                                   // O.W. date/time stamp of DLL bound to (Old BIND)
         
   DWORD   ForwarderChain;         // -1 if no forwarders
   DWORD   Name; 
   DWORD   FirstThunk;             // RVA to IAT (if bound this IAT has actual addresses)
 } IMAGE_IMPORT_DESCRIPTOR;
 typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;
 
 
     

So these fields can be a bit confusing, mostly because at this point, the PE format itself gets a bit confusing. First off, the OriginalFirstThunk is an RVA, as is Name and FirstThunk. So this is important. Name isn't a pointer to an ASCII string, but an RVA to one, which you must convert. The last import descriptor in the file will have zeroes for all of its fields and it serves as an end marker.

As the comments indicate, there are in fact two import tables for each DLL. When non-zero, OriginalFirstThunk refers to the "Unbound" Import Table. FirstThunk, on the other hand, refers to the Import Address Table.

The Import Tables

One data structure describes both of these import tables:

Figure 8.8. IMAGE_THUNK_DATA

typedef struct _IMAGE_THUNK_DATA64 {            
     union {  
         ULONGLONG ForwarderString;  // PBYTE    
         ULONGLONG Function;         // PDWORD   
         ULONGLONG Ordinal;                      
         ULONGLONG AddressOfData;    // PIMAGE_IMPORT_BY_NAME
     } u1;    
 } IMAGE_THUNK_DATA64;
 typedef IMAGE_THUNK_DATA64 * PIMAGE_THUNK_DATA64;
 
     

In many executables files, both of these tables look exactly the same. The reason for this is that the IAT is rewritten when the executable is loaded into memory, and at that point contains the actual addresses of imported functions (hence the Function member of the union). It then servers a purpose similar to the PLT we saw in the ELF format. As the name OriginalThunk suggests, the "Unbound" Import Table remains unchanged, to preserve function name and ordinal information (to what end, no one seems to really know. FIXME: Maybe LoadLibrary uses this to remap functions if DLL's need to be shuffled around?). In some executables, the IAT can come pre-bound, and filled in with expected addresses already as an efficiency mechanism.

But, the complexities don't end there. An imported function can be listed by name, or it can be listed by an ordinal number which represents its position in the DLL's export table. If it is listed by an ordinal number, high bit (1<<31, or 0x80000000) of this union is set. If it is not set, then it is either an RVA to a forwarder string, a function address (which is a Virtual Address), or an RVA to a structure that contains the function name (FIXME: How are these three differentiated?)

The Rest

So there's quite a bit more to the PE format than we discussed here. Luckly, most if it isn't directly relevant. Our ticket to paradise is that Function member field of the Import Address Table. Modifying him will allow us to drop in any old arbitrary function we want, as we will cover in the Code modification Chapter. Things you might want to cover if you're curious are the export tables and the COM data directory. (FIXME: Consider documenting COM tables).

PE in Memory

FIXME: Describe what PE looks like in memory. Include memory addresses, and a diagram of process space.

Chapter 9. Code Modification

So now we know the tools to analyze our programs and find functions of interest to us even in programs without source code. We can understand the assembly that makes them up, and can write assembly of our own to do what we want. We know how a program looks on the disk and how that corresponds to what the program looks like in memory. Knowledge is power, and we know a lot.

Reasons for Code Modification

Code modification is most useful if we wish to change the behavior of programms for which we do not have source code on hand. It is also handy when trying to skirt copy protection of various kinds.

Library Hooking

LD_PRELOAD

This is an environment variable that allows us to add a library to the execution of a particular program. Any functions in this library automatically override standard library functions. Sorry, you can't use this with suid programs.

Example:

% gcc -o preload.so -shared preload.c -ldl

% LD_PRELOAD=preload.so ssh students.uiuc.edu

Instruction Modification

Since the smallest unit of code is the instruction, it follows that the simplest form of code modification is instruction modification. In instruction modification, we are looking to change some property of a specific instruction. Recall from the assembly section that each instruction has 2 parts: The mnemonic and the arguments. So our choices are limited.

The best way to modify instructions is through HT Editor, which was mentioned earlier in the ELF section. HTE has a hex editor mode where we can edit the hex value of an instruction and see the assembly updated in real time. (TODO: instructions, screenshots of HTE)

Editing the arguments

Editing the arguments of an assembly instruction is easy. Simply look at the hex value of the assembly instruction's argument, and see where it lies in the hex bytes for that instruction. HTE will allow you to overwrite these values with values of your own. (Be careful with byte ordering!). TODO: Example1.

Editing the Mnemonic

This is far more tricky.

Single Instruction Insertion

Single Function Insertion

Use unused space as found by disasm.pl (be careful about main)

Multiple Function Insertion

Trickery.. We're working on a util to modify ELF programs and insert functions. What about using MMAP?? (P.S. Can you unmap executable memory to modify it... if they are doing an MD5 of their executable)

Attacking copy protection

Lest I be accused of hiding in my ivory tower, lets look a concrete application of these ideas, and some techniques (:

Chapter 12. Extra Resources

Table of Contents

Revision History
Revision $Revision: 1.3 $$Date: 2004/01/31 20:56:53 $ 

Other Resources and amusements

  1. LDasm project. LDasm is at best a passable disassembly tool (disasm.pl is FAR more useful), but it does come with a utility called ptrace, which allows you to view which instructions of a program actually execute. You can also give ptrace a list of addresses (for example, the list of functions found by disasm.pl) and have it step through those to show you which ones actually execute in your program.

  2. Creating Teensy Executables in Linux

  3. Microsoft COFF format

  4. Attacking FlexLM is an essay written in 1998 on attacking a specific form of hard copy protection. There are several other essays on that site, but most of them cover material that we cover above, but with specific example programs.