Robustly Parsing the PE Header

The first thing needed in order to disassemble a program is (obviously) a place to start. For Windows executables, this is usually discovered by parsing the PE header of the executable. This well known, well documented data structure contains the information about the executable that the program loader will need to execute it. This information includes things such as a mapping of the program’s file contents to memory, which libraries need to be loaded into memory, and what we are looking for, the entry point of the program.
Continue reading

Hybridizing Assembly Retrieval

Most disassembly tools perform either a linear sweep retrieval or a recursive traversal retrieval. Linear sweep starts at the beginning of each executable section and disassembles from the first offset, continuing to the offset following the end of the retrieved instruction. Recursive traversal has a formal definition, but put simply, it performs piece-wise linear sweep over a series of program blocks, or contiguous (non-branching) instruction segments. When a branch instruction is discovered, an attempt is made to determine the target and, if any are found, each target is recursively disassembled. I mentioned in an earlier post that this tool can handle aliased instructions by intentional design. This ability affords us several benefits as a disassembler, one of which is ability to perform both linear sweep and recursive traversal simultaneously.
Continue reading