Hybridizing Assembly Retrieval

Most disassembly tools perform either linear sweep or recursive traversal. Linear sweep starts at the first offset of each executable section, disassembles one instruction, and continues at the offset immediately following that instruction until the section ends. Recursive traversal has a formal definition, but put simply, it performs a piecewise linear sweep over a series of program blocks, that is, contiguous (non-branching) instruction segments. When a branch instruction is discovered, the disassembler attempts to determine its targets and, if any are found, recursively disassembles each one. I mentioned in an earlier post that this tool can handle aliased instructions by design. This ability affords us several benefits as a disassembler, one of which is the ability to perform both linear sweep and recursive traversal simultaneously.
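
To make the two strategies concrete, here is a minimal sketch over a made-up toy instruction set. The opcode table, the `decode` helper, and the `SAMPLE` bytes are all assumptions invented for illustration; this is not the tool's implementation, only one way the two traversals might look side by side.

```python
# Hypothetical toy ISA (an assumption for this sketch): opcode -> (mnemonic, length)
OPCODES = {
    0x00: ("nop", 1),
    0x01: ("jmp", 2),   # unconditional branch, operand = absolute target offset
    0x02: ("jz", 2),    # conditional branch, operand = absolute target offset
    0x03: ("ret", 1),
    0x10: ("load", 2),  # operand = immediate value
}


def decode(code, offset):
    """Return (mnemonic, length, operand), or None if nothing decodes here."""
    if not 0 <= offset < len(code):
        return None
    entry = OPCODES.get(code[offset])
    if entry is None:
        return None
    mnemonic, length = entry
    if offset + length > len(code):
        return None  # instruction would run past the end of the section
    operand = code[offset + 1] if length == 2 else None
    return mnemonic, length, operand


def linear_sweep(code):
    """Decode from offset 0, always continuing at the next instruction."""
    found = {}
    offset = 0
    while offset < len(code):
        insn = decode(code, offset)
        if insn is None:
            offset += 1  # skip an undecodable byte and keep sweeping
            continue
        found[offset] = insn
        offset += insn[1]
    return found


def recursive_traversal(code, entry=0):
    """Sweep each block linearly; follow branch targets via a worklist."""
    found = {}
    worklist = [entry]
    while worklist:
        offset = worklist.pop()
        while offset not in found:
            insn = decode(code, offset)
            if insn is None:
                break
            found[offset] = insn
            mnemonic, length, operand = insn
            if mnemonic in ("jmp", "jz"):
                worklist.append(operand)  # branch target starts a new block
            if mnemonic in ("jmp", "ret"):
                break  # no fall-through after these
            offset += length
    return found


# jmp 3 ; then bytes 0x10 0x03, which decode as "load 3" at offset 2
# but also hide a "ret" at offset 3 (an aliased instruction).
SAMPLE = bytes([0x01, 0x03, 0x10, 0x03])
sweep = linear_sweep(SAMPLE)
recursive = recursive_traversal(SAMPLE)
```

On `SAMPLE`, the sweep finds instructions at offsets 0 and 2, while the recursive traversal finds them at offsets 0 and 3. The `ret` at offset 3 sits inside the `load` at offset 2, so a disassembler that keeps only one instruction per byte has to discard one of the two results. Keeping both is what lets the two strategies run together.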

Semantic Representation of Assembly Architectures

Binary translation, program analysis, and compilers commonly need to represent low-level assembly instructions in a more abstract semantic form. This semantic form is often referred to as an Intermediate Language or Intermediate Representation (IL/IR). An IR provides a standardized way to perform the same operations across many different architectures without writing a separate implementation for each one. For example, an optimizing compiler can contain a single optimization procedure that operates on an IR rather than many procedures, each of which has to handle the quirks of the architecture it targets.
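
As a small illustration of that idea, the sketch below lifts an add instruction from two invented architectures into one shared IR node and then runs a single analysis over it. The IR types, the tuple-based instruction encodings, and the `lift_arch_a` / `lift_arch_b` names are all hypothetical; no real IR is being described here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    """IR statement: dst = lhs + rhs."""
    dst: Reg
    lhs: Union[Reg, Const]
    rhs: Union[Reg, Const]


def lift_arch_a(insn):
    """Hypothetical three-operand form: ("addi", dst, src, imm)."""
    op, dst, src, imm = insn
    if op != "addi":
        raise ValueError(f"unsupported instruction: {op}")
    return Add(Reg(dst), Reg(src), Const(imm))


def lift_arch_b(insn):
    """Hypothetical two-operand form: ("add", dst, imm); dst is also a source."""
    op, dst, imm = insn
    if op != "add":
        raise ValueError(f"unsupported instruction: {op}")
    return Add(Reg(dst), Reg(dst), Const(imm))


def is_noop(stmt):
    """One architecture-independent check: adding zero to a register in place."""
    return stmt.lhs == stmt.dst and stmt.rhs == Const(0)


# The same pass handles both architectures once they are lifted.
a_stmt = lift_arch_a(("addi", "r1", "r1", 0))
b_stmt = lift_arch_b(("add", "eax", 0))
b_real = lift_arch_b(("add", "eax", 4))
```

Only the lifters know about operand order and encoding quirks. `is_noop` is written once against the IR, which is the saving described above.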

Static Analysis of Applications

Static analysis extracts as much information as possible without actually executing the application. In many ways it is similar to reverse engineering. Analysts want to determine the nature of a suspicious application, and this requires understanding what it does and how it works. Even without source code, there are several ways to extract relevant information from compiled Android applications. This article will discuss static analysis techniques, comprehensive tools for static analysis, and the problems that remain unsolved.

The Big Picture of Malware Analysis

To show how machine learning fits into the big picture of malware analysis, we first need to know what this picture is. While there are many different ways of looking at it, I am going to approach it by first defining who the various classes of “analysts” are (the reason for the quotes will become apparent shortly), the tasks they perform, and the knowledge they need to generate.
