Inguma - A Free Penetration Testing and Vulnerability Research Toolkit

OpenDis - A Framework for Binary Static Analysis

OpenDis was born as a tool to make life easier for security researchers looking for vulnerabilities in binaries. Many tools exist that accomplish the same thing, but the vast majority of them are closed source (at least) and, commonly, very expensive commercial tools.

Index

  1. How to use OpenDis
    1. Description of the arguments
    2. Example usage
    3. Sample output
    4. Sample report
  2. How to use the Framework
  3. How to use the Framework's API
    1. Example usage of the API

How to use OpenDis

The main tool is called "dis.py" (very original). You will find the script in $INGUMA_DIR/dis. The following is the output when you invoke dis.py without passing any parameters:

Usage:
./dis.py <binary file> [options]

Options:
-s Save the database to -s=file
-p Generate pseudo-code instructions
-r Generate a report of dangerous functions
-v Don't search for variables
-a Don't search for arguments
-j Report interesting calls (call *%eax, jmp *%ebx, etc..)
-ph Enable the Free Pascal hack
-mh Enable the 'only code before main' hack
-i Ignore .rodata section
-rh Don't search afterward for the 0x00 when reading constants
-d Decompile all, even uninteresting functions
-f Disassemble object file in format -f=<format> (i.e., objdump -b elf32-arm)
-c Assume that CPU is of type -c=CPU (i.e., -c=x86)
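
If you prefer to drive dis.py from another script, the option syntax shown above maps directly onto an argument list. The following is only a sketch: build_dis_command is a hypothetical helper (it is not part of Inguma), vuln.db is a made-up database name, and only the -s, -r and -c options are covered.

```python
def build_dis_command(binary, save_to=None, report=False, cpu=None,
                      script="./dis.py"):
    """Return the argument list for one dis.py run (hypothetical helper)."""
    cmd = [script, binary]
    if save_to is not None:
        cmd.append("-s=%s" % save_to)   # -s=file: save the database
    if report:
        cmd.append("-r")                # -r: report of dangerous functions
    if cpu is not None:
        cmd.append("-c=%s" % cpu)       # -c=CPU: target processor
    return cmd


if __name__ == "__main__":
    # Only prints the command line; pass the list to subprocess.call()
    # if you actually want to run it.
    print(" ".join(build_dis_command("vuln", save_to="vuln.db", report=True)))
```

Building a list instead of a single string avoids shell quoting problems when the binary's path contains spaces.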

Description of the arguments

-s: If you're interested in writing your own scripts using the OpenDis framework, you first need to save the binary in the OpenDis database format. Use the -s=filename argument to create a database.
-p: Translates assembler code to a pseudo-C format. It doesn't work properly.
-r: Checks for the most common error-prone blocks, calls, etc. It prints an alert (CHECK:) highlighting the possibly vulnerable assembly blocks.
-v: OpenDis tries to find variables and shows them as local_size_N, where N is the size of the variable (for example, local_size_10). This flag disables the feature.
-a: OpenDis tries to find function arguments as well as variables, although this doesn't work very well at the moment. This flag disables the feature.
-j: Reports the most interesting calls found in the assembly code. When developing an exploit for an overflow, a format string bug, etc., you need to find points to jump to. For example, if %ebx points to your shellcode, you need to find a way to execute call *%ebx or jmp *%ebx. OpenDis reports all of the most interesting calls and whether their addresses appear to be usable.
-ph: The Free Pascal hack. Shows only the code before pascalmain, removing a lot of uninteresting code written not by the developer but by the fpc compiler.
-mh: Shows only the code found before the main function. Commonly, compiler-generated code goes after the main function. Consider it a hack that sometimes works.
-i: If you're not interested in the constants found in the binary, specify this flag to ignore the .rodata section.
-rh: When looking for the text of a constant, OpenDis sometimes fails to find the valid start offset. This hack will look afterward in the .rodata section for the 0x00 byte, considering it the start of the constant.
-d: OpenDis tries to remove many compiler-generated functions. If you're interested in taking a look at these functions, use this flag. Not so useful, IMHO.
-c: Very useful when you're decompiling binaries for non-x86 processors. Specify the processor the binary was compiled for and OpenDis (and, internally, objdump and nm) will take specific actions for that kind of processor. The toolkit supports "advanced" options for x86, sparc and avr processors.
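
To give an idea of what the -j option looks for, here is a minimal sketch that scans AT&T-syntax instructions for indirect calls and jumps through a register. It is an illustration only, not the code dis.py uses: find_register_jumps and the sample data are made up, and the check for whether an address is usable is left out.

```python
import re

# Matches AT&T-syntax indirect transfers through a register, e.g. "call *%eax".
INDIRECT = re.compile(r"^\s*(call|jmp)\s+\*%([a-z0-9]+)\s*$")


def find_register_jumps(lines, register=None):
    """Return (address, mnemonic, register) for each indirect call/jmp.

    `lines` is a list of (address, code) string pairs; `register`
    optionally restricts the result to one register name such as "ebx".
    """
    hits = []
    for address, code in lines:
        match = INDIRECT.match(code)
        if match is None:
            continue
        mnemonic, reg = match.groups()
        if register is None or reg == register:
            hits.append((address, mnemonic, reg))
    return hits


if __name__ == "__main__":
    # Made-up addresses and instructions, for illustration only.
    sample = [
        ("08048400", "call *%eax"),
        ("08048410", "mov    %eax,(%esp)"),
        ("08048420", "jmp *%ebx"),
        ("08048430", "call 080482e8 <strcpy@plt>"),
    ]
    for hit in find_register_jumps(sample, register="ebx"):
        print("0x%s: %s *%%%s" % hit)
```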

Example usage


Imagine the following very vulnerable C program:

#include <stdio.h>
#include <string.h>

void foo (char *arg)
{
    char buf[10];

    strcpy(buf, arg);
    printf(buf);
}

int main(int argc, char *argv[])
{
    if (argc > 1)
        foo(argv[1]);
}

OpenDis will generate the following output when invoked with the name of the compiled binary file as its only argument:
;
; File generated by OpenDis - C Disassembler & Future Decompiler ;)
; $Id: dis.py,v 1.9 2007/11/21 11:42:16 joxean Exp joxean $
; Disassembly code for 'vuln'
;

PROCEDURE foo AT ADDRESS 0x08048398
BEGIN ASM
  foo:
    0x08048398: push %ebp
    0x08048399: mov    %esp,%ebp
    0x0804839b: sub    $0x18,%esp             ;24
    0x0804839e: mov    func_argument_2,%eax
    0x080483a1: mov    %eax,func_argument_1
    0x080483a5: lea    local_size_10,%eax
    0x080483a8: mov    %eax,(%esp)
    0x080483ab: call 080482e8 <strcpy@plt>
    ; CHECK: Usage of strcpy (Hit 1)

  foo+0x18:
    0x080483b0: lea    local_size_10,%eax
    0x080483b3: mov    %eax,(%esp)
    0x080483b6: call 080482d8 <printf@plt>
    ; CHECK: Usage of printf (Hit 2)

  foo+0x23:
    0x080483bb: leave
    0x080483bc: ret

END ASM;

PROCEDURE main AT ADDRESS 0x080483bd
BEGIN ASM
  main:
    0x080483bd: push %ebp
    0x080483be: mov    %esp,%ebp
    0x080483c0: sub    $0x8,%esp
    0x080483c3: and    $0xfffffff0,%esp             ;4294967280L
    0x080483c6: mov    $0x0,%eax
    0x080483cb: add    $0xf,%eax
    0x080483ce: add    $0xf,%eax
    0x080483d1: shr    $0x4,%eax
    0x080483d4: shl    $0x4,%eax
    0x080483d7: sub    %eax,%esp

  ;
  ; Start: Programmers code starts here?
  ;
  main+0x1a:
    0x080483d9: cmpl $0x1,argc
    0x080483dd: jle    080483ef <main+0x32>

  main+0x22:
    0x080483df: mov    argv,%eax
    0x080483e2: add    $0x4,%eax
    0x080483e5: mov    (%eax),%eax
    0x080483e7: mov    %eax,(%esp)
    0x080483ea: call 08048398 <foo>

  main+0x32:
    0x080483ef: leave
    0x080483f0: ret

END ASM;

As you can see, the code is more readable than raw assembly. Next, try an "advanced" feature of OpenDis: the automatic detection of vulnerable/error-prone constructions. Invoke the script "dis.py" with the -r argument. Notice the extra lines at the bottom of the output:
;
; Report of presumable error prone blocks
;
;
; 1) Usage of strcpy in function foo:
;
; 0x08048398: push %ebp
; 0x08048399: mov %esp,%ebp
; 0x0804839b: sub $0x18,%esp ;24
; 0x0804839e: mov func_argument_2,%eax
; 0x080483a1: mov %eax,func_argument_1
; 0x080483a5: lea local_size_10,%eax
; 0x080483a8: mov %eax,(%esp)
; 0x080483ab: call 080482e8 <strcpy@plt>
;
; Analysis: Overflow in statically sized buffer if greater than 10
;
;
;
; 2) Usage of printf in function foo:
;
; 0x080483b0: lea local_size_10,%eax
; 0x080483b3: mov %eax,(%esp)
; 0x080483b6: call 080482d8
;
;
; Analysis: First parameter of the printf call is not a format string. Check for format strings.
;
;
; Total of 2 hit(s). Happy hunting!

Isn't it cool? Detecting these constructions is very easy, at least in small programs. In the first assembly block, the strcpy overflow, we find the instruction lea local_size_10,%eax, which references a statically sized array of size 10. Two lines later the call to strcpy is made. If we can pass an argument larger than 10 bytes (local_size_10), we will overflow the buffer of our vulnerable program.
In the second block, OpenDis found that the printf call is made with a variable as its first parameter. It may (or may not) be a vulnerability; in our example, it is.
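
The strcpy heuristic described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the check dis.py actually implements: find_strcpy_into_local and the 3-line window are made up, and the input is just a list of instruction strings.

```python
import re

LOCAL_BUFFER = re.compile(r"lea\s+local_size_(\d+),")
STRCPY_CALL = re.compile(r"call\s+[0-9a-f]+\s+<strcpy@plt>")


def find_strcpy_into_local(code_lines, window=3):
    """Return (call_index, buffer_size) for strcpy calls near a local buffer.

    A hit is a strcpy call preceded, within `window` lines, by a
    "lea local_size_N" instruction; N is taken as the buffer size.
    """
    hits = []
    for index, code in enumerate(code_lines):
        if not STRCPY_CALL.search(code):
            continue
        # Walk backwards from the call, nearest instruction first.
        for previous in reversed(code_lines[max(0, index - window):index]):
            match = LOCAL_BUFFER.search(previous)
            if match:
                hits.append((index, int(match.group(1))))
                break
    return hits


if __name__ == "__main__":
    # Instructions copied from the foo listing above.
    foo = [
        "push %ebp",
        "mov    %esp,%ebp",
        "sub    $0x18,%esp",
        "mov    func_argument_2,%eax",
        "mov    %eax,func_argument_1",
        "lea    local_size_10,%eax",
        "mov    %eax,(%esp)",
        "call 080482e8 <strcpy@plt>",
    ]
    for index, size in find_strcpy_into_local(foo):
        print("line %d: strcpy into a %d-byte local buffer" % (index, size))
```

A pattern match like this knows nothing about where the source string comes from, which is one reason such checks produce false positives on larger programs.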

Well, we have 2 vulnerabilities in our code, and both were found by OpenDis. In large static analysis projects you will find more false positives than real bugs but, well, at least OpenDis shows us where to start hunting for vulns.

How to use the Framework

As of version 0.0.6 of Inguma, you will find 2 usage examples of the API: dbprint.py and asmdiff.py. The first script (dbprint.py) takes an OpenDis format database as its first parameter and prints the complete assembler code to stdout.

The second example (and the most interesting one) tries to find differences between 2 binary versions of the same program, library or object file. I use it every 3 months, for example, to find out what changes were made in Oracle patches (the CPU, Critical Patch Update), with more or less luck.
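
To show the general idea of comparing 2 versions function by function, here is a small sketch based on Python's difflib. It is not the algorithm asmdiff.py uses: diff_functions is a hypothetical helper, and each version is represented as a plain dictionary mapping function names to lists of instruction strings instead of a real OpenDis database.

```python
import difflib


def diff_functions(old, new):
    """Compare two {function name: [assembly lines]} mappings.

    Returns a dict mapping each changed, added or removed function name
    to its unified diff (a list of strings).
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name, [])
        after = new.get(name, [])
        if before != after:
            changes[name] = list(difflib.unified_diff(
                before, after, "old/" + name, "new/" + name, lineterm=""))
    return changes


if __name__ == "__main__":
    # Made-up listings: the "patched" version swaps strcpy for strncpy.
    old = {"foo": ["lea    local_size_10,%eax", "call <strcpy@plt>"],
           "main": ["call <foo>"]}
    new = {"foo": ["lea    local_size_10,%eax", "call <strncpy@plt>"],
           "main": ["call <foo>"]}
    for name, diff in diff_functions(old, new).items():
        print("\n".join(diff))
```

With real binaries, addresses and call targets usually shift between builds, so a usable comparison has to normalize them first; the sketch skips that step.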

How to use the Framework's API

OpenDis databases are simple [c]Pickle format objects containing instances of the classes defined in the asmclasses.py file. To use the API, add the following 2 lines of Python code at the top of your script:

import pickle # Or cPickle if you prefer
import asmclasses # The framework's classes


Next, open the database file and load the pickle object. The call will return 2 objects: the .rodata section (class CRoData) and the complete program (class CProgram).

#
# The database you will load will be in the following format
#
# 1) Raw data found in the .rodata section
# 2) The whole program in Python structures
#
f = file(database, "r") # database: path to the file saved with -s
rodata, obj = pickle.load(f)

print "Section .rodata: %s" % hex(rodata.address)
print "-"*80
print repr(rodata.data)
print "-"*80
print
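
If you load databases in several scripts, it may be handy to wrap the steps above in a small function that also checks what came out of the pickle. The following is a sketch under a few assumptions: load_database is a hypothetical helper, the file is opened in binary mode, and the demo writes plain Python objects as stand-ins for the real CRoData and CProgram instances (to load a real database, asmclasses must be importable).

```python
import os
import pickle
import tempfile


def load_database(path):
    """Load a pickled (rodata, program) pair and check its shape."""
    with open(path, "rb") as handle:
        loaded = pickle.load(handle)
    if not isinstance(loaded, (tuple, list)) or len(loaded) != 2:
        raise ValueError("expected a (rodata, program) pair in %r" % path)
    return loaded


if __name__ == "__main__":
    # Stand-in data: a real database holds CRoData and CProgram objects.
    fd, path = tempfile.mkstemp(suffix=".db")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(({"address": 0x8048400, "data": b"%s\x00"}, []), handle)
    rodata, program = load_database(path)
    os.remove(path)
    print("loaded %d procedure(s)" % len(program))
```

Keep in mind that unpickling can execute arbitrary code, so only load databases you created yourself.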


The obj object is a list of the program's functions (CProcedure objects), so you can iterate over all functions, search for specific functions, etc. Every procedure holds its assembly lines, and you can iterate over them using the lines property of the CProcedure class. Every procedure also has the following properties:

  1. name: The function's name
  2. address: The function's address
  3. lines: Function's lines
  4. vars: The variables found in the function
  5. params: The parameters found in the function
  6. startAddress: Starting address of the function
  7. endAddress: Ending address of the function
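
As a quick illustration of these properties, the following sketch looks up a procedure by name and prints a one-line summary. The Stub class is a stand-in for CProcedure so the example runs without a real database; find_procedure and summarize are hypothetical helpers, and I am assuming that address is a hexadecimal string and that vars and params are sized containers.

```python
class Stub(object):
    """Stand-in for CProcedure: just holds the attributes it is given."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


def find_procedure(program, name):
    """Return the first procedure called `name`, or None if absent."""
    for proc in program:
        if proc.name == name:
            return proc
    return None


def summarize(proc):
    """Format a one-line summary from the documented properties."""
    return "%s at 0x%s: %d line(s), %d var(s), %d param(s)" % (
        proc.name, proc.address, len(proc.lines),
        len(proc.vars), len(proc.params))


if __name__ == "__main__":
    # Made-up program; only the attribute names come from the list above.
    program = [
        Stub(name="foo", address="08048398", lines=["push %ebp", "ret"],
             vars=["local_size_10"], params=["func_argument_1"]),
        Stub(name="main", address="080483bd", lines=["push %ebp"],
             vars=[], params=["argc", "argv"]),
    ]
    print(summarize(find_procedure(program, "foo")))
```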

While iterating over the lines of a procedure you will need to know the corresponding assembly code, label position (i.e., main+0x12), etc. The following are the available properties of the CLine class:

  1. address: Address of the line
  2. code: The assembly code (string)
  3. label: Label of the line (i.e., foo+0x16)
  4. description: If a constant was used in the line, the value of it
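
For instance, the code property is enough to hunt for calls to specific functions. The sketch below walks every line of every procedure and collects the calls to a given list of names. Again, Stub is a stand-in for the real CProcedure and CLine objects, find_calls is a hypothetical helper, and the sample values are made up to resemble the listing shown earlier.

```python
class Stub(object):
    """Stand-in for CProcedure and CLine objects."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


def find_calls(program, targets):
    """Return (procedure name, label, address, code) for matching calls.

    A line matches when its code starts with "call" and mentions one of
    the names in `targets` as "<name@plt>" or "<name>".
    """
    hits = []
    for proc in program:
        for line in proc.lines:
            if not line.code.startswith("call"):
                continue
            for target in targets:
                if (("<%s@plt>" % target) in line.code
                        or ("<%s>" % target) in line.code):
                    hits.append((proc.name, line.label, line.address, line.code))
                    break
    return hits


if __name__ == "__main__":
    # Made-up CLine stand-ins modelled on the foo listing.
    foo = Stub(name="foo", lines=[
        Stub(address="080483a8", code="mov    %eax,(%esp)",
             label="foo", description=None),
        Stub(address="080483ab", code="call 080482e8 <strcpy@plt>",
             label="foo", description=None),
        Stub(address="080483b6", code="call 080482d8 <printf@plt>",
             label="foo+0x18", description=None),
    ])
    for hit in find_calls([foo], ["strcpy", "printf"]):
        print("%s (%s) 0x%s: %s" % hit)
```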


With these objects and properties you can easily write scripts that search for specific constructions, calls, etc. in a matter of seconds. The following full example simply prints the complete assembly code for a given database:

 #!/usr/bin/python
 
 """
 Example usage of the ASM Classes library and OpenDis framework
 A part of the Inguma Project
 """
 
 import sys
 import pickle
 import asmclasses
 
 def printProgram(database):
     f = file(database, "r")
     
     #
     # The database you will load will be in the following format
     #
     # 1) Raw data found in the .rodata section
     # 2) The whole program in Python structures
     #
     rodata, obj = pickle.load(f)
 
     print "Section .rodata: %s" % hex(rodata.address)
     print "-"*80
     print repr(rodata.data)
     print "-"*80
     print 
 
     showLabel = False
     #
     # Iterate over all procedures in the program (List object)
     #
     for proc in obj:
         print "PROCEDURE", proc.name, "AT 0x" + proc.address
 
         #
         # Iterate over all lines in the procedure
         #
         for line in proc.lines:
             if showLabel:
                 print "  %s:"  % line.label
             if line.description:
                 mtype = str(type(line.description)).split("'")[1]
                 print "\t", "0x" + line.address.ljust(8) + ":", line.code.ljust(30), ";", mtype.ljust(4) + ":", repr(line.description)
             else:
                 print "\t", "0x" + line.address.ljust(8) + ":", line.code.ljust(30)
 
             # Current Instruction (Assume is x86)
             instructions = line.code.split(" ")
 
             if instructions[0].find("j") == 0 or instructions[0].find("call") > -1:
                 print
                 showLabel = True
             else:
                 showLabel = False
 
         print "END PROCEDURE"
         print
 
 def usage():
     print "Example OpenDis framework's API usage"
     print
     print "Usage:", sys.argv[0], "<opendis format database>"
     print
 
 def main():
     if len(sys.argv) == 1:
         usage()
         sys.exit(0)
     else:
         printProgram(sys.argv[1])
 
 if __name__ == "__main__":
     main()
 
 


Copyright (c) 2007 Joxean Koret