Automated Software Debugging


User Guide Table of Contents

1. Prerequisites and Download
2. Extracting and Building the Tools
3. Debugging the bc-1.06 program
      3.1 Understanding the bug
      3.2 Running the Automated Tools and Interpreting the Outputs
4. Debugging your own programs
5. Known problems/bugs
6. Questions and Feedback

1. Prerequisites and Download

1) Our automated debugging tool collects data about the program through binary instrumentation, using the PIN binary instrumentation platform. Therefore, to use our binary instrumentation tools (pin-tools), you must install PIN first. The latest copy of PIN can be obtained from the PIN website.

2) The PIN platform and our pin-tools should work on Windows, Linux and macOS. However, we have developed and tested our tools on Linux only, so the rest of this document assumes a Linux OS. To use our tools on Windows, you should only need to modify some of the automation scripts (we explain these scripts in detail later in this document; they are essentially used to run the program repeatedly during the debugging process).

3) Download our pin-tools and scripts, as well as the example buggy program, from the Download page.

2. Extracting and Building the Tools

1) The two files downloaded in the previous step should be AutoDebug.tgz and bc-1.06_BUG.tgz.
Place AutoDebug.tgz in the source/tools/ directory of PIN (where the PIN example tool sources reside).
Extract AutoDebug.tgz: tar -xzvf AutoDebug.tgz
This should create a directory AutoDebug/, which contains the source code for our pin-tools.
To build the pin-tools, run the following from source/tools/:
cd AutoDebug/CHECK
make
cd ..
make
This should create a folder obj-ia32/ where the compiled pin-tools reside.

2) Create a folder for the example buggy program, copy bc-1.06_BUG.tgz into it and extract it (tar -xzvf bc-1.06_BUG.tgz); this creates a folder bc-1.06_BUG/. Inside bc-1.06_BUG/, the src/ folder contains the source code of the buggy program. From bc-1.06_BUG/, change into src/, compile bc, and move the executable back into bc-1.06_BUG/ under the name my_bc:
cd src
make
mv bc ../my_bc

After building bc, return to bc-1.06_BUG/ (cd ..) and disassemble the executable. We need the disassembly because our tools report anomalies as instruction addresses, which we later look up in the assembly code:
objdump --line-numbers --source --disassemble-all my_bc > my_bc.dasm

To automate the process, we use several Perl scripts, which reside in the bc-1.06_BUG/ directory. These scripts must be updated with the location of your PIN installation: search for $PIN in the scripts and set it to the path of the pin executable. In my case, the PIN variable was set to:
my $PIN = "/home/dimitrov/Spring2009/pin-2.5-24110-gcc.4.0.0-ia32_intel64-linux/pin";

3. Debugging the bc-1.06 program

3.1 Understanding the bug

bc is a program that implements an arbitrary-precision calculator language. The bug we are going to investigate is a memory corruption bug in the function more_arrays() in file storage.c; the relevant code is shown below. This function is called when more storage needs to be allocated for arrays. It allocates a new, larger array, copies the elements of the old array into the new one, and initializes the remaining entries of the new array to NULL. The defect is on line 18: the variable v_count is mistakenly used as the loop bound instead of the correct variable a_count. Thus, whenever v_count happens to be larger than a_count, the loop writes past the end of the buffer arrays and overwrites the size information located right after it. This results in a segmentation fault the next time more_arrays() is called, when the buffer with the corrupted size information is freed at line 23.

 1  void more_arrays () {
 2    int indx;  int old_count;
 3    bc_var_array **old_ary;
 4
 5    /* Save the old values. */
 6    old_count = a_count;
 7    old_ary = arrays;
 8
 9    /* Increment by a fixed amount and allocate. */
10    a_count += STORE_INCR;
11    arrays = (bc_var_array **) bc_malloc (a_count*sizeof(bc_var_array *));
12
13    /* Copy the old arrays. */
14    for (indx = 1; indx < old_count; indx++)
15      arrays[indx] = old_ary[indx];
16
17    /* Initialize the new elements. defect: incorrect loop condition (v_count instead of a_count) */
18    for (; indx < v_count; indx++)
19      arrays[indx] = NULL;     /* infection: overflows its size information */
20
21    /* Free the old elements. */
22    if (old_count != 0) {
23      free (old_ary);    /* crash: when the buffer size is corrupted */
24    }
25  }

3.2 Running the Automated Tools and Interpreting the Outputs

Copy the pin-tools (the .so files) from their obj-ia32/ directory into the bc-1.06_BUG/ directory.
Then, from bc-1.06_BUG/, execute the script:
pin_RUN_split_diduce.pl

This script executes the bc program repeatedly. First, it runs the program on passing inputs and records certain properties of the program during each execution. These properties are accumulated and updated across the passing runs in a file named trained.0.diduce_dump.txt.

After this training step, the script runs bc with a failing input (an input that results in a segmentation fault), and our pin-tool detects the execution anomalies that occur during the failing run. Note the output file diduce.1.unique.txt, which contains ALL the execution anomalies our tool detected. On my machine, and for my executable (compiled with the -static option), there are 24 anomalies. Each line of this file corresponds to an assembly instruction of the program for which our tool detected anomalous behavior, and begins with either "pc" or "NEW_pc". Instructions marked "NEW_pc" are anomalous because they were never executed during any of the training runs, i.e. they are simply new code. The remaining instructions ("pc") behaved anomalously compared to the passing runs.

The file diduce.1.CrashToken.txt contains a (usually small) subset of all the anomalies: those related through data dependencies to the instruction that crashed the program. They are therefore much more likely to contain the root cause of the bug. We refer to this set as the "isolated anomalies", since we keep only the relevant anomalies and discard the irrelevant ones. In my case, 3 anomalies were isolated.

At this point, if only a few anomalies were isolated, the programmer can look them up in the assembly code (my_bc.dasm) and determine which C/C++ statements they correspond to. For example, the first anomaly in my diduce.1.CrashToken.txt file is pc=0x804d9c0. Searching for that pc in the assembly, we find:
arrays[indx] = NULL; /*infection: overflows its size information */
804d9c0: c7 04 91 00 00 00 00 movl $0x0,(%ecx,%edx,4)
This is exactly the instruction that corrupts memory through the buffer overflow, so we have successfully located the root cause of the problem.

Sometimes, even after isolation, there are still too many anomalies to examine manually. We have therefore developed an additional (optional) step in our automated approach, which we call "validation": we test each isolated anomaly by attempting to "fix" it automatically. Our way of "fixing" is simple: during program execution, we skip the anomalous instruction, thus preventing it from corrupting memory. Our experiments show that this simple approach is surprisingly effective at reducing the number of anomalies to be examined. To perform the validation step, simply execute the script:
validate_split.pl

After running this script, you should obtain a file CrashToken.0.diduce.validated, which summarizes the validation step. It classifies the isolated anomalies as validated, unresolved, or dismissed. Anomalies classified as validated are the most likely root cause of the bug, because skipping these instructions makes the crash disappear. In this particular case, the instruction at 0x804d9c0 is classified as validated.

4. Debugging Your Own Programs and Beyond

Debugging your own programs should be similar to debugging bc: you need to modify the scripts to run your program. For example, the pin_RUN_split_diduce.pl script contains a section, shown below, that defines how to run the program: which commands perform the training runs for bc and which command triggers the bug. Simply replace the command after "--" with your own commands (in a PIN command line, everything after "--" is the application to be instrumented, together with its arguments). The same has to be done for the script RUN_delete_dynamic, which is part of the validation step: replace the command after "--" with your own.

my @training_set = (
"-- my_bc inputs/Test/array.b",
"-- my_bc inputs/Test/arrayp.b",
"-- my_bc inputs/Test/aryprm.b",
"-- my_bc -l inputs/Test/atan.b",
"-- my_bc -l inputs/Test/checklib.b",
"-- my_bc inputs/Test/my_div.b",
"-- my_bc -l inputs/Test/exp.b",
"-- my_bc inputs/Test/fact.b",
"-- my_bc -l inputs/Test/jn.b",
"-- my_bc -l inputs/Test/ln.b",
"-- my_bc inputs/Test/mul.b",
"-- my_bc inputs/Test/raise.b",
"-- my_bc inputs/Test/signum",
"-- my_bc -l inputs/Test/sine.b",
"-- my_bc inputs/Test/sqrt1.b",
"-- my_bc inputs/Test/sqrt2.b",
"-- my_bc inputs/Test/sqrt.b"
);

my @fail_set = ("-- my_bc inputs/bad.b");

The process described in this document works automatically for programs that crash: during a crash, our pin-tool automatically takes control and isolates the anomalies relevant to the crash point. However, if the program does not crash but simply produces incorrect results, more effort is required. We have to determine the point of program failure (for example, the point where the incorrect result is printed). Once we have identified an instruction (pc) that should be considered the point of failure, we can pass it to our pin-tool with the TOKEN_PC option. Note that the pc has to be supplied on the command line as an unsigned decimal integer, not as a hex value.

We provide more scripts and pin-tools than this brief user guide describes. These pin-tools implement different methods for detecting anomalies during program execution (namely the AccMon and Loop-count methods, as described in our publication). They are used in the same way as described here, and the scripts for running them are also provided.

5. Known Problems/Bugs

Because the program we instrument (using dynamic binary translation) corrupts memory, the buggy program sometimes causes PIN itself to crash, with an error such as: C:Tool (or Pin) caused signal 11 at PC 0x128a013. We can sometimes prevent this by running the program on a different machine, or by recompiling it with different options (such as -static). We do not yet have a solution to this problem, but we are in contact with the PIN developers about the issue.

Newer versions of glibc (not gcc, which only compiles the program) use a malloc() implementation that attempts to catch some memory corruption bugs. In this case, the crash (for example, due to a double free) is intercepted by glibc instead of our pin-tool. To change how glibc's malloc() reacts to detected corruption, set the environment variable MALLOC_CHECK_: export MALLOC_CHECK_=0 (silently ignore detected errors) or export MALLOC_CHECK_=2 (call abort() immediately).

6. Questions and Feedback

If you have questions about how to use the tool or would like to leave feedback, please email us at slice4e AT gmail.com or zhou AT eecs.ucf.edu
