Previous Up Next

9  Library of CIL Modules

We are developing a suite of modules that use CIL for program analyses and transformations that we have found useful. You can use these modules directly on your code, or generally as inspiration for writing similar modules. A particularly big and complex application written on top of CIL is CCured (../ccured/index.html).

9.1  Data flow analysis framework

The module dataflow.ml contains a parameterized framework for forward and backward data flow analyses. You provide the transfer functions and this module does the analysis.

9.2  Dominators

The module dominators.ml contains the computation of immediate dominators. It uses the dataflow.ml module.

9.3  Points-to Analysis

The module ptranal.ml contains a context-sensitive, interprocedural points-to analysis for CIL. The analysis has the following characteristics:
  1. Not based on C types (inferred pointer relationships are sound despite most kinds of C casts)
  2. One level of subtyping
  3. One level of context sensitivity
  4. Monomorphic type structures
  5. Field insensitive (fields of structs are conflated)
  6. Demand-driven (points-to queries are solved on demand)
  7. Handles function pointers
The analysis itself is factored into two components: ptranal.ml, which walks over the CIL file and generates constraints, and golf.ml, which solves the constraints. The analysis is invoked with the function Ptranal.analyze_file: Cil.file -> unit. This function builds the points-to graph for the CIL file and stores it internally. There is currently no facility for clearing internal state, so Ptranal.analyze_file should only be called once.

The constructed points-to graph supports several kinds of queries, including alias queries (may two expressions be aliased?) and points-to queries (to what set of locations may an expression point?).

The main interface with the alias analysis is as follows:
  1. Ptranal.may_alias : Cil.exp -> Cil.exp -> bool. If true, the two expressions may have the same value.
  2. Ptranal.resolve_funptr : Cil.exp -> (Cil.fundec list). Returns the list of functions to which the given expression may point.
The precision of the analysis can be customized by changing the values of several flags:
  1. Ptranal.no_sub : bool ref. If true, subtyping is disabled.
  2. Ptranal.analyze_mono : bool ref. If true, context sensitivity is disabled and the analysis is effectively monomorphic.
  3. Ptranal.smart_aliases : bool ref. If true, ``smart'' disambiguation of aliases is enabled. Otherwise, aliases are computed by intersecting points-to sets (this is an experimental feature).
In practice, the best precision/efficiency tradeoff is achieved by setting Ptranal.no_sub to false, Ptranal.analyze_mono to true, and Ptranal.smart_aliases to false. These are the default values of the flags.

There are also a few flags that can be used to inspect or serialize the results of the analysis:
  1. Ptranal.print_constraints : bool ref. If true, the analysis will print each constraint as it is generated.
  2. Ptranal.print_types : bool ref. If true, the analysis will print the inferred type of each variable in the program. If Ptranal.analyze_mono and Ptranal.no_sub are both true, this output is sufficient to reconstruct the points-to graph. One nice feature is that there is a pretty printer for recursive types, so the print routine doesn't loop.
  3. Ptranal.compute_results : bool ref. If true, the analysis will print out the points-to set of each variable in the program. This will essentially serialize the points-to graph.

9.4  StackGuard

The module heapify.ml contains a transformation similar to the one described in ``StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks'', Proceedings of the 7th USENIX Security Conference. In essence it modifies the program to maintain a separate stack for return addresses. Even if a buffer overrun attack occurs the actual correct return address will be taken from the special stack.

Although it does work, this CIL module is provided mainly as an example of how to perform a simple source-to-source program analysis and transformation. As an optimization only functions that contain a dangerous local array make use of the special return address stack.

For a concrete example, you can see how cilly --dostackGuard transforms the following dangerous code:

  int dangerous() {
    char array[10];
    scanf("%s",array); // possible buffer overrun!
  }

  int main () {
    return dangerous();
  }
See the CIL output for this code fragment

9.5  Heapify

The module heapify.ml also contains a transformation that moves all dangerous local arrays to the heap. This also prevents a number of buffer overruns.

For a concrete example, you can see how cilly --doheapify transforms the following dangerous code:

  int dangerous() {
    char array[10];
    scanf("%s",array); // possible buffer overrun!
  }

  int main () {
    return dangerous();
  }
See the CIL output for this code fragment

9.6  One Return

The module oneret.ml contains a transformation the ensures that all function bodies have at most one return statement. This simplifies a number of analyses by providing a canonical exit-point.

For a concrete example, you can see how cilly --dooneRet transforms the following code:

  int foo (int predicate) {
    if (predicate <= 0) {
      return 1;
    } else {
      if (predicate > 5)
        return 2;
      return 3;
    }
  }
See the CIL output for this code fragment

9.7  Control-Flow Graphs

CIL can reduce high-level C control-flow constructs like switch and continue to lower-level gotos. This completely eliminates some possible classes of statements from the program and may make the result easier to analyze (e.g., it simplifies data-flow analysis).

For a concrete example, you can see how cilly --domakeCFG transforms the following code (note the fall-through in case 1):

  int foo (int predicate) {
    int x = 0;
    switch (predicate) {
      case 0: return 111;
      case 1: x = x + 1;
      case 2: return (x+3);
      case 3: break;
      default: return 222;
    }
    return 333;
  }
See the CIL output for this code fragment

9.8  Partial Evaluation and Constant Folding

The partial.ml module provides a simple interprocedural partial evaluation and constant folding data-flow analysis and transformation. This transformation requires the --domakeCFG option.

For a concrete example, you can see how cilly --domakeCFG --dopartial transforms the following code (note the eliminated if branch and the partial optimization of foo):

  int foo(int x, int y) {
    int unknown;
    if (unknown)
      return y+2;     
    return x+3;      
  }

  int main () {
    int a,b,c;
    a = foo(5,7) + foo(6,7);
    b = 4;
    c = b * b;      
    if (b > c)     
      return b-c;
    else
      return b+c; 
  }
See the CIL output for this code fragment

9.9  Simple Memory Operations

The simplemem.ml module allows CIL lvalues that contain memory accesses to be even futher simplified via the introduction of well-typed temporaries. After this transformation all lvalues involve at most one memory reference.

For a concrete example, you can see how cilly --dosimpleMem transforms the following code:

  int main () {
    int ***three;
    int **two;
    ***three = **two; 
  } 
See the CIL output for this code fragment

9.10  Simple Three-Address Code

The simplify.ml module further reduces the complexity of program expressions and gives you a form of three-address code. After this transformation all expressions will adhere to the following grammar:
 basic::=
    Const _ 
    Addrof(Var v, NoOffset)
    StartOf(Var v, NoOffset)
    Lval(Var v, off), where v is a variable whose address is not taken
                      and off contains only "basic"

 exp::=
    basic
    Lval(Mem basic, NoOffset)
    BinOp(bop, basic, basic)
    UnOp(uop, basic)
    CastE(t, basic)
   
 lval ::= 
    Mem basic, NoOffset
    Var v, off, where v is a variable whose address is not taken and off
                contains only "basic"
In addition, all sizeof and alignof forms are turned into constants. Accesses to arrays and variables whose address is taken are turned into "Mem" accesses. All field and index computations are turned into address arithmetic.

For a concrete example, you can see how cilly --dosimplify transforms the following code:

  int main() {
    struct mystruct {
      int a;
      int b;
    } m;
    int local;
    int arr[3];
    int *ptr;

    ptr = &local;
    m.a = local + sizeof(m) + arr[2];
    return m.a; 
  } 
See the CIL output for this code fragment


Previous Up Next