wiki:Tutorial/Core/Optimisation

Guided Optimisation

In this section we'll use ddc to perform programmer guided optimisation of a source module. "Programmer Guided" means that we'll specify exactly what code transforms should be applied to the program, instead of relying on a general purpose optimisation script like that invoked by the '-O' compiler flag.

Get the demo source file

First download the demo module that we'll optimise, and save it to some directory.

The demo file: Main.dcl

The program performs some simple list-processing.

Type Checking

Before optimising the program, we will type check it to make sure it's ok. The -check option does this.

$ ddc -check Main.dcl

$ echo $?
0

If the file is type correct then we get no response, and the unix error code will be set to 0.

Now add an error (I'll trust you to think of one) and verify that ddc -check finds the problem.

$ ddc -check Main.dcl
When checking expression.
  Core type mismatch in application.
               type: * ~> ! ~> $ ~> * ~> *
     does not match: * ~> *
     in application: (->) (List r2)

$ echo $?
1

Loading a module

The -load command also type checks a module, but it prints the module source to the console as well.

$ ddc -load Main.dcl
module Main 
...

When performing optimisations, it is helpful to redirect the output to a new file. That way you can have the original source and optimised version loaded in different windows of your editor.

$ ddc -load Main.dcl > Main-opt.dcl

Open up both Main.dcl and this new Main-opt.dcl module in your editor and verify they are similar. Note that they won't be textually identical because the DDC parser also does some basic desugaring, so the sugar is not preserved by -load. The Main-opt.dcl module will also contain braces and semicolons, instead of relying on indentation and the offside rule to determine the block structure of the code.

Inlining

To make this code run faster, the first thing we should do is to inline the definitions of numeric functions like addInt and subInt. Doing this will reveal the primitive unboxed operators and avoid unboxing and boxing their arguments and result value.

To perform inlining, we need to know what file the definitions of the functions to be inlined are. To determine this, ask ddc where its base libraries are installed with the -print-basedir option.

$ ddc -print-basedir
/Users/USER/.cabal/share/ddc-code-0.3.0.0

The exact path will be different depending on what operating system you're on, and how you've installed DDC. The Nat module that contains the definitions of addInt and friends is under the base directory. You can print the source of this module like so:

$ cat `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl

To inline the definitions from Nat.dcl into Main.dcl we use the -with option to tell ddc where the source file is, and then tell it what functions to inline with -trans.

$ ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]" \
      > Main-opt.dcl

The part after -trans is called the "transform specification" and tells DDC how to transform the module. On a unix command line it needs to be wrapped in quotes (") to escape the punctuation and space characters.

Check the resulting Main-opt.dcl file to confirm the required functions have been inlined. For example the definition of replicate should now look like this:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      letregion r3 in
      case n of {
        N# (n2 : Nat#) 
         -> case eq# [Nat#] n2 0# of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> Cons [r2] [a] x
                      (replicate [r3] [r2] [a]
                           ((/\(^ ^ ^ : %).
                              \(^ : Nat ^2).\(^ : Nat ^1).
                             case ^1 of {
                               N# (^ : Nat#) 
                                -> case ^1 of {
                                     N# (^ : Nat#) 
                                      -> N# [^0] (sub# [Nat#] ^1 ^0)
                                   }
                             }) [r1] [r3] [r3] n (N# [r3] 1#)) x)
            }
      };

The DDC Inline transform simply substitutes the definition of a function for its name. If we want to eliminate the nested lambda abstractions then we'll need to apply some more transforms.

Substitution

The next step is to perform perform substitution to beta-reduce the application of the inlined subInt function to its arguments. This is done with the Beta transform. We'll add Beta as a follow-on transform after Inline.

ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]; Beta" \
      > Main-opt.dcl

This yields:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      letregion r3 in
      case n of {
        N# (n2 : Nat#) 
         -> case eq# [Nat#] n2 0# of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> Cons [r2] [a] x
                      (replicate [r3] [r2] [a]
                           ((weakclo {[r1]} in
                             /\(^ ^ : %).
                              \(^ : Nat r1).\(^ : Nat ^1).
                             case ^1 of {
                               N# (^ : Nat#) 
                                -> case ^1 of {
                                     N# (^ : Nat#) 
                                      -> N# [^0] (sub# [Nat#] ^1 ^0)
                                   }
                             }) [r3] [r3] n (N# [r3] 1#)) x)
            }
      };

The [r1] argument is gone, but now we've got a weakclo {[r1]} wrapping the remaining abstractions. The Beta transform adds closure weakenings to ensure that the closure of the expression is preserved by substitution. In this instance, the inner lambda abstractions do not have Use r1 in their closure because r1 is not used as a type argument (like [r1]) in the resulting expression.

Performing further Beta transforms won't do anything, because the weakclo is in the way (try it), so we'll need to get rid of it before proceeding.

Bubbling Casts

We can move the weakclo (and other casts) outwards by applying the Bubble transform:

ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]; Beta; Bubble" \
      > Main-opt.dcl

This yields:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      weakclo {[r1]} in
      letregion r3 in
      case n of {
        N# (n2 : Nat#) 
         -> case eq# [Nat#] n2 0# of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> Cons [r2] [a] x
                      (replicate [r3] [r2] [a]
                           ((/\(^ ^ : %).
                              \(^ : Nat r1).\(^ : Nat ^1).
                             case ^1 of {
                               N# (^ : Nat#) 
                                -> case ^1 of {
                                     N# (^ : Nat#) 
                                      -> N# [^0] (sub# [Nat#] ^1 ^0)
                                   }
                             }) [r3] [r3] n (N# [r3] 1#)) x)
            }
      };

The weakclo has floated out to just inside the top-level lambda abstractions. The Bubble transform can't float it further because then the closure of the inner most function would change.

We still need to substitute for the remaining arguments. Add more Beta; Bubble and see how you go.

The Fixpoint Transform

Instead of typing a transform specification multiple times, we can use the fixpoint operator in the transform spec. The transform fix N TRANS will apply TRANS to the module until it stops making progress, or stop after N iterations, whichever happens first.

ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]; fix 10 {Beta; Bubble} " \
      > Main-opt.dcl

Note that we've wrapped the inner transform in {} parenthesis because we want to apply both Beta and Bubble for each iteration. Applying the above transform yields:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      weakclo {[r1]; n} in
      letregion r3 in
      weakclo {[r3]} in
      case n of {
        N# (n2 : Nat#) 
         -> case eq# [Nat#] n2 0# of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> Cons [r2] [a] x
                      (replicate [r3] [r2] [a]
                           ((\(^ : Nat r3).
                             case n of {
                               N# (^ : Nat#) 
                                -> case ^1 of {
                                     N# (^ : Nat#) 
                                      -> N# [r3] (sub# [Nat#] ^1 ^0)
                                   }
                             }) (N# [r3] 1#)) x)
            }
      };

Sadly, the last argument (N# [r3] 1#) has not been substituted. In general, we should perform beta reduction when the argument is a redex.

Snipping and Flattening

To complete the reduction for replicate we should introduce a new let-binding for the (N# [r3] 1#) argument. This will give us a variable name for it that we can substitute in place of the redex. The Snip transform introduces let-bindings for nested applications, and the Flatten transform tidies up the nested let-expressions that are created by applying Snip.

ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]; fix 10 {Beta; Bubble}; Snip; Flatten" \
      > Main-opt.dcl

This yields:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      weakclo {[r1]} in
      letregion r3 in
      case n of {
        N# (n2 : Nat#) 
         -> let ^ : Bool# = eq# [Nat#] n2 0# in
            case ^0 of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> let ^ : [^ ^ : %].Nat r1 -(!0 | Use ^0)> Nat ^1 -(Read ^1 + Read r1 + Alloc ^0 | Use ^0 + Use r1)> Nat ^0
                        = /\(^ ^ : %).
                           \(^ : Nat r1).\(^ : Nat ^1).
                          case ^1 of {
                            N# (^ : Nat#) 
                             -> case ^1 of {
                                  N# (^ : Nat#) 
                                   -> let ^ : Nat# = sub# [Nat#] ^1 ^0 in
                                      N# [^0] ^0
                                }
                          } in
                  let ^ : Nat r3 = N# [r3] 1# in
                  let ^ : Nat r3 = ^1 [r3] [r3] n ^0 in
                  let ^ : List r2 a = replicate [r3] [r2] [a] ^0 x in
                  Cons [r2] [a] x ^0
            }
      };

Now, although a new (anonymous) let-binding has been introduced for (N# [r3] 1#), our inner lambda abstraction is no longer directly applied to its argument, so Beta still doesn't apply. We need to shift the inner lambda back into its use site.

Moving bindings Forward

The Forward transform moves pure bindings forward into their use sites. At the time of writing, it only works when the binding has a proper variable name instead of an anonymous binder, so we need to use Namify as well.

ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]; fix 10 {Beta; Bubble}; Snip; Flatten; Namify; Forward" \
      > Main-opt.dcl

This yields:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      weakclo {[r1]; n} in
      letregion r3 in
      weakclo {[r3]} in
      case n of {
        N# (n2 : Nat#) 
         -> let x10 : Bool# = eq# [Nat#] n2 0# in
            case x10 of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> let x16 : Nat r3 = N# [r3] 1# in
                  let x17 : Nat r3
                        = (\(x11 : Nat r3).
                           case n of {
                             N# (x12 : Nat#) 
                              -> case x11 of {
                                   N# (x13 : Nat#) 
                                    -> let x14 : Nat# = sub# [Nat#] x12 x13 in
                                       N# [r3] x14
                                 }
                           }) x16 in
                  let x18 : List r2 a = replicate [r3] [r2] [a] x17 x in
                  Cons [r2] [a] x x18
            }
      };

This looks promising. We finally have the inner abstraction applied to its argument.

The Final Result

To complete the reduction we just need to apply the Snip; Flatten; Namify; Forward part of the transform during the fixpoint.

ddc -load Main.dcl \
      -with `ddc -print-basedir`/lite/base/Data/Numeric/Nat.dcl \
      -trans "Inline Nat[addNat, subNat]; fix 10 {Beta; Bubble; Snip; Flatten; Namify; Forward}" \
      > Main-opt.dcl

Here we go:

replicate : [r1 r2 : %].[a : *].Nat r1 -(!0 | Use r1 + Use r2)> a -(Read r1 + Alloc r2 | Use r1 + Use r2)> List r2 a
    = /\(r1 r2 : %)./\(a : *).
       \(n : Nat r1).\(x : a).
      weakclo {[r1]} in
      letregion r3 in
      weakclo {[r3]} in
      case n of {
        N# (n2 : Nat#) 
         -> let x11 : Bool# = eq# [Nat#] n2 0# in
            case x11 of {
              True#  
               -> Nil [r2] [a] ();
              False#  
               -> let x20 : Nat r3 = N# [r3] 1# in
                  case n of {
                    N# (x46 : Nat#) 
                     -> case x20 of {
                          N# (x47 : Nat#) 
                           -> let x48 : Nat# = sub# [Nat#] x46 x47 in
                              let x21 : Nat r3 = N# [r3] x48 in
                              let x49 : List r2 a = replicate [r3] [r2] [a] x21 x in
                              Cons [r2] [a] x x49
                        }
                  }
            }
      };

This is the best we can do with the current version of DDC (0.3.0). Ideally we'd like to float x11 and x20 into their use sites and eliminate the case expressions that destruct these bindings. It would also help to perform the worker wrapper transform so that the loop counter n doesn't need to be unboxed and reboxed for every recursive call.

Compiling the optimised result

As the final step, we should compile and run the optimised program to ensure DDC hasn't broken anything. Each of the transforms we have applied is (supposed to be) correctness preserving, so the optimised program should give the same result as the unoptimised one.

$ ddc -make Main-opt.dcl
$ ./Main-opt
20
19
18
17
16
15
14
13
12
11
10
100
100
100
100
100

Looks good!

Note that it's currently not possible to compile every intermediate version of the above code to an executable. This is because the DDC 0.3.0 code generator does not yet support nested functions. This will be in the next version.

Last modified 6 months ago Last modified on Dec 21, 2012 7:32:41 AM