Chapel Tutorial for Programmers

This tutorial is a quick introduction to Chapel for programmers. (It expects you to already be comfortable programming in another language.) This code is currently compatible with the version 1.8.0 Chapel compiler. (Last Updated Nov. 2013.) If you are waiting for it to work for a new version, it is okay to let me know I'm being slow!

Let's get started! First, create a file, test.chpl in your favorite text editor. Then, let's declare our first variable. Add the following line to your code:

var n: int = 10;
var realArray: [0..n-1] real;
realArray[0] = 1.0;
realArray[1] = 2.0;
realArray[2] = 3.0;
realArray[3] = 4.0;
realArray[4] = 5.0;
realArray[5] = 4.0;
realArray[6] = 3.0;
realArray[7] = 2.0;
realArray[8] = 1.0;
realArray[9] = 0.0;

maximum: 0.0
realArray: 1.0 2.0 3.0 4.0 5.0 4.0 3.0 2.0 1.0 0.0

If an error occurs, go back to your code and try to fix it. At this point, not all of the error messages are super helpful. If you get a weird message that doesn't tell you anything, you might need to ask for more help. I recommend checking out the archives of the Chapel mailing list.

proc findMax(array) {
    var arrayMax = array[0];
    for number in array {
        if (number > arrayMax) {
            arrayMax = number;
        }
    }
    return arrayMax;
}

One nice part of Chapel is that you can easily measure the running time of your code using the Time package. Let's do that next! Add this to the top of your code (first line):

var timer: Timer;
timer.start();
writeln("Max of realArray is: ", findMax(realArray));
timer.stop();
writeln("That took ", timer.elapsed(), " seconds.");

var timer: Timer;
timer.start();
maximum = findMax(realArray);
timer.stop();
writeln("Max of realArray is: ", maximum);
writeln("That took ", timer.elapsed(), " seconds.");

Compile and run your code to see how long it takes to find the maximum element. We would like to see how this changes with different values of n. The main problem is that we don't want to have to explicitly initialize each element of realArray in the code. So, next, let's learn how to create arrays of random values. We will use the Random package. Just like before, add a use statement to the top of your code:

var n = 10;
var realArray : [0..n-1] real;
var randomStream = new RandomStream(SeedGenerator.currentTime);
randomStream.fillRandom(realArray);
writeln("realArray: ", realArray);

Compile and run your code to make sure this new code is working. You'll see that the array is now populated by random values. Now go back and change n to be larger:

Compile and run again. Wouldn't it be great to not have to recompile just because you change n? Absolutely! Let's make that happen using configurable constants. Change the initialization of n to the following:

Great! Now let's try harnessing the power of extra processors! Chapel does this by creating new tasks, which are then run on available processors. For example, if we had two processors available to find the maximum value of an array, we could have two tasks: one to search each half of the array. Let's work towards that by first adding a function that looks in a range of the array. We'll use a while loop to iterate here instead of a for loop (for example's sake). Add this function to your code:

proc findMaxInRange(array, lower, upper) {
    var arrayMax = array[lower];
    while (lower <= upper) {
        if (array[lower] > arrayMax) {
            arrayMax = array[lower];
        }
        lower += 1;
    }
    return arrayMax;
}

If you try to use this function right away, Chapel complains about the line where lower is incremented, saying: "error: non-lvalue actual passed to ref argument". This happens because Chapel treats passed integers as constants; you can't modify them. You can fix this by changing the function signature:

Here, in tells Chapel not to treat that variable as a constant. Another option to fix this problem is to create a separate index variable. Instead of using in, I will continue from the following example, using index i, which might make the code easier to read:

proc findMaxInRange(array, lower, upper) {
    var i = lower;
    var arrayMax = array[i];
    while (i <= upper) {
        if (array[i] > arrayMax) {
            arrayMax = array[i];
        }
        i += 1;
    }
    return arrayMax;
}

proc findMax(array) {
    var middle = array.numElements/2;
    var lowerMax, upperMax: real;
    lowerMax = findMaxInRange(array, 0, middle);
    upperMax = findMaxInRange(array, middle+1, array.numElements - 1);
    if (lowerMax > upperMax) { return lowerMax; }
    return upperMax;
}

proc findMax(array) {
    var middle = array.numElements/2;
    var lowerMax, upperMax: real;
    lowerMax = findMaxInRange(array, 0, middle);
    upperMax = findMaxInRange(array, middle+1, array.numElements - 1);
    return max(lowerMax, upperMax);
}

If we have two processors, we can have separate tasks find the values of the two partial-maximums in parallel. That part then may take only half as long! Chapel let's us do this really easily: just wrap them in a cobegin:

proc findMax(array) {
    var middle = array.numElements/2;
    var lowerMax, upperMax: real;
    cobegin ref(lowerMax, upperMax) {
        lowerMax = findMaxInRange(array, 0, middle);
        upperMax = findMaxInRange(array, middle+1, array.numElements - 1);
    }
    return max(lowerMax, upperMax);
}

The cobegin creates a new task for each line (with the hope they will get assigned to their own hardware thread) then waits for all of them to finish before continuing with the rest of the code. The ref(lowerMax, upperMax) tells Chapel it can modify those variables that were declared outside the cobegin. (Without this statement, it treats most of those "outside" variables as constants.) Awesome! Compile and run your code again.

Challenge: write fastFindMaxInRange, a recursive version of findMaxInRange. Important: Chapel doesn't automatically determine the return type of recursive functions, so your signature needs to declare this:

Try to get this to work before continuing. Hint: try getting it to work serially before adding a cobegin. If you get stuck, my solution is below.

proc fastFindMaxInRange(array, lower, upper) : real {
    if (lower == upper) { return array[lower]; }
    var lowerMax, upperMax: real;
    var middle = ((upper - lower) / 2) + lower;
    cobegin ref(lowerMax, upperMax) {
        lowerMax = fastFindMaxInRange(array, lower, middle);
        upperMax = fastFindMaxInRange(array, middle + 1, upper);
    }
    return max(lowerMax, upperMax);
}

proc serialFindMax(array) {
    return findMaxInRange(array, 0, array.numElements - 1);
}

proc parallelFindMax(array) {
    return fastFindMaxInRange(array, 0, array.numElements - 1);
}

timer.clear();
timer.start();
var serialMax = serialFindMax(realArray);
timer.stop();
var serialSeconds = timer.elapsed();
timer.clear();
timer.start();
var parallelMax = parallelFindMax(realArray);
timer.stop();
var parallelSeconds = timer.elapsed();
timer.clear();
writeln("SERIAL test: found the max to be ", serialMax, " in ", serialSeconds, " seconds.");
writeln("PARALLEL test: found the max to be ", parallelMax, " in ", parallelSeconds, " seconds.");
var speedup = serialSeconds / parallelSeconds;
if (speedup > 1) {
    writeln("    The parallel trial achieved a ", speedup, "x speedup over the serial!");
} else {
    writeln("    No speedup attained!");
}

$ ./a.out --n=5000
...
Serial test: found the max to be 0.999846 in 0.0008 seconds.
Parallel test: found the max to be 0.999846 in 0.55384 seconds.
Achieved a 0.00144446x speedup!

Yuck! One problem here is that there is too much overhead creating a bunch of different tasks that aren't able to run simultaneously. It's not advantageous to create too many tasks. (If you have more processors, you're probably getting better results.) To fix this problem, let's cap the number of tasks we create. The idea is that we'll specify the number of tasks we are allowed to use. To do this, create a new function, hybridFindMaxInRange, similar to fastFindMaxInRange, except with an extra parameter, numTasks, which specifies the maximum number of tasks to use in this function call. Try to write this yourself; my solution is below if you get stuck.

proc hybridFindMaxInRange(array, lower, upper, numTasks) : real {
    if (lower == upper) { return array[lower]; }
    if (numTasks == 1) {
        return findMaxInRange(array, lower, upper);
    }
    var lowerMax, upperMax: real;
    var middle = ((upper - lower) / 2) + lower;
    var lowerTasks = numTasks / 2;
    var upperTasks = numTasks - lowerTasks;
    cobegin ref(lowerMax, upperMax) {
        lowerMax = hybridFindMaxInRange(array, lower, middle, lowerTasks);
        upperMax = hybridFindMaxInRange(array, middle+1, upper, upperTasks);
    }
    return max(lowerMax, upperMax);
}

proc hybridFindMax(array, numTasks) {
    return hybridFindMaxInRange(array, 0, array.numElements-1, numTasks);
}

After this, add some code to test this just like the parallel version, including reporting the speedup over the serial. I added another configurable constant for the number of threads I wanted to use. I actually ran a number of tests, slowly increasing n. The parallel test started taking too much time, so I had to comment it out. Here are the results of one of my tests:
$ ./a.out --n=100000000 --threads=8
...
SERIAL test: found the max to be 1.0 in 6.5324 seconds.
HYBRID test: found the max to be 1.0 in 0.816864 seconds.
The hybrid achieved a 7.99692x speedup over the serial trial!

That's much better! This gave us some actual speedup! We've done a lot here. There is, however, one way to do this entire thing in just one line of Chapel code. This uses a parallel algorithm style called a reduction, which is a single pairwise operation applied to an array using a parallel divide-and-conquer as appropriate. Here's how to write that single line:

Acknowledgements: Thanks to Ernie Heyder for doing ALL the proofreading for early versions of this. Thanks to Brad Chamberlain both for pointing out errors and all his work on the Chapel project. Thanks to David Bunde for his own excellent tutorial and for working with me on our workshops at SC.