February 21, 2012

Parallel computing with package ‘snowfall’

Lately I have been looking for ways to decrease the amount of time it takes me to run multiple regressions over a very large data set. There are several options that I am investigating to do this, and certainly more that I don’t know of yet.

  • Code more efficiently.
  • Compute several operations in parallel across two or more CPU cores.
  • Tap into a network of computers, and further expand the number of CPU cores to parallelize calculations.

Because many of my computing jobs are “embarrassingly parallel” (each piece can be computed independently of the others), any of the options above would immediately reduce the time it takes me to compute (and re-compute) them. This post goes through an example that uses the CRAN package snowfall to parallelize a computation over several CPU cores on the same computer (the second option above).

The CRAN package snowfall is built to make it easy to create parallel processes. I recommend taking a look at the associated vignette and tutorial.

Before beginning to use snowfall, do the following:

  1. Upgrade to the latest version of R, which is 2.14.1 as of this post (or to the patched version of R-2.13.0, available here). FYI: there is a bug in version 2.13.0 (on MS Windows 7) that prevents snowfall from operating smoothly.
  2. Install the latest version of the package snowfall ( install.packages('snowfall', dependencies = TRUE) )
  3. Find out how many CPU cores the machine you will be using has. In my example below, I am using a machine with 8 CPU cores running Windows 7.
  4. Convert any ‘for’ loops into a function that you can call using apply(). See my previous post that outlines this process.
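
Steps 3 and 4 can be sketched roughly as follows. This is only an illustration under my own assumptions: it uses detectCores() from the 'parallel' package that ships with R 2.14.0 and later, the built-in mtcars data set, and hypothetical names (predictors, fit_r2); it calls sapply(), a member of the apply family, rather than apply() itself.

```r
# Step 3: count the cores available on this machine.
# detectCores() comes from the 'parallel' package bundled with R >= 2.14.0;
# it can return NA on platforms where the count cannot be determined.
library(parallel)
n_cores <- detectCores()

# Step 4, before: a 'for' loop whose iterations do not depend on each other.
# (mtcars is a built-in data set; the predictors are arbitrary choices.)
predictors <- c("wt", "hp", "disp")
r2_loop <- numeric(length(predictors))
for (i in seq_along(predictors)) {
  fit <- lm(reformulate(predictors[i], response = "mpg"), data = mtcars)
  r2_loop[i] <- summary(fit)$r.squared
}

# Step 4, after: the loop body becomes a function of a single input,
# and the apply family calls it once per element.
fit_r2 <- function(predictor) {
  fit <- lm(reformulate(predictor, response = "mpg"), data = mtcars)
  summary(fit)$r.squared
}
r2_apply <- sapply(predictors, fit_r2)
```

Both versions produce the same R-squared values; the second form matters because the parallel functions in snowfall mirror the apply family, so a job written this way needs very little further change.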

Using snowfall: A simple example

I put together this post because I couldn’t easily find a ‘plug-and-play’ code example in the existing online literature for the type of parallelization I wanted. Out of necessity I worked through the wrinkles and am now successfully using multiple CPU cores in R. Note: By default, R uses only one CPU core unless you explicitly code it to use multiple cores (as in this example).
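
To give a rough idea of the shape such code takes, here is a minimal sketch of a multi-core run with snowfall. It is not the worked example from the full post; the task (bootstrapped regressions on the built-in mtcars data), the function name boot_slope, and the choice of cpus = 2 are all assumptions made for illustration.

```r
library(snowfall)

# A toy task whose runs are independent of each other: fit one regression
# per bootstrap resample of mtcars and return the slope on 'wt'.
# The argument i only identifies the run; it is not used in the computation.
boot_slope <- function(i) {
  rows <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[rows, ]))[["wt"]]
}

# Start the worker processes. cpus = 2 is a placeholder; set it to the
# number of cores you want to use on your machine.
sfInit(parallel = TRUE, cpus = 2)

# sfLapply() is snowfall's parallel counterpart of lapply(): the 100 runs
# are distributed across the workers and returned as a list.
slopes <- unlist(sfLapply(1:100, boot_slope))

# Shut the workers down when finished.
sfStop()
```

This sketch works without further setup because boot_slope only touches mtcars, which every fresh R session can already see. If the function relied on your own objects or on extra packages, those would first have to be sent to the workers with sfExport() and sfLibrary().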
