Parallel Computing

====Resources====
TIGRESS is a collection of computational resources for Princeton Researchers. Here are the descriptions for some clusters:

  • Della: Dell Linux cluster; 2.67 GHz Westmere; 4-8GB/Core; 1536 Cores; 137TB total disk.
  • Hecate: SGI UV 1000 cluster; 2.67 GHz Westmere; 5.3-10.7GB/Core; 1536 Cores; 180TB total disk.
  • Adroit: Dell Linux cluster; 2.33 GHz Xeon; 2GB/Core; 64 Cores; 1TB total disk.

Other university clusters are Orbital and Tiger. For more details, please visit http://www.princeton.edu/researchcomputing/about/tigress/

The Department of Politics has its own cluster:

  • Tukey (Department of Politics): Linux cluster; 2.67 GHz Intel Xeon 5550; 3GB/Core; 384 Cores; 3TB total disk.

====Quick Installation of Packages and Examples====
1. Log in to a cluster, such as Adroit. If you don't have an account yet, please register one at http://www.princeton.edu/researchcomputing/about/tigress/

2. In your home directory, run the following to create a directory called tigress-scripts:
git clone https://github.com/olmjo/tigress-scripts.git

3. Start R at the Linux/Unix prompt, and install the "foreach" package.

[userxyz@adroit3 ~]$ R
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
> install.packages("foreach")
> q()

4. At the Linux/Unix prompt, navigate to the tigress-scripts/setup subdirectory, and run ./setup.sh (answer 'yes' to all questions):

[userxyz@adroit3 ~]$ cd
[userxyz@adroit3 ~]$ cd tigress-scripts/setup/
[userxyz@adroit3 setup]$ ./setup.sh

5. There are examples in the tigress-scripts/examples subdirectory. Please use 'sbatch' to submit parallel jobs, as 'qsub' is no longer supported on the university clusters.

If the above steps fail to install doMPI successfully, the most likely cause is that Rmpi and the openmpi environment were not installed or loaded correctly. The Linux command 'module' can be used to list the available openmpi modules so that the correct one can be loaded. Once the correct openmpi environment is loaded, the doMPI package can be installed manually in R with install.packages("doMPI"), as sketched below. This requires some knowledge of Linux/Unix; if you cannot get it done, please contact Computational Science and Engineering Support (cses@princeton.edu) for help.
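For example (a minimal sketch; the module name openmpi/gcc is an assumption, so use whatever 'module avail' reports on your cluster):

[userxyz@adroit3 ~]$ module avail openmpi
[userxyz@adroit3 ~]$ module load openmpi/gcc
[userxyz@adroit3 ~]$ R
> install.packages("doMPI")   # Rmpi is installed as a dependency if needed
> q()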

6. To install the Rmpi package on the Tukey cluster within the Politics department, follow these steps:

(a) Go to http://cran.r-project.org/web/packages/Rmpi/index.html to download the Rmpi package, e.g. Rmpi_0.6-6.tar.gz.

(b) FTP Rmpi_0.6-6.tar.gz to your Tukey home directory, and put it under the directory ~/download (if you don't have such a directory, create one); see the sketch below.
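For example, from your local machine (a minimal sketch; the host name tukey.princeton.edu is an assumption, so substitute the actual address):

ssh userxyz@tukey.princeton.edu "mkdir -p ~/download"
scp Rmpi_0.6-6.tar.gz userxyz@tukey.princeton.edu:~/download/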

(c) Log on to Tukey and cut-and-paste the following four lines:

----
echo "module load openmpi/gcc" >> ~/.bashrc

module load openmpi/gcc

export MPI_ROOT=`echo $LD_LIBRARY_PATH | perl -n -e 's/.*?([\/a-zA-Z]*?\/openmpi.*?)(\/lib64):.*/\$1/g; print'`

R CMD INSTALL -l $R_LIBS ~/download/Rmpi_0.6-6.tar.gz --configure-args="--with-Rmpi-type=OPENMPI --with-Rmpi-include='$MPI_ROOT/include' --with-Rmpi-libpath='$MPI_ROOT/lib64' "

or if you don't have $R_LIBS defined,

R CMD INSTALL ~/download/Rmpi_0.6-6.tar.gz --configure-args="--with-Rmpi-type=OPENMPI --with-Rmpi-include='$MPI_ROOT/include' --with-Rmpi-libpath='$MPI_ROOT/lib64' "
----

(d) Start R and run install.packages("doMPI"), the same way you install any other R package.

Note: $R_LIBS is the directory where your R packages are installed, such as /home/xxxxxx/R/x86_64-redhat-linux-gnu-library/3.1, where xxxxxx is your Princeton ID. This is based on the default settings of a newly created account on Tukey. If you have changed any environment variables, the command lines above need to be modified accordingly; the sketch below shows how to check the current settings.
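To check these settings from within R (a minimal sketch):

Sys.getenv("R_LIBS")   # the R_LIBS environment variable, if defined
.libPaths()            # all library directories R searches for packages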

====Parallel Solutions in R====
There are several ways to improve efficiency by parallelizing computation:

  • doParallel package - parallelization over cores within a computer.
  • doMPI package - MPI parallelization over a group of computers.
  • OpenMP via the Rcpp package - an API for one of several C++ parallelization techniques, limited to a single machine (i.e. thread parallelization over cores only).

Note:

(1) In a parallel computing environment, random number generation may not follow the same sequence across different computers/cores, so results may not be reproducible by default. The doRNG package provides reproducible parallel random number streams in R; see the sketch after these notes.

(2) There is no guaranteed order in which the parallel iterations are processed.

(3) Because Hecate's architecture treats the whole cluster as a single-node computer, OpenMP is recommended on '''Hecate''' (instead of MPI). On Tukey, Della, and Adroit, MPI will reach more computing resources.
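A minimal doRNG sketch (assuming the foreach, doParallel, and doRNG packages are installed; two runs with the same seed return identical results even though the iterations run in parallel):

library(foreach)
library(doParallel)
library(doRNG)

cl <- makeCluster(2)                  # two local workers
registerDoParallel(cl)

set.seed(1)
r1 <- foreach(i = 1:4, .combine = c) %dorng% rnorm(1)
set.seed(1)
r2 <- foreach(i = 1:4, .combine = c) %dorng% rnorm(1)
identical(r1, r2)                     # TRUE: reproducible parallel streams
stopCluster(cl)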


====== doParallel ======

Sample steps for using the package:

1. Load the packages. If they are not available, install them first.

library(foreach)
library(doParallel)
library(doRNG)

2. Register a parallel backend. In this example, we use 4 threads/cores for the parallel computation and a reproducible stream of parallel random numbers:

cl <- makeCluster(spec = 4, type = "PSOCK")   # 4 socket-based worker processes
registerDoParallel(cl)                        # register the parallel backend
registerDoRNG()                               # reproducible parallel RNG streams
set.seed(1)                                   # seed those streams

3. Use %dopar% in the foreach command. For example, to sum 100 independent standard-normal draws in parallel:

outA <- foreach(it = 1:100, .combine = "+") %dopar% {
    return(rnorm(1))
}

4. Explicitly .export objects needed by the iterations. For example, a nonparametric bootstrap of predicted probabilities from a logit model:

dfOutNPBS <- foreach(it = 1:nMC,
                     .export = c("dfProfile", "turnout", "f1"),
                     .combine = rbind) %dopar% {
    dfFits <- dfProfile
    ## resample the data with replacement
    vIdx <- sample(1:nrow(turnout),
                   size = nrow(turnout),
                   replace = TRUE
                   )
    dfBS <- turnout[vIdx, ]
    ## refit the model on the bootstrap sample
    mBS <- glm(f1, family = binomial, data = dfBS)
    ## predicted probabilities for the profile data
    probs <- predict(mBS, newdata = dfProfile, type = "response")
    dfFits$phat <- probs
    dfFits$mc <- it
    return(dfFits)
}

5. Close the parallel backend when things are done. For example,

stopCluster(cl)
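Putting the steps together, here is a minimal self-contained doParallel script (a sketch, assuming 4 local cores are available):

library(foreach)
library(doParallel)
library(doRNG)

cl <- makeCluster(spec = 4, type = "PSOCK")
registerDoParallel(cl)
registerDoRNG()
set.seed(1)

## sum 100 independent standard-normal draws in parallel
total <- foreach(it = 1:100, .combine = "+") %dopar% {
    rnorm(1)
}
print(total)

stopCluster(cl)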


======doMPI======

Sample steps for using the package:

1. Load the packages. If they are not available, install them first.

library(foreach)
library(doMPI)
library(doRNG)

2. Register a parallel MPI backend. In this example, we also want a reproducible stream of parallel random numbers:

cl <- startMPIcluster()    # spawn workers from the MPI environment
registerDoMPI(cl)          # register the MPI backend
registerDoRNG(1)           # reproducible parallel RNG with seed 1

3. The foreach and %dopar% parallel structure is similar to the above, with .export explicitly specifying the global/external objects used inside the loop:

output <- foreach(i = (1:nBS),
                  .combine = rbind,
                  .export = c("law82", "sizeBS")
                  ) %dopar% {
    ## draw a subsample without replacement
    indsBS <- sample(x = 1:nrow(law82),
                     size = sizeBS,
                     replace = FALSE
                     )
    subBS <- law82[indsBS, ]
    ## correlation between LSAT and GPA in the subsample
    corBS <- cor(subBS$LSAT, subBS$GPA)
    return(corBS)
}

4. Close the backend when things are done.

closeCluster(cl)   # shut down the MPI workers
mpi.quit()         # terminate MPI and quit R
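Putting the steps together, here is a minimal self-contained doMPI script (a sketch; the file name ex_mpi.R is illustrative). Save it and launch it under SLURM with srun Rscript ex_mpi.R, as in the sbatch script shown in the last section below:

library(foreach)
library(doMPI)
library(doRNG)

cl <- startMPIcluster()
registerDoMPI(cl)
registerDoRNG(1)

## mean of 1000 standard-normal draws in each of 8 parallel iterations
res <- foreach(i = 1:8, .combine = c) %dopar% {
    mean(rnorm(1000))
}
print(res)

closeCluster(cl)
mpi.quit()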


======OpenMP======
OpenMP is an API for one of several C++ parallelization techniques.

  • Restricted to a single machine, like our use of a socket cluster at the R level.
  • Unlike the socket cluster, though, it will not run everywhere.
  • Requires special compile-time instructions.
  • Users annotate the sections of their code that should be parallelized. For example, we can set the number of threads to 4 and parallelize the for loop:

omp_set_num_threads(4);
#pragma omp parallel for
for (int i = 0; i < 10; i++) {
    ......
}

Sample steps for using OpenMP via Rcpp:

1. Load the Rcpp package, and set environment variables so that the C++ compiler knows that OpenMP will be used.

library(Rcpp)
Sys.setenv("PKG_CXXFLAGS" = "-fopenmp")
Sys.setenv("PKG_LIBS" = "-fopenmp")

2. Here is a sample Rcpp source file (the header includes and the Rcpp::export attribute are needed for sourceCpp to compile the function and expose it to R):

#include <Rcpp.h>
#include <omp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void omp2(int t = 1) {
    omp_set_num_threads(t);        // use t threads
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        Rcout << " " << i << " ";  // iterations may print in any order
    }
    Rcout << std::endl;
}

3. Save the above code in omp_functions.cpp, and compile it in R with sourceCpp. Then omp2 becomes an R function available for use.

> sourceCpp("omp_functions.cpp")
> omp2(4)
0 1 2 6 7 3 4 5 8 9

Note: There is no guaranteed order in which iterations are processed.

====Submit Parallel Jobs to the Clusters====
The Simple Linux Utility for Resource Management (SLURM) is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. The clusters are managed under SLURM, and all parallel jobs should be submitted with 'sbatch' only, since 'qsub' is no longer supported. sbatch submits a batch script to SLURM; the script may be given to sbatch as a file name on the command line, or, if no file name is specified, sbatch reads the script from standard input. Here is an example of an sbatch script for parallel computing using MPI (tigress-scripts/examples/ex6, once https://github.com/olmjo/tigress-scripts.git has been cloned as described above):

#!/usr/bin/env bash
#SBATCH --ntasks=4           # number of MPI tasks (R processes)
#SBATCH --cpus-per-task=1    # one core per task
#SBATCH -t 10:00             # wall-time limit (10 minutes)
#SBATCH -J Ex6               # job name
#SBATCH -o log.%j            # output log; %j expands to the job ID
#SBATCH --mail-type=begin    # e-mail when the job begins
#SBATCH --mail-type=end      # e-mail when the job ends
srun Rscript ./ex6.R         # launch the R script under MPI
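To submit the script and monitor the job (a minimal sketch; the script file name ex6.slurm is an assumption, so use the actual file name in the example directory):

[userxyz@adroit3 ex6]$ sbatch ex6.slurm    # submit; SLURM prints the job ID
[userxyz@adroit3 ex6]$ squeue -u $USER     # check the status of your jobs
[userxyz@adroit3 ex6]$ scancel <jobid>     # cancel a job if needed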

For more information on SLURM, please visit https://computing.llnl.gov/linux/slurm/quickstart.html

For more information on sbatch, please visit https://computing.llnl.gov/linux/slurm/sbatch.html