Learning Objectives

Running R in Batch

Executable R Scripts

LSF

Preparations

  • You need faculty sponsorship to get access to Zorro. Once you have a sponsorship, fill out a request form from the Zorro Website.

  • After you have permission to access Zorro, you interface with it using SSH.

    • Windows users should download and install PuTTY to use SSH.
    • Mac users already have SSH installed in bash.
  • You transfer files between the supercomputer and your computer through FTP.

    • Windows users should download and install WinSCP to use FTP. I sometimes also use FileZilla.
    • Mac users can already use FTP through the terminal, but FileZilla might be a little easier.
  • You are required to use the AU VPN while accessing Zorro. All users should follow the AU instructions here to connect to the AU VPN.
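
For command-line file transfers, scp (which runs over SSH) is a common alternative to a graphical client. The following is a minimal sketch that assumes scp is permitted on Zorro, which is not confirmed here. `zorro_target` is a hypothetical helper, and `username` is a placeholder for your AU username.

```shell
# Hypothetical helper: build an scp target such as username@zorro.american.edu:path
zorro_target() {
  printf '%s@zorro.american.edu:%s\n' "$1" "$2"
}

# Print the target for a file in your home directory on Zorro.
zorro_target username '~/minimal.R'

# Example transfers (run these yourself while connected to the AU VPN):
#   scp minimal.R minimal.lsf "$(zorro_target username '~/')"    # upload
#   scp "$(zorro_target username '~/minimal.Rout')" .            # download
```

The helper only builds the remote address, so nothing is sent until you run one of the commented scp commands yourself.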

Connecting to Zorro

  • Connect to the AU VPN.

  • Windows Users:

    1. Open PuTTY
    2. Under Host Name, type zorro.american.edu to connect to the server. Press Enter.
    3. A black screen will appear where you need to enter your AU username and password.
  • Mac Users:

    1. In the terminal, type:

      ssh -l username zorro.american.edu

      where “username” is your AU username.

    2. Type in your AU password and hit enter.

  • Once you are connected to Zorro, you use bash to navigate, run programs, submit jobs, etc.

LSF Files

  • You specify the properties of a job in an LSF file.

  • Example LSF File:

    #BSUB -J minimal_example
    #BSUB -q normal
    #BSUB -o minimal_out.txt
    #BSUB -e minimal_err.txt
    #BSUB -u "youremail@american.edu"
    #BSUB -B
    #BSUB -N
    #BSUB -n 2
    /path/to/R CMD BATCH --no-save --no-restore '--args nc=2' minimal.R minimal.Rout
  • You have a list of LSF options. Each option begins with #BSUB

  • You then have a bash command. In this case we are running R in batch mode via

    /path/to/R CMD BATCH --no-save --no-restore '--args nc=2' minimal.R minimal.Rout
  • The “path/to/R” can be found by typing the following in bash:

    which R
  • This might not be the version of R that you want. Zorro has a few copies of R installed. You can see these versions by typing in bash:

    ls -d /app/R-*
  • As of this writing, Zorro has versions 3.6.0, 3.6.1, 4.0.2, and 4.1.0.

  • You can use a specific version of R (e.g. 4.0.2) by using

    /app/R-4.0.2/bin/R CMD BATCH --no-save --no-restore '--args nc=2' minimal.R minimal.Rout
  • That '--args nc=2' trick is a way to pass arbitrary arguments to an R script. See here.

  • Typical options are:

    • #BSUB -J job_name: Name of the job, so that you can see that name when you check the status of your job.
    • #BSUB -q normal: The queue to submit your job to. You shouldn’t change this for Zorro.
    • #BSUB -o job_out.txt: Where to send the job’s output.
    • #BSUB -e job_err.txt: Where to send the job’s error messages.
    • #BSUB -u "youremail@american.edu": Send mail to the specified user.
    • #BSUB -B: Emails you when the job begins.
    • #BSUB -N: Emails you when the job ends.
    • #BSUB -n 2: Submits a parallel job and specifies the number of tasks in the job (in this case, 2).
  • A full list of options can be found here
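
Because the number of cores appears twice (once as an LSF option and once in `--args nc=...`), it can help to generate the LSF file from a single variable so the two values cannot drift apart. The following is a minimal sketch, not a required workflow. The variable names (`job_name`, `nc`, `r_bin`) are my own, and it uses the `-n` flag that `bsub` takes for the number of tasks.

```shell
# Assumed values; change these for your own job.
job_name="minimal_example"
nc=2                          # number of cores, used for both -n and --args
r_bin="/app/R-4.0.2/bin/R"    # R version to use on Zorro

# Write the LSF file. The unquoted EOF lets the shell expand the variables.
cat > "${job_name}.lsf" <<EOF
#BSUB -J ${job_name}
#BSUB -q normal
#BSUB -o ${job_name}_out.txt
#BSUB -e ${job_name}_err.txt
#BSUB -n ${nc}
${r_bin} CMD BATCH --no-save --no-restore '--args nc=${nc}' minimal.R minimal.Rout
EOF

# Inspect the result before submitting it with: bsub < minimal_example.lsf
cat "${job_name}.lsf"
```

Changing `nc` in one place then updates both the cores requested from LSF and the number of workers the R script registers.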

Paired R script

  • You need to format your R script to be able to run in parallel.

  • Example R Script

    ## Set library for R packages ----
    .libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
    
    ## Attach packages for parallel computing ----
    library(foreach)
    library(doFuture)
    
    ## Determine number of cores ----
    args <- commandArgs(trailingOnly = TRUE)
    if (length(args) == 0) {
      nc <- 1
    } else {
      eval(parse(text = args[[1]]))
    }
    cat(nc, "\n")
    
    ## Register workers ----
    if (nc == 1) {
      registerDoSEQ()
      plan(sequential)
    } else {
      registerDoFuture()
      plan(multisession, workers = nc)
      if (getDoParWorkers() == 1) {
        stop("nc > 1, but only one core registered")
      }
    }
    
    ## Run R script ----
    x <- foreach(i = 1:2, .combine = c) %dopar% {
      Sys.sleep(1)
      i
    }
    x
    
    ## Unregister workers ----
    if (nc > 1) {
      plan(sequential)
    }
  • The above is the template I use.

  • The above only lets you run on multiple cores within a single node. It does not allow for multi-node processing.

  • You should only need to modify two things in the above code:

    1. The code right after ## Run R script ----, where you implement your computations.
    2. The path for your R library, in the .libPaths() call. This is where your R packages are locally stored (see below).

bsub commands

  • You submit and control jobs with the bsub command in bash.

  • The following will submit the job in "minimal.lsf"

    bsub < minimal.lsf
    • The above says to redirect the minimal.lsf file to the standard input of the bsub command.
    • Otherwise, you would need to use bsub like bsub -q normal -n 2 ...
  • Use bjobs to display information on the jobs that you have submitted.

    • Display all of your jobs, including recently finished ones:

      bjobs -a
    • Display all of the jobs of a user:

      bjobs -u user_name
    • Display information about a particular job

      bjobs job_id
  • Different job states that you will see:

    • PEND: Waiting in a queue.
    • RUN: Currently running.
    • DONE: Successfully finished with no errors.
    • EXIT: Errored, did not finish successfully.
  • Use bkill to kill a job.

    bkill job_id
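
Both bjobs and bkill need the job ID, which bsub prints when you submit. The following is a minimal sketch for capturing it, assuming bsub reports the usual message of the form `Job <12345> is submitted to queue <normal>.` The helper name `extract_job_id` is hypothetical, and the sample line uses a made-up ID so the example runs without a cluster.

```shell
# Extract the numeric job ID from the line bsub prints on submission,
# e.g. "Job <12345> is submitted to queue <normal>."
extract_job_id() {
  sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Demonstration on a sample line (12345 is a made-up job ID).
sample='Job <12345> is submitted to queue <normal>.'
job_id=$(printf '%s\n' "$sample" | extract_job_id)
echo "$job_id"

# On Zorro the same helper could be used like this:
#   job_id=$(bsub < minimal.lsf | extract_job_id)
#   bjobs "$job_id"     # check the state (PEND, RUN, DONE, EXIT)
#   bkill "$job_id"     # kill the job if needed
```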

Installing R Packages

  • You can see what packages are already installed on Zorro by typing

    ls /app/R-4.0.2/lib64/R/library
    • It’s not a lot.
  • You need to install R packages in a local directory, since global install is not supported (because not everyone wants your R packages).

  • You should create an R directory where you put all things R:

    mkdir R
  • Then move into this directory (cd R) and create a directory where you can place your packages for a specific version of R:

    mkdir 4.0.2
  • Then create the following R file, called “install.R”, which can be used to install R packages into that directory (but change my username to yours):

    .libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
    install.packages(c("tidyverse", 
                       "future",
                       "doFuture",
                       "foreach"),
                     lib = "/home/dgerard/R/4.0.2/",
                     repos = "http://cran.us.r-project.org")
    • In the above, I install {tidyverse}, {future}, {doFuture}, and {foreach} packages. But you can add more.
  • If you want to use Bioconductor packages in R Version 4.0.2, run the following:

    .libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
    install.packages("BiocManager",
                     lib = "/home/dgerard/R/4.0.2/",
                     repos = "http://cran.us.r-project.org")
    BiocManager::install(version = "3.12", lib = "/home/dgerard/R/4.0.2/", ask = FALSE)
    BiocManager::install(c("tidyverse",
                           "future",
                           "doFuture",
                           "foreach"),
                         lib = "/home/dgerard/R/4.0.2/",
                         ask = FALSE)
    • I got the version of Bioconductor corresponding to R 4.0.2 from here
  • Set up an LSF file to run this script, called “install.lsf”:

    #BSUB -J install_r_pkgs
    #BSUB -q normal
    #BSUB -o install_out.txt
    #BSUB -e install_err.txt
    #BSUB -u "youremail@american.edu"
    #BSUB -B
    #BSUB -N
    #BSUB -n 1
    /app/R-4.0.2/bin/R CMD BATCH --no-save --no-restore install.R install.Rout
  • Then run this job

    bsub < install.lsf
  • At the top of every R file from now on, put the following (but change my username to yours):

    .libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
  • Now you have access to those packages that you installed in the “R/4.0.2” directory.

Installing other software locally

  1. Download a tar file using wget <software_url>. You should download this into a common directory for all of your local installs, like “apps”.

  2. Decompress the tar file using tar -zxvf <tar_file>

  3. Move into the newly extracted directory using cd <new_directory>

  4. Look at the README for further steps. Usually there is a makefile and it’s as easy as running make and/or make install. But read the README first.

  5. Add the location to the PATH by using something like export PATH=$HOME/path/to/software:$PATH. You should put this in your “.bashrc”

  6. Source your “.bashrc” file with . ~/.bashrc.

  7. Confirm that the software is installed with which <software>
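
The PATH steps (5 through 7) can be sketched as follows. To keep the example runnable anywhere, it uses a stand-in program called `mytool` in a temporary directory. Both names are hypothetical, and in practice the directory would be something like your “apps” folder. It uses `command -v`, a shell built-in that reports the same location `which` would.

```shell
# Stand-in for a locally installed program: a tiny script in a temporary
# "apps" directory (hypothetical names, for illustration only).
apps_dir=$(mktemp -d)
mkdir -p "$apps_dir/mytool/bin"
printf '#!/bin/sh\necho "mytool works"\n' > "$apps_dir/mytool/bin/mytool"
chmod +x "$apps_dir/mytool/bin/mytool"

# Step 5: prepend the install location to PATH (this line would go in ~/.bashrc).
export PATH="$apps_dir/mytool/bin:$PATH"

# Step 7: confirm the shell now finds the program, then run it.
command -v mytool
mytool
```

Prepending (rather than appending) the directory means your local copy takes precedence over any system-wide program of the same name.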

New Functions


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.