You can run an R script file in batch mode (i.e. non-interactively) from bash by using the following command:
R CMD BATCH --no-save --no-restore input_file.R output_file.Rout
Make sure to change "input_file.R" and "output_file.Rout" to your own file names. The --no-save --no-restore options make sure that you are working with a clean environment and that you don't save this environment after the command is executed. This is a good thing for reproducibility.

The newer way is to use Rscript. The following command will do mostly what R CMD BATCH does, except that "output_file.Rout" will contain only the script's printed output (e.g. from print()), not an echo of the commands themselves.
Rscript input_file.R > output_file.Rout
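To see the stream behavior concretely, here is a small sketch (the file names demo.R and errors.txt are made up for illustration): print() and cat() write to stdout, which > captures, while message() writes to stderr, which needs its own 2> redirect.

```shell
# Create a toy R script (hypothetical file name).
cat > demo.R <<'EOF'
print("captured by >")       # goes to stdout
message("captured by 2>")    # goes to stderr
EOF
# Requires R to be installed; stdout and stderr land in different files:
# Rscript demo.R > output_file.Rout 2> errors.txt
```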
Even though R CMD BATCH is the older way to run R scripts in batch mode, I like it better because:

Rscript forces the --no-echo option, so the commands in the script are not echoed alongside their output.
I had issues with Rscript when using it on the supercomputer. But that was a few years ago and they might have fixed things by now.

One nice thing about Rscript is that you can run one-liners of R code from bash using -e:
Rscript -e 'print("hello world")'
## [1] "hello world"
This allows you to knit an R markdown file from bash by using the following command:
Rscript -e "library(rmarkdown);render('rmarkdown_file.Rmd')"
On Windows machines, the above commands should also work in PowerShell, but not in Git Bash for Windows.
Exercise: Write an R script that loads the mtcars dataset and regresses mpg on wt. Then run this script in batch mode.
On Unix-like machines (e.g. Ubuntu and Mac), you can make an R file executable using a shebang line.
At the very top of your R script, place the following:

#!/path/to/Rscript --vanilla

where "/path/to/Rscript" is replaced with where Rscript is installed on your machine. The --vanilla option makes sure that your script is reproducible in other users' computing environments.

You can find where Rscript
is located by running the following in bash:
which Rscript
## /usr/bin/Rscript
It is common for "/usr/bin/Rscript" to be your Rscript location, unless you have changed the defaults.
Then change the mode to executable using chmod
chmod +x filename.R
Then you can execute it by running the following in bash
./filename.R
You can make it so that you can execute this file from anywhere on your computer by adding its directory to the $PATH.
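As a sketch of that $PATH step (the directory and script names here are made up; a plain shell script stands in for filename.R), you put the executable file in some directory and prepend that directory to PATH, typically in your ".bashrc":

```shell
# Make a directory for executables and put an executable script there.
mkdir -p "$HOME/bin"
printf '#!/bin/sh\necho hello from anywhere\n' > "$HOME/bin/myscript"
chmod +x "$HOME/bin/myscript"

# Prepend that directory to PATH; persist this line in ~/.bashrc.
export PATH="$HOME/bin:$PATH"

# Now the script runs from any working directory, no "./" prefix needed.
myscript   # prints "hello from anywhere"
```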
This makes creating command line interfaces for your R code relatively simple.
Suppose I have the following R script file called “ex1.R”
#!/usr/bin/Rscript --vanilla
cat("Hello World\n", file = "hello.txt")
Then I run
chmod +x ex1.R
Then, whenever I run the following, that script will run:
./ex1.R
American University has a supercomputer called Zorro.
To use it, you have to submit jobs via a scheduler. The scheduler Zorro uses is LSF, so that's what we'll learn. There are many other popular schedulers. Their syntax differs, but they all operate on similar ideas.
You need faculty sponsorship to get access to Zorro. Once you have a sponsorship, fill out a request form from the Zorro Website.
After you have permission to access Zorro, you interface with it using SSH.
You transfer files between the supercomputer and your computer through FTP.
You are required to use the AU VPN while accessing Zorro. All users should follow the AU instructions here to connect to the AU VPN.
Connect to the AU VPN.
Windows Users: Using an SSH client, connect to the host zorro.american.edu. Press Enter.

Mac Users:
In the terminal, type:
ssh -l username zorro.american.edu
where “username” is your AU username.
Type in your AU password and hit enter.
Once you are connected to Zorro, you use bash to navigate, run programs, submit jobs, etc.
You specify the properties of a job in an LSF file.
Example LSF File:
#BSUB -J minimal_example
#BSUB -q normal
#BSUB -o minimal_out.txt
#BSUB -e minimal_err.txt
#BSUB -u "youremail@american.edu"
#BSUB -B
#BSUB -N
#BSUB -n 2
/path/to/R CMD BATCH --no-save --no-restore '--args nc=2' minimal.R minimal.Rout
You have a list of LSF options; each option begins with #BSUB.
You then have a bash command. In this case we are running R in batch mode via

/path/to/R CMD BATCH --no-save --no-restore '--args nc=2' minimal.R minimal.Rout

The "/path/to/R" can be found by typing the following in bash:
which R
This might not be the version of R that you want. Zorro has a few copies of R installed. You can see these versions by typing in bash:
ls -d /app/R-*
As of the writing of this document, they have versions 3.6.0, 3.6.1, 4.0.2, and 4.1.0
You can use a specific version of R (e.g. 4.0.2) by using
/app/R-4.0.2/bin/R CMD BATCH --no-save --no-restore '--args nc=2' minimal.R minimal.Rout
That '--args nc=2' trick is a way to pass arbitrary arguments to an R script; inside the script, they are retrieved with commandArgs().
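As a sketch of that mechanism (the file name show_args.R is made up): everything after --args appears in commandArgs(trailingOnly = TRUE) as character strings, which is why the template below runs eval(parse(text = ...)) on "nc=2" to create the nc variable.

```shell
# Toy R script (hypothetical name) that prints its trailing arguments.
cat > show_args.R <<'EOF'
args <- commandArgs(trailingOnly = TRUE)
print(args)
EOF
# Requires R; the .Rout file would then contain: [1] "nc=2"
# R CMD BATCH --no-save --no-restore '--args nc=2' show_args.R show_args.Rout
```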
Typical options are:

#BSUB -J job_name: Name of the job, so that you can see that name when you check the status of your job.
#BSUB -q normal: The queue to submit your job to. You shouldn't change this for Zorro.
#BSUB -o job_out.txt: Where to send the job's output.
#BSUB -e job_err.txt: Where to send the job's error messages.
#BSUB -u "youremail@american.edu": Send mail to the specified user.
#BSUB -B: Emails you when the job begins.
#BSUB -N: Emails you when the job ends.
#BSUB -n 2: Submits a parallel job and specifies the number of tasks in the job (in this case, 2).

A full list of options can be found in the LSF documentation.
You need to format your R script to be able to run in parallel.
Example R Script
## Set library for R packages ----
.libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))

## Attach packages for parallel computing ----
library(foreach)
library(doFuture)

## Determine number of cores ----
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
  nc <- 1
} else {
  eval(parse(text = args[[1]]))
}
cat(nc, "\n")

## Register workers ----
if (nc == 1) {
  registerDoSEQ()
  plan(sequential)
} else {
  registerDoFuture()
  plan(multisession, workers = nc)
  if (getDoParWorkers() == 1) {
    stop("nc > 1, but only one core registered")
  }
}

## Run R script ----
x <- foreach(i = 1:2, .combine = c) %dopar% {
  Sys.sleep(1)
  i
}
x

## Unregister workers ----
if (nc > 1) {
  plan(sequential)
}
The above is the template I use.
The above template only runs on multiple cores of a single node; it does not allow for multi-node processing.
You should only need to modify two things in the above code:

The ## Run R script ---- section, where you implement your computations.
The .libPaths() call. This is where your R packages are locally stored (see below).

bsub commands

You submit and control jobs with the bsub command in bash.
The following will submit the job in "minimal.lsf":

bsub < minimal.lsf

This redirects the "minimal.lsf" file to the bsub command. Alternatively, you can pass options directly to bsub, like bsub -q normal -n 2 ...
Use bjobs to display information on the jobs that you have submitted.
Display all of your jobs:
bjobs -a
Display all of the jobs of a user:
bjobs -u user_name
Display information about a particular job
bjobs job_id
Different job states that you will see:

PEND: Waiting in a queue.
RUN: Currently running.
DONE: Finished successfully with no errors.
EXIT: Errored; did not finish successfully.

Use bkill to kill a job.
bkill job_id
You can see what packages are already installed on Zorro by typing
ls /app/R-4.0.2/lib64/R/library
You need to install R packages in a local directory, since global install is not supported (because not everyone wants your R packages).
You should create an R directory where you put all things R:
mkdir R
Then inside this directory, create a directory where you can place your packages for a specific version of R
mkdir 4.0.2
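The two mkdir calls above assume you cd into "R" in between; equivalently, you can create both levels in one step:

```shell
# -p creates missing parent directories and is a no-op if they already exist.
mkdir -p "$HOME/R/4.0.2"
ls -d "$HOME/R/4.0.2"   # confirm the directory exists
```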
Then create the following R file, called “install.R”, which can be used to install R packages into that directory (but change my username to yours):
.libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
install.packages(c("tidyverse",
                   "future",
                   "doFuture",
                   "foreach"),
                 lib = "/home/dgerard/R/4.0.2/",
                 repos = "http://cran.us.r-project.org")
This installs the {tidyverse}, {future}, {doFuture}, and {foreach} packages. But you can add more.

If you want to use Bioconductor packages in R Version 4.0.2, run the following:
.libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
install.packages("BiocManager",
                 lib = "/home/dgerard/R/4.0.2/",
                 repos = "http://cran.us.r-project.org")
BiocManager::install(version = "3.12", lib = "/home/dgerard/R/4.0.2/", ask = FALSE)
BiocManager::install(c("tidyverse",
                       "future",
                       "doFuture",
                       "foreach"),
                     lib = "/home/dgerard/R/4.0.2/",
                     ask = FALSE)
Set up an LSF file to run this script, called “install.lsf”:
#BSUB -J install_r_pkgs
#BSUB -q normal
#BSUB -o install_out.txt
#BSUB -e install_err.txt
#BSUB -u "youremail@american.edu"
#BSUB -B
#BSUB -N
#BSUB -n 1
/app/R-4.0.2/bin/R CMD BATCH --no-save --no-restore install.R install.Rout
Then run this job
bsub < install.lsf
At the top of every R file from now on, put the following (but change my username to yours):
.libPaths(c("/home/dgerard/R/4.0.2/", .libPaths()))
Now you have access to those packages that you installed in the “R/4.0.2” directory.
For some R packages, you need to have additional software installed. But global installs are not allowed on the supercomputer.
Here are the general steps:
Download a tar file using wget <software_url>
. You should download this into a common directory for all of your local installs, like “apps”.
Decompress the tar file using tar -zxvf <tar_file>
Move into the newly decompressed directory using cd <new_directory>
Look at the README for further steps. Usually there is a makefile, and it's as easy as running make and/or make install. But read the README first.
Add the location to the PATH by using something like export PATH=$HOME/path/to/software:$PATH. You should put this in your ".bashrc".
Source your ".bashrc" file with . ~/.bashrc
Confirm that the software is installed with which <software>
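Here is a runnable simulation of the steps above, using a tiny fake tarball in place of the wget download (all names, such as mytool and apps, are made up; a real package would come with a README and makefile):

```shell
mkdir -p "$HOME/apps" && cd "$HOME/apps"

# Fake the downloaded release (stands in for: wget <software_url>).
mkdir -p mytool-1.0/bin
printf '#!/bin/sh\necho mytool ok\n' > mytool-1.0/bin/mytool
chmod +x mytool-1.0/bin/mytool
tar -zcf mytool-1.0.tar.gz mytool-1.0
rm -r mytool-1.0

tar -zxf mytool-1.0.tar.gz       # decompress the tar file
cd mytool-1.0                    # move into the new directory
# (here you would read the README and run make / make install)

export PATH="$PWD/bin:$PATH"     # put this line in your ~/.bashrc
which mytool                     # confirm: prints the path to mytool
```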
R CMD BATCH: Run R in batch mode (classic way).
Rscript: Run R in batch mode (newer way).
.libPaths(): Set and get the paths where R will search for installed packages.
bsub: Submit an LSF job.
bjobs: See the status of LSF jobs.
bkill: Kill an LSF job.