High Performance Computing at Lehigh -- Using PBS/TORQUE on Corona

Introduction

Goal: After reading this page you will have a basic understanding of compiling and running MPI jobs on the Corona cluster.

Corona is a 1040-core cluster running CentOS Linux 5.5. It uses the TORQUE Resource Manager (version 2.3.6) to control batch jobs and distributed computing resources.

This guide assumes you have already obtained a Service Level Enhanced-II account for Corona, and that you can successfully log in via SSH.

Set up the environment for PBS

Before running any PBS jobs we need to set up the environment correctly. First, edit the .bash_profile file.

[ovd209@corona1 ~]$ nano ~/.bash_profile

Make sure the .bash_profile file has the following content:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin
BASH_ENV=$HOME/.bashrc

export PATH
unset USERNAME

Now edit the .bashrc file.

[ovd209@corona1 ~]$ nano ~/.bashrc

Make sure the .bashrc file has the following content:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

export MPD_USE_ROOT_MPD=1

# User specific aliases and functions

After you log out and log in again, the environment variable MPD_USE_ROOT_MPD should be set:

[ovd209@corona1 ~]$ set | grep -i MPD
MPD_USE_ROOT_MPD=1
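
The same check can be scripted, for example at the top of a job script, so that a missing setting is caught early. The sketch below is only an illustration: the function name check_mpd_env is hypothetical and is not part of TORQUE or MPD.

```shell
# Hypothetical helper: succeeds only if MPD_USE_ROOT_MPD is set to 1.
check_mpd_env() {
    if [ "${MPD_USE_ROOT_MPD:-}" = "1" ]; then
        echo "MPD_USE_ROOT_MPD is set"
    else
        echo "MPD_USE_ROOT_MPD is not set; check ~/.bashrc" >&2
        return 1
    fi
}
```

Calling check_mpd_env after logging in again should print the confirmation message; a non-zero return status means the export line in ~/.bashrc was not picked up.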

Also, mpdtrace -l should list the 25 InfiniBand-connected nodes. The remaining nodes are connected with Gigabit Ethernet.

[ovd209@corona1 ~]$ mpdtrace -l
corona1_4268 (192.168.4.1)
corona8_60686 (192.168.4.8)
corona4_59024 (192.168.4.4)
corona9_52398 (192.168.4.9)
corona2_54172 (192.168.4.2)
corona11_35544 (192.168.4.11)
corona10_41963 (192.168.4.10)
corona14_54305 (192.168.4.14)
corona15_54688 (192.168.4.15)
corona3_43573 (192.168.4.3)
corona24_47260 (192.168.4.24)
corona23_56952 (192.168.4.23)
corona25_48798 (192.168.4.25)
corona12_55849 (192.168.4.12)
corona6_58086 (192.168.4.6)
corona7_39543 (192.168.4.7)
corona16_42129 (192.168.4.16)
corona18_50362 (192.168.4.18)
corona17_41618 (192.168.4.17)
corona20_59760 (192.168.4.20)
corona22_59195 (192.168.4.22)
corona13_46245 (192.168.4.13)
corona5_46765 (192.168.4.5)
corona21_47756 (192.168.4.21)
corona19_53752 (192.168.4.19)
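
Counting 25 entries by eye is error-prone, so a small filter can do it for you. This is a minimal sketch that assumes every entry has the two-field form shown above (name_port followed by an address in parentheses); count_mpd_nodes is a hypothetical name.

```shell
# Hypothetical helper: count the hosts listed by "mpdtrace -l".
# Each entry is assumed to look like "corona1_4268 (192.168.4.1)".
count_mpd_nodes() {
    awk 'NF == 2 && $2 ~ /^\(.*\)$/ { n++ } END { print n + 0 }'
}

# Usage on the cluster (not run here):
#   mpdtrace -l | count_mpd_nodes
```

On Corona the result should be 25 if all InfiniBand nodes have joined the MPD ring.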

Select the MPI version

Corona has several versions of MPI installed. To list them, use mpi-selector, as shown below.

[ovd209@corona1 mpi2_hello_world]$ mpi-selector --list
mpich2_gcc-1.2
mvapich2_gcc-1.5.1
mvapich_gcc-1.2.0

mvapich_gcc is MPI-1 with InfiniBand support, mvapich2_gcc is MPI-2 with InfiniBand support, and mpich2_gcc is standard MPI-2 (without InfiniBand support).

To check which MPI version is currently active for your account, use the --query switch. On new accounts no MPI version is actually selected, regardless of what mpi-selector reports, so be sure to select one with the --set argument, as described below.

[ovd209@corona1 mpi2_hello_world]$ mpi-selector --query
default:mvapich2_gcc-1.5.1
level:user
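
If you want to use the reported default in a script, you can extract it from the query output. The sketch below assumes the "default:NAME" line format shown above; current_mpi_default is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of the "default:" line
# from "mpi-selector --query" output read on standard input.
current_mpi_default() {
    sed -n 's/^default://p'
}

# Usage on the cluster (not run here):
#   mpi-selector --query | current_mpi_default
```

An empty result would suggest that no default has been set yet.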

To change the version, you can use the --set argument.

[ovd209@corona1 mpi2_hello_world]$ mpi-selector --set mvapich2_gcc-1.5.1
Defaults already exist; overwrite them? (y/N) y

For this guide we will use mvapich2_gcc-1.5.1. After you have selected the version, log out and log in again for the change to take effect.
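
Once you are logged in again, it is worth confirming that the compiler wrapper and launcher are found on your PATH. This is a minimal sketch, assuming the selected MPI places mpicc and mpirun on the PATH at login; which_tool is a hypothetical helper.

```shell
# Hypothetical helper: print where a command is found, or fail with a message.
which_tool() {
    command -v "$1" || {
        echo "$1 not found on PATH" >&2
        return 1
    }
}

# Usage on the cluster (not run here):
#   which_tool mpicc
#   which_tool mpirun
```

The printed paths should point into the directory of the MPI version you selected.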

MPI 2 Hello World

We will compile and run a simple MPI 2 program written in C. First, make a folder and create a file with the source code below, using your favorite editor.

[ovd209@corona1 ~]$ cd ~
[ovd209@corona1 ~]$ mkdir mpi2_hello_world
[ovd209@corona1 ~]$ cd mpi2_hello_world/
[ovd209@corona1 mpi2_hello_world]$ nano hellow2.c

Put the code below in hellow2.c, then save the file.

#include <stdio.h>  /* printf */
#include <mpi.h>    /* MPI functions and MPI_MAX_PROCESSOR_NAME */

int main(int argc, char *argv[])
{
   int rank, size, length;
   char name[MPI_MAX_PROCESSOR_NAME];      /* buffer for the node name */

   MPI_Init(&argc, &argv);                 /* start the MPI runtime */
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
   MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
   MPI_Get_processor_name(name, &length);  /* name of the node we run on */

   printf("%s: hello world from process %d of %d\n", name, rank, size);

   MPI_Finalize();                         /* shut down the MPI runtime */

   return 0;
}

Next, compile the code with mpicc:

[ovd209@corona1 mpi2_hello_world]$ mpicc -o hellow2 hellow2.c

Before submitting a job, we can list the PBS queues, along with their status and limits, by running qstat -q.

[ovd209@corona1 ~]$ qstat -q

server: corona1

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
low-ib             --      --    672:00:0   --    0   0 40   E R
short              --      --    00:05:00   --    0   0 10   E R
medium             --      --    336:00:0   --    0   0 10   E R
medium-ib          --      --    336:00:0   --    0   0 40   E R
high-ib            --      --    24:00:00   --    1   0 40   E R
short-ib           --      --    00:05:00   --    0   0 40   E R
high               --      --    24:00:00   --    0   0 10   E R
low                --      --    672:00:0   --  640   0 10   E R
                                               ----- -----
                                                 641     0
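
When choosing a queue, the walltime limit is usually the column that matters most. The sketch below pulls out just the queue name and walltime, assuming the ten-column layout shown above and taking the "E R" state to mean that the queue is enabled and running; queue_walltimes is a hypothetical helper.

```shell
# Hypothetical helper: print "queue walltime" for every queue whose
# state columns read "E R" in "qstat -q" output on standard input.
queue_walltimes() {
    awk 'NF == 10 && $9 == "E" && $10 == "R" { print $1, $4 }'
}

# Usage on the cluster (not run here):
#   qstat -q | queue_walltimes
```

The header and separator lines are skipped because they do not end in "E R".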

Now, let's create a PBS job script.

[ovd209@corona1 mpi2_hello_world]$ nano run.pbs

Below you can see a simple submit file.

# High priority queue, 3 nodes, 16 cores / node.
# Not re-runnable, stdout and stderr files specified.

# Specifies the name PBS gives to this job. This name appears when you run qstat.
#PBS -N mvapich2_hello_world

# Use the high queue
#PBS -q high

# Ask for 3 nodes. On each node ask for 16 cores.
#PBS -l nodes=3:ppn=16

# Do not rerun this job if it fails
#PBS -r n

# Send e-mail notifications to this address
#PBS -M your_email@lehigh.edu

# Send e-mails when the job [b]egins, [e]nds, or [a]borts
#PBS -m bea

# The name of the error output file
#PBS -e mpi2_hello_world.err

# The name of the output file
#PBS -o mpi2_hello_world.out

executable=hellow2

# Determines how many cores to run on
NPROCS=`wc -l < "$PBS_NODEFILE"`

mpirun -machinefile "$PBS_NODEFILE" -np "$NPROCS" "$PBS_O_WORKDIR/$executable"
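
The NPROCS line works because TORQUE writes one line to $PBS_NODEFILE for every core it allocates, so nodes=3:ppn=16 yields 48 lines. The sketch below imitates that with a temporary file for a smaller request (nodes=2:ppn=3); the host names nodeA and nodeB and the helper count_slots are made up for illustration.

```shell
# Hypothetical helper: count the lines (one per allocated core) in a node file.
count_slots() {
    awk 'END { print NR }' "$1"
}

# Simulate the node file of a nodes=2:ppn=3 request.
demo_nodefile=$(mktemp)
printf '%s\n' nodeA nodeA nodeA nodeB nodeB nodeB > "$demo_nodefile"

count_slots "$demo_nodefile"       # total number of cores: prints 6
sort "$demo_nodefile" | uniq -c    # number of cores per node
rm -f "$demo_nodefile"
```

Inside a real job you would pass "$PBS_NODEFILE" instead of the temporary file.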

Next, you can run the job.

[ovd209@corona1 mpi2_hello_world]$ qsub run.pbs 
394846.corona1.cc.lehigh.edu
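
qsub prints the job ID on standard output, so a script can capture it for later use with qstat. A minimal sketch follows; job_number is a hypothetical helper that keeps only the numeric part before the first dot.

```shell
# Hypothetical helper: strip the server suffix from a job ID.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# On the cluster (not run here):
#   JOBID=$(qsub run.pbs)
#   qstat "$JOBID"

job_number 394846.corona1.cc.lehigh.edu    # prints 394846
```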

You can check the progress of the job by running qstat.

[ovd209@corona1 mpi2_hello_world]$ qstat -a

corona1.cc.lehigh.edu: 
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
394847.corona1.c     ovd209   high     mvapich2_hello_w    --      3   1    --  24:00 R   -- 
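
The S column holds the job state; in TORQUE, Q means queued, R running, and C completed. If you want to read that state from a script, the sketch below may help. It assumes the column layout shown above, where the state is the tenth whitespace-separated field; job_state is a hypothetical helper.

```shell
# Hypothetical helper: print the state letter of the job whose ID starts
# with the given number, reading "qstat -a" output on standard input.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 && NF >= 11 { print $10 }'
}

# Usage on the cluster (not run here):
#   qstat -a | job_state 394847
```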

After the job is done (it usually takes just a few seconds), you can check the output.

[ovd209@corona1 mpi2_hello_world]$ cat mpi2_hello_world.out 
corona25: hello world from process 1 of 48
corona25: hello world from process 4 of 48
corona25: hello world from process 8 of 48
corona25: hello world from process 16 of 48
corona24: hello world from process 17 of 48
corona25: hello world from process 6 of 48
corona25: hello world from process 9 of 48
corona24: hello world from process 18 of 48
corona24: hello world from process 32 of 48
...

The mpi2_hello_world.err file should be empty.

[ovd209@corona1 mpi2_hello_world]$ cat mpi2_hello_world.err
[ovd209@corona1 mpi2_hello_world]$
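
Because the ranks print in no particular order, a quick tally per node is an easy way to confirm that every node ran the expected number of processes. This sketch assumes the "host: hello world from process N of M" line format produced by hellow2; ranks_per_host is a hypothetical helper.

```shell
# Hypothetical helper: count hello-world lines per host name.
ranks_per_host() {
    awk '/hello world from process/ {
             host = $1
             sub(/:$/, "", host)      # drop the trailing colon
             count[host]++
         }
         END { for (h in count) print h, count[h] }' | sort
}

# Usage on the cluster (not run here):
#   ranks_per_host < mpi2_hello_world.out
```

With nodes=3:ppn=16, each of the three nodes should be listed with a count of 16.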

That's it, congratulations!

Finally, below you can see a longer submit file, which outputs extra diagnostic information to the output file.

# High priority queue, 3 nodes, 16 cores / node.
# Not re-runnable, stdout and stderr files specified.

# Specifies the name PBS gives to this job. This name appears when you run qstat.
#PBS -N mvapich2_hello_world

# Use the high queue
#PBS -q high

# Ask for 3 nodes. On each node ask for 16 cores.
#PBS -l nodes=3:ppn=16

# Do not rerun this job if it fails
#PBS -r n

# Send e-mail notifications to this address
#PBS -M your_email@lehigh.edu

# Send e-mails when the job [b]egins, [e]nds, or [a]borts
#PBS -m bea 

# The name of the error output file
#PBS -e mpi2_hello_world.err

# The name of the output file
#PBS -o mpi2_hello_world.out

executable=hellow2

echo "Start MPICH PBS job       : master node `hostname`, date `date`"
echo "PBS job id                : $PBS_JOBID"
echo "PBS_O_WORKDIR             : $PBS_O_WORKDIR"

# Determines how many cores to run on
NPROCS=`wc -l < "$PBS_NODEFILE"`

echo "Number of requested cores : $NPROCS"
echo "Assigned node names       : "`cat "$PBS_NODEFILE"`
echo "executable name           : $PBS_O_WORKDIR/$executable"
echo

mpirun -machinefile "$PBS_NODEFILE" -np "$NPROCS" "$PBS_O_WORKDIR/$executable"

echo
echo "End MPICH PBS job         : master node `hostname`, date `date`"
echo

The output file should now look like this:

Start MPICH PBS job       : master node corona25, date Thu May 26 13:38:20 EDT 2011
PBS job id                : 396570.corona1.cc.lehigh.edu
PBS_O_WORKDIR             : /home/ovd209/mpi2_hello_world
Number of requested cores : 48
Assigned node names       : corona25 corona25 corona25 corona25 corona25 corona25 corona25 corona25 corona25 corona25 corona25 corona25 corona25
corona25 co$
executable name           : /home/ovd209/mpi2_hello_world/hellow2

corona25: hello world from process 1 of 48
corona25: hello world from process 2 of 48
corona25: hello world from process 6 of 48
corona25: hello world from process 3 of 48
corona25: hello world from process 4 of 48
corona25: hello world from process 8 of 48
corona25: hello world from process 16 of 48
corona23: hello world from process 33 of 48
corona25: hello world from process 13 of 48
corona24: hello world from process 27 of 48
corona24: hello world from process 24 of 48
corona24: hello world from process 26 of 48
...

End MPICH PBS job         : master node corona25, date Thu May 26 13:38:24 EDT 2011