Computations on Palma
Compiling Software
The module concept
Environment variables (such as PATH and LD_LIBRARY_PATH) for compilers and libraries can be set by modules:
Command (short and long form) | Meaning |
module add module1 module2 ... | Adds the modules to the current environment |
module purge | Removes all modules from the current environment |
module rm module1 module2 ... | Removes the modules from the current environment |
module av[ailable] | Lists all available modules |
module show modulename | Lists all changes caused by a module |
module li[st] | Lists all modules in the current environment |
To get access to all modules, the following modules have to be added to the standard environment (this can be done by adding them to .bashrc):
module add shared
module add /Applic.PALMA/modules/path
Intel MPI is installed on Palma. To use it, load the appropriate module:
module add intel/mpi/3.2.2.006
The names of the MPI compiler executables are similar to those of their serial versions: mpiicc, mpiicpc, mpiifort etc.
Example: Compile a program that uses FFTW2:
module add fftw/2.1.5
module add intel/cc/11.1.059
module add intel/mpi/3.2.2.006
mpiicc -I ${FFTW2_INCLUDE_DIR} -o program mpifftw2d.c -g -O3 -L${FFTW2_LIB_DIR} -lsrfftw_mpi -lsfftw_mpi -lsrfftw -lsfftw -lm
Explanation: The module fftw/2.1.5 sets the environment variables FFTW2_INCLUDE_DIR and FFTW2_LIB_DIR. These can be used to shorten the compiler calls (also in makefiles).
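Because the compile line depends on FFTW2_INCLUDE_DIR and FFTW2_LIB_DIR, it can help to check that both are set before calling the compiler. The following is a minimal sketch; the helper name fftw2_env_ready is hypothetical and not part of the module system:

```shell
# Check that the variables set by the fftw/2.1.5 module are present.
# Returns 0 if both are set and non-empty, 1 otherwise.
fftw2_env_ready() {
    if [ -z "${FFTW2_INCLUDE_DIR:-}" ] || [ -z "${FFTW2_LIB_DIR:-}" ]; then
        echo "FFTW2 variables not set; run 'module add fftw/2.1.5' first" >&2
        return 1
    fi
    return 0
}

# Compile only if the environment is complete and the MPI compiler is available.
if fftw2_env_ready && command -v mpiicc >/dev/null 2>&1; then
    mpiicc -I "${FFTW2_INCLUDE_DIR}" -o program mpifftw2d.c -g -O3 \
        -L"${FFTW2_LIB_DIR}" -lsrfftw_mpi -lsfftw_mpi -lsrfftw -lsfftw -lm
fi
```

If the module has not been loaded, the sketch prints a hint and skips the compiler call instead of failing with missing include or library paths.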
Submitting jobs
The batch system Torque and the scheduler Maui are used to submit jobs. Starting jobs manually is not allowed. Batch jobs should only be submitted from the server zivcluster.
Creating submit-files
Example of a submit-file for an MPI job:
#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=4:westmere:ppn=12
#PBS -A project_name
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $PBS_O_WORKDIR
mpdboot --rsh=ssh -n 4 -f $PBS_NODEFILE -v
mpirun --rsh=ssh -machinefile $PBS_NODEFILE -np 32 ./executable
An MPI job with 32 processes is started. For this purpose, 4 Westmere nodes with 12 cores each (ppn=12) are requested, i.e. 48 cores in total.
Further Information:
- project_name: Has to be replaced by your own project, otherwise the job will not run
- username: Replace with your own username
- job_directory: Replace with the path where the executable is located
- executable: Enter the name of the executable
- walltime: The time needed for a whole run. At the moment, a maximum of 48 hours is possible
When no MPI is needed, the submit-file can be simpler.
Example for a job using OpenMP:
#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=1:westmere:ppn=12
#PBS -A project_name
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $HOME/job_directory
./executable
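The OpenMP example above does not set the number of threads. One possible way to derive it inside the submit-file is sketched below. It assumes that Torque writes one line per reserved core to $PBS_NODEFILE (its usual behaviour) and that the program honours the standard OpenMP variable OMP_NUM_THREADS; the function name omp_threads_from_nodefile is hypothetical.

```shell
# Print the number of cores reserved for the job, taken from the node file.
# Falls back to 1 when no readable node file is available (e.g. outside
# the batch system).
omp_threads_from_nodefile() {
    nodefile=${1:-${PBS_NODEFILE:-}}
    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        grep -c '' "$nodefile"   # counts every line, also an unterminated last one
    else
        echo 1
    fi
}

# Make the thread count visible to the OpenMP runtime of the executable.
OMP_NUM_THREADS=$(omp_threads_from_nodefile)
export OMP_NUM_THREADS
```

Placed before the ./executable line, this would give 12 threads for the request nodes=1:westmere:ppn=12 under the stated assumption.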
Submitting jobs / Managing the queue
A job is submitted by entering the command
qsub submit.cmd
where submit.cmd is the name of the submit-file.
Further commands:
- qstat: Shows the current queue
- qstat -a: As above, but with the number of requested cores
- qstat -n: Shows in detail which nodes are used
- qdel job_number: Deletes the job with the given number from the queue
- showbf: Shows the number of free cores
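The commands above can be combined in a small script. The sketch below assumes that qsub prints the id of the new job in the usual Torque form number.server; the helper job_number is hypothetical and only strips the server part.

```shell
# Torque job ids have the form <number>.<server>; keep only the number.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Only talk to the batch system where its commands exist (the submit host).
if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub submit.cmd)            # qsub prints the id of the new job
    qstat -n "$(job_number "$job_id")"   # show which nodes the job uses
fi
```

The number returned by job_number is also what qdel expects as job_number.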
Choosing the compute nodes
The option "#PBS -l" determines which resources are required for the batch job. Because there are two different kinds of nodes, they can be distinguished with the attributes "nehalem" and "westmere". The following table shows different ways to reserve nodes.
In the submit-file | Nodes that will be reserved |
-l nodes=10 | 10 arbitrary CPU cores. Not recommended |
-l nodes=2:ppn=12 | 2 Westmere nodes |
-l nodes=1:ppn=8 | 8 cores of a Westmere or Nehalem node |
-l nodes=2:nehalem:ppn=8+3:westmere:ppn=12 | 16 cores of 2 Nehalem nodes and 36 cores of 3 Westmere nodes |
-l nodes=8:ppn=1 | Does not work as intended. ppn should be larger than 1. Due to a bug in TORQUE, the user gets 8 cores on arbitrary nodes in this case. |
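The core counts in the table follow from multiplying the number of nodes by ppn for each part of the request and summing over the parts joined by "+". A small sketch of this arithmetic, with the hypothetical helper cores_requested (a part without ppn is counted as one core per node):

```shell
# Sum nodes*ppn over all '+'-separated parts of a "-l nodes=..." value.
cores_requested() {
    total=0
    old_ifs=$IFS
    IFS=+                                # split the request at '+'
    for part in $1; do
        count=${part%%:*}                # leading number of nodes
        case $part in
            *ppn=*) ppn=${part##*ppn=}; ppn=${ppn%%:*} ;;
            *)      ppn=1 ;;             # no ppn given: one core per node
        esac
        total=$((total + count * ppn))
    done
    IFS=$old_ifs
    echo "$total"
}

cores_requested '2:nehalem:ppn=8+3:westmere:ppn=12'   # prints 52 (16 + 36)
```

Comparing this number with the -np value passed to mpirun is a quick way to spot a request that reserves more or fewer cores than the job uses.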
At the moment, the SMP systems cannot be accessed via the batch system. To run jobs on them, it is necessary to log in to them directly. The names of the nodes are: palma060, palma061, palma062 and palma063.
The scratch partition
In /scratch, 180 TB of space are available for user data. For space and performance reasons, data generated by codes running on Palma should be stored here. The filesystem used is Lustre, a parallel filesystem.
There is no backup of the scratch partition!
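A job can create its own working directory on the scratch partition before it starts writing data. The sketch below is only an illustration: the layout root/user/job, the variable SCRATCH_ROOT and the function scratch_job_dir are assumptions, not conventions documented for Palma.

```shell
# Create (if needed) and print a job directory below the scratch root.
# SCRATCH_ROOT defaults to /scratch; the <root>/<user>/<job> layout is assumed.
scratch_job_dir() {
    root=${SCRATCH_ROOT:-/scratch}
    user=${USER:-$(id -un)}
    dir="$root/$user/$1"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}
```

In a submit-file this could be used as cd "$(scratch_job_dir my_run)" before the executable is started. Since the partition has no backup, results that need to be kept should be copied elsewhere after the run.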
--
HolgerAngenent - 2010-08-16