Computations on Palma

Compiling Software

The module concept

Environment variables (like PATH and LD_LIBRARY_PATH) for compilers and libraries are set via modules:

Command (short and long form)     Meaning
module av[ailable]                Lists all available modules
module li[st]                     Lists all modules in the current environment
module show modulename            Lists all changes caused by a module
module add module1 module2 ...    Adds the given modules to the current environment
module rm module1 module2 ...     Removes the given modules from the current environment
module purge                      Removes all modules from the current environment
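
A typical session might look like this (just a sketch; the module name is the one used in the examples below, check "module av" for what is actually installed):

module av                        # list all available modules
module add intel/cc/11.1.059     # load the Intel C compiler
module li                        # verify that the module is now loaded
module show intel/cc/11.1.059    # see which environment variables it sets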

To get access to all modules, the following modules have to be added to the standard environment (this can be done by adding the commands to .bashrc):

module add shared
module add /Applic.PALMA/modules/path
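
To make this setting permanent, the two commands can be appended to ~/.bashrc, for example like this (a sketch; adjust it if your .bashrc is organised differently):

cat >> ~/.bashrc <<'EOF'
module add shared
module add /Applic.PALMA/modules/path
EOF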

Intel MPI is installed on Palma. To use it, load the appropriate module:

module add intel/mpi/3.2.2.006

The MPI compiler wrappers are named after their serial counterparts: mpiicc, mpiicpc, mpiifort, etc.

Example: Compile a program that uses FFTW2:

module add fftw/2.1.5
module add intel/cc/11.1.059
module add intel/mpi/3.2.2.006
mpiicc -I ${FFTW2_INCLUDE_DIR} -o program mpifftw2d.c -g -O3 -L${FFTW2_LIB_DIR} -lsrfftw_mpi -lsfftw_mpi -lsrfftw -lsfftw -lm

Explanation: The module fftw/2.1.5 sets the environment variables FFTW2_INCLUDE_DIR and FFTW2_LIB_DIR. These can be used to shorten the compiler calls (also in makefiles).
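
The same variables can, for example, be used in a small build script (a sketch of the call above, assuming the three modules are already loaded; the same idea works inside a makefile):

CFLAGS="-g -O3 -I${FFTW2_INCLUDE_DIR}"
LIBS="-L${FFTW2_LIB_DIR} -lsrfftw_mpi -lsfftw_mpi -lsrfftw -lsfftw -lm"
mpiicc ${CFLAGS} -o program mpifftw2d.c ${LIBS}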

Submitting jobs

Jobs are submitted via the batch system TORQUE and the scheduler Maui. It is not allowed to start jobs manually outside the batch system. Batch jobs should only be submitted from the server zivcluster.

Creating submit files

Example of a submit file for an MPI job:

#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=4:westmere:ppn=12
#PBS -A project_name
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $HOME/job_directory                                             # directory containing the executable
cp $PBS_NODEFILE $HOME/$PBS_JOBID.nodefile                         # save the list of reserved nodes
mpdboot --rsh=ssh -n 4 -f ~/$PBS_JOBID.nodefile -v                 # start the MPD ring on the 4 reserved nodes
mpirun --rsh=ssh -machinefile $PBS_NODEFILE -np 32 ./executable    # start 32 MPI processes

An MPI job with 32 processes is started. For this purpose, 4 Westmere nodes with 12 cores each (48 cores in total) are requested; mpirun then starts the 32 processes on these nodes.
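
Instead of hard-coding the number of processes, it can also be derived from the node file that TORQUE provides (a sketch; $PBS_NODEFILE contains one line per reserved core):

NP=$(wc -l < $PBS_NODEFILE)                                        # number of reserved cores
mpirun --rsh=ssh -machinefile $PBS_NODEFILE -np $NP ./executable   # use all reserved cores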

Further Information:

  • project_name: Replace with the name of your own project, otherwise the job will not run
  • username: Replace with your own username
  • job_directory: Replace with the path where the executable is located
  • executable: Enter the name of the executable
  • walltime: The time needed for a complete run. At the moment, a maximum of 48 hours is possible

If no MPI is needed, the submit file can be simpler.

Example of a job using OpenMP:

#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=1:westmere:ppn=12
#PBS -A project_name
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $HOME/job_directory
./executable
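
Depending on the application, it can be useful to set the number of OpenMP threads explicitly to the number of reserved cores before starting the program (a sketch; whether this is necessary depends on the program):

export OMP_NUM_THREADS=12    # match the 12 cores reserved with ppn=12
./executable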

Submitting jobs / Managing the queue

A job is submitted by entering the command

qsub submit.cmd

where submit.cmd is the name of the submit file.

Further commands (a short example session follows the list):

  • qstat: Shows the current queue
  • qstat -a: As above, but additionally shows the number of requested cores
  • qstat -n: Shows in detail which nodes are used
  • qdel job_number: Deletes a job from the queue
  • showbf: Shows the number of free cores
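
A typical sequence might look like this (the job ID is purely illustrative):

qsub submit.cmd      # submit the job; qsub prints the job ID, e.g. 12345.palma
qstat -a             # check the state of the job and the number of requested cores
qdel 12345           # remove the job from the queue again, if necessary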

Choosing the compute nodes

The option "#PBS -l" determines which resources are requested for the batch job. Since there are two different kinds of nodes, they can be distinguished with the attributes "nehalem" and "westmere". The following table shows different ways to reserve nodes.

Option in the submit file                      Nodes that will be reserved
-l nodes=10                                    10 arbitrary CPU cores. Not recommended.
-l nodes=2:ppn=12                              2 Westmere nodes
-l nodes=1:ppn=8                               8 cores of a Westmere or Nehalem node
-l nodes=2:nehalem:ppn=8+3:westmere:ppn=12     16 cores on 2 Nehalem nodes and 36 cores on 3 Westmere nodes
-l nodes=8:ppn=1                               Produces an error; ppn must be greater than 1. Due to a bug in TORQUE, the job gets 8 cores on arbitrary nodes in this case.

To submit jobs to the SMP system, it is currently necessary to log in to its nodes directly; they cannot be accessed via the batch system yet. The nodes are named palma060, palma061, palma062 and palma063.
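
The login is done via SSH, for example (assuming SSH access to these nodes is enabled for your account):

ssh palma060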

The scratch partition

In /scratch there are 180 TB of space for user data. For space and performance reasons, data generated by codes running on Palma should be stored there. The file system used is Lustre, a parallel file system.

There is no backup of the scratch partition!
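
Because of the missing backup, a job should work in /scratch and copy important results to a safer location (e.g. the home directory) afterwards. A sketch of the corresponding part of a submit file, with purely hypothetical paths and file names:

mkdir -p /scratch/username/job_directory      # hypothetical scratch directory
cd /scratch/username/job_directory
$HOME/job_directory/executable                # program writes its data to /scratch
cp results.dat $HOME/job_directory/           # copy important results back afterwards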

-- HolgerAngenent - 2010-08-16
