PALMA II

!!! Attention !!! A new wiki with information about PALMA II and HPC in general can be found in the WWU Confluence!


Overview

Palma II is the HPC system of the Zentrum für Informationsverarbeitung. To be able to log in, you have to register for the group u0clstr in MeinZIV. The login node is currently palma2c.uni-muenster.de. You can reach it via ssh (from Windows, for example, with PuTTY).
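
For example, from a terminal (the username placeholder is illustrative; use your university account):

ssh <username>@palma2c.uni-muenster.de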

Filesystems

When you log in to the cluster for the first time, a directory in /home is created for you. Please use it only to store your programs and do not store your numerical results there; your storage in home is limited to 400 GB. Create a directory in /scratch/tmp and store the data you generate on the compute nodes there. To enforce this, home will be mounted read-only on the compute nodes in the future. Since /scratch is not intended as an archive, please remove your data there as soon as you no longer need it.
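
For example, a personal scratch directory can be created like this (the directory name below is only a suggestion, typically your username):

mkdir /scratch/tmp/<username>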

Software/The module concept

The software on palma-ng can be accessed via modules. These are small scripts that set environment variables (like PATH and LD_LIBRARY_PATH) pointing to the locations where the software is installed (mostly on network drives, so that the software is available on every node in the cluster). The module system we use here is LMOD (1). In contrast to the older environment modules we used on PALMA I and NWZPHI, there is a new command, "module spider". Please find more information on this below.

The most important difference between PALMA I and PALMA II is the hierarchical module naming scheme (2).

(1) https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

(2) https://hpcugent.github.io/easybuild/files/hust14_paper.pdf

Command (short and long form) - Meaning
module av[ailable] - Lists all currently available modules
module spider - Lists all available modules with their descriptions
module spider modulename - Shows the description of a module and hints which modules have to be loaded to make it available
module li[st] - Lists all modules in the current environment
module show modulename - Lists all changes caused by a module
module add module1 module2 ... - Adds the given modules to the current environment
module rm module1 module2 ... - Removes the given modules from the current environment
module purge - Removes all modules from the current environment
The hierarchical module naming scheme means that you do not see all modules at the same time. You have to load a toolchain or compiler first to see the software that has been compiled with it. At the moment there are the following toolchains:

  • foss/2018a: GCC with OpenMPI
  • intel/2018a: Intel Compiler with Intel MPI

If you want to use the Intel compiler, you can type for example the following:


module add intel/2018a
module av

and you will see the software that has been compiled with this version. Alternatively you can use the "module spider" command.
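
A sketch of the "module spider" workflow (Python is only used here as an illustration; replace it with the software you need):

module spider Python       # shows the available versions and which toolchain has to be loaded first
module add intel/2018a     # load the toolchain reported by module spider
module add Python          # the module is now visible and can be loaded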

Monitoring

  • Ganglia
  • If you have X forwarding enabled, you can use sview (Just type "sview" at the command line).
  • pestat (A command line tool for monitoring the batch system)

The batch system

The batch system on PALMA II is SLURM. If you are used to PBS/Maui and want to switch to SLURM, this document might help you: https://slurm.schedmd.com/rosetta.pdf
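
The most frequently needed command equivalents from that translation table are:

qsub script.cmd  ->  sbatch script.cmd
qstat            ->  squeue
qdel <jobid>     ->  scancel <jobid>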

The partitions

  • normal: 434 nodes with 72 CPU threads and 92 or 192 GB RAM. The maximum run time is 7 days. To be able to use the himem nodes (with 192 GB), you have to set the #SBATCH --mem parameter to a value higher than 92 GB (see the sketch after this list).
  • express: 5 nodes with 72 threads and 92 GB RAM (one of them with 192 GB). A partition for short-running (test) jobs with a maximum walltime of 2 hours.
  • bigsmp: 3 nodes with 144 threads and 1.5 TB RAM
  • largesmp: 2 nodes with 144 threads and 3 TB RAM
  • requeue: Jobs in this partition run on the exclusive nodes listed below. If your job is running on one of these nodes when the owning group submits jobs there, your job will be terminated and requeued, so use with care. The maximum walltime is 24 hours. There are also two 1.5 TB machines available in the requeue partition.
  • gpuk20: Four nodes with 3 NVIDIA K20 GPUs each
  • gpuv100: One node with 4 NVIDIA V100 GPUs
  • gputitanxp: One node with 8 NVIDIA TitanXP GPUs
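
A sketch of how to request a himem node in the normal partition via the memory limit (120G is only an example value above the 92 GB threshold):

#SBATCH --partition normal
#SBATCH --mem=120G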

There are some special partitions that can only be used by certain groups (these are also Skylake nodes like those in the normal partition):

  • p0fuchs: 9 lowmem (96 GB) nodes
  • p0kulesz: 6 lowmem and 3 himem (192 GB) nodes
  • p0klasen: 1 lowmem and 1 himem node
  • p0kapp: 1 lowmem node
  • hims: 25 lowmem and 38 himem nodes
  • d0ow: 1 lowmem node
  • q0heuer: 15 lowmem nodes
  • e0mi: 2 himem nodes
  • p0rohlfi: 7 lowmem and 8 himem nodes

When using PBS scripts, there are some differences to the old PALMA:

  • The first line of the submit script has to be #!/bin/bash
  • A queue is called a partition in SLURM; the terms are used synonymously here.
  • The variable $PBS_O_WORKDIR will not be set. Instead, your job will start in the directory from which it was submitted (see the sketch after this list).
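
If your old PBS scripts used $PBS_O_WORKDIR explicitly, the closest SLURM equivalent is $SLURM_SUBMIT_DIR; a minimal sketch (usually not needed, since the job already starts there):

cd "$SLURM_SUBMIT_DIR"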

Submit a job

Create a file, for example called submit.cmd:

#!/bin/bash

# set the number of nodes
#SBATCH --nodes=1

# set the number of tasks per node (here: one task per CPU thread)
#SBATCH --ntasks-per-node 72

# How much memory is needed (per node). Possible units: K, G, M, T
#SBATCH --mem=64G

# set a partition
#SBATCH --partition normal

# set max wallclock time
#SBATCH --time=24:00:00

# set name of job
#SBATCH --job-name=test123

# mail alert at start, end and abort of execution
#SBATCH --mail-type=ALL

# set an output file
#SBATCH --output output.dat

# send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de

# run the application
./program

You can send your submission to the batch system with the command "sbatch submit.cmd"

It is recommended to reserve complete nodes if you can use all 72 threads.

A detailed description can be found here: http://slurm.schedmd.com/sbatch.html

Starting jobs with MPI-parallel codes

mpirun will get all necessary information from SLURM if the job is submitted appropriately. If you want to start, for example, 144 MPI ranks distributed over two nodes, you could do it the following way:

#!/bin/bash

# set the number of nodes
#SBATCH --nodes=2

# reserve the nodes exclusively
#SBATCH --exclusive

# How much memory is needed (per node). Possible units: K, G, M, T.
#SBATCH --mem=64G

# set a partition
#SBATCH --partition normal

# set max wallclock time
#SBATCH --time=2-00:00:00

# set name of job
#SBATCH --job-name=test123

# mail alert at start, end and abort of execution
#SBATCH --mail-type=ALL

# set an output file
#SBATCH --output output.dat

# send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de

# run the application
mpirun program

Some codes do not profit from hyperthreading, so it is better to start only 36 processes per node:

#!/bin/bash

# set the number of nodes
#SBATCH --nodes=2

# reserve the nodes exclusively
#SBATCH --exclusive

# set the number of MPI tasks per node
#SBATCH --ntasks-per-node=36

# How much memory is needed (per node). Possible units: K, G, M, T.
#SBATCH --mem=64G

# set a partition
#SBATCH --partition normal

# set max wallclock time
#SBATCH --time=2-00:00:00

# set name of job
#SBATCH --job-name=test123

# mail alert at start, end and abort of execution
#SBATCH --mail-type=ALL

# set an output file
#SBATCH --output output.dat

# send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de

# run the application
mpirun program

For starting hybrid jobs (jobs that use MPI and OpenMP parallelization at the same time), you can use the --cpus-per-task switch, for example:

srun -p normal --nodes=2 --ntasks=72 --ntasks-per-node=36 --cpus-per-task=2 --pty bash
OMP_NUM_THREADS=2 mpirun ./program
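
The same hybrid setup as a batch script might look like the following sketch (adjust the task and thread counts to your code; the partition and time are only examples):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --cpus-per-task=2
#SBATCH --partition normal
#SBATCH --time=24:00:00

# one OpenMP thread per reserved CPU
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./program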

Using the GPU nodes

If you want to use a GPU for your computations:

  • Use one of the gpu... partitions (see above)
  • Start your jobs with #SBATCH --export=none. This is necessary because the GPU nodes provide a different set of modules.
  • You can use the batch system to reserve only some of the GPUs; use Slurm's generic resources for this: https://slurm.schedmd.com/gres.html. For example, write #SBATCH --gres=gpu:1 to request a single GPU. Reserve CPUs accordingly (see the sketch below).
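
Putting these options together, a minimal GPU job script could look like the following sketch (partition, GPU count, toolchain and program name are placeholders):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --partition gpuv100
#SBATCH --gres=gpu:1
#SBATCH --export=none
#SBATCH --time=2:00:00

# load the GPU toolchain inside the job, since the environment is not exported
module add fosscuda/2018b
./gpu_program
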
Using Caffe

Caffe 1.0 is available for Python 3 on the GPU partitions in the fosscuda/2018b toolchain. To use it, load fosscuda/2018b and Caffe (ml fosscuda/2018b Caffe) and export the Caffe PYTHONPATH as shown below.

On Skylake nodes (gputitanxp and gpuv100 partitions)

export PYTHONPATH=/Applic.HPC/skylakegpu/software/MPI/GCC-CUDA/7.3.0-2.30-9.2.88/OpenMPI/3.1.1/Caffe/1.0-Python-3.6.6/python:$PYTHONPATH

On Broadwell nodes (gpuk20 partition)

export PYTHONPATH=/Applic.HPC/k20gpu/software/MPI/GCC-CUDA/7.3.0-2.30-9.2.88/OpenMPI/3.1.1/Caffe/1.0-Python-3.6.6/python:$PYTHONPATH
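
A sketch of the complete sequence on a Skylake GPU node (the final import is only a quick check and not required):

ml fosscuda/2018b Caffe
export PYTHONPATH=/Applic.HPC/skylakegpu/software/MPI/GCC-CUDA/7.3.0-2.30-9.2.88/OpenMPI/3.1.1/Caffe/1.0-Python-3.6.6/python:$PYTHONPATH
python3 -c "import caffe"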

Show information about the partitions

scontrol show partition

Show information about the nodes

sinfo

Running interactive jobs with SLURM

Use for example the following command:

srun --partition express --nodes 1 --ntasks-per-node=8 --pty bash

This starts a job in the express partition on one node with eight cores.

Information on jobs

List all current jobs for a user:

squeue -u <username>

List all running jobs for a user:

squeue -u <username> -t RUNNING

List all pending jobs for a user:

squeue -u <username> -t PENDING

List all current jobs in the normal partition for a user:

squeue -u <username> -p normal

List detailed information for a job (useful for troubleshooting):

scontrol show job -dd <jobid>

Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.

To get statistics on completed jobs by jobID:

sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

To view the same information for all jobs of a user:

sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed

Show priorities for waiting jobs:


sprio -l

Controlling jobs

To cancel one job:

scancel <jobid>

To cancel all the jobs for a user:

scancel -u <username>

To cancel all the pending jobs for a user:

scancel -t PENDING -u <username>

To cancel one or more jobs by name:

scancel --name myJobName

To pause a particular job:

scontrol hold <jobid>

To resume a particular job:

scontrol resume <jobid>

To requeue (cancel and rerun) a particular job:

scontrol requeue <jobid>

Visualization

For the visualization of larger data sets, it is impractical to copy them to your local machine. We therefore offer a solution for doing the postprocessing on Palma II. Since the CPUs are quite fast, the rendering is done in software.

  • Prerequisites: You need a local installation of TurboVNC
  • Log in to palma and call
    ml vis/vnc
    vnc.sh
  • Wait until the session has started and follow the instructions of the script (ssh to the compute node and start your local TurboVNC)
  • Open a terminal in the VNC window and enter "module add intel Mesa" or "module add foss Mesa"
  • Start an application with a GUI

-- Holger Angenent - 2018-07-11
