Difference: PALMA3 (1 vs. 28)

Revision 28 (2018-04-17) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 6 to 6
 

Overview

palma3 is the login node to a newer part of the PALMA system. It has various queues/partitions for different purposes:

Deleted:
<
<
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
 
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 29 nodes with 32 Broadwell CPU cores (64 threads) each and 128 GB RAM.
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator

Revision 27 (2018-03-26) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 93 to 93
 #SBATCH --ntasks-per-node 8

# set a partition

Changed:
<
<
#SBATCH --partition u0dawin
>
>
#SBATCH --partition normal
  # set max wallclock time
  #SBATCH --time=24:00:00
Line: 159 to 159
  Use for example the following command:
Changed:
<
<
srun --partition u0dawin --nodes 1 --ntasks-per-node=8 --pty bash
>
>
srun --partition express --nodes 1 --ntasks-per-node=8 --pty bash
  This starts a job in the express queue/partition on one node with eight cores.

Revision 26 (2017-12-22) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 9 to 9
 
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 29 nodes with 32 Broadwell CPU cores (64 threads) each and 128 GB RAM.
Deleted:
<
<
  • smp: An AMD Opteron machine with 64 cores and 512 GB RAM (The former ZIVSMP)
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
 
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator
  • requeue: Jobs in this queue will run on the above-mentioned 18 nodes. If your job is running on one of the exclusive nodes while jobs are submitted there, it will be terminated and requeued, so use with care.
  • express: A partition for short-running (test) jobs with a maximum walltime of 2 hours.

Revision 25 (2017-12-13) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 13 to 13
 
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator
  • requeue: Jobs in this queue will run on the above-mentioned 18 nodes. If your job is running on one of the exclusive nodes while jobs are submitted there, it will be terminated and requeued, so use with care.
Added:
>
>
  • express: A partition for short-running (test) jobs with a maximum walltime of 2 hours.
  There are some special queues that are only available to certain groups (these are also Broadwell nodes like in the normal queue):
  • p0fuchs: 8 nodes for exclusive usage

Revision 24 (2017-07-10) - JulianBigge

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 184 to 184
  List detailed information for a job (useful for troubleshooting):
Changed:
<
<
scontrol show jobid -dd <jobid>
>
>
scontrol show job -dd <jobid>
  Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.

To get statistics on completed jobs by jobID:

Revision 23 (2017-07-03) - JulianBigge

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 135 to 135
 srun -p normal --nodes=2 --ntasks=64 --ntasks-per-node=32 --cpus-per-task=2 --pty bash
OMP_NUM_THREADS=2 mpirun ./program
Added:
>
>

Using GPU resources

The k20gpu queue features four nodes with three NVIDIA Tesla K20 accelerators each. To use one of them, the following option must be present in your batch script:
#SBATCH --gres=gpu:1
It is also possible to request more than one GPU. In addition, you can specify the type of GPU you want to work on. At the moment there are the following types:

  • kepler: the standard type which can be used for most calculations
  • kepler_benchmark: an alternative type if you want to use the second and third GPU without the first

To specify a certain type use the following in your batch script:

#SBATCH --gres=gpu:kepler:1
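
Putting this together, a minimal batch script for the k20gpu partition might look like the following sketch (job name, walltime and program name are placeholders; adjust them to your needs):

#!/bin/bash
# request one node and one Kepler GPU in the k20gpu partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition k20gpu
#SBATCH --gres=gpu:kepler:1
#SBATCH --time=02:00:00
#SBATCH --job-name=gpu_test

# run a CUDA-enabled application (./my_gpu_program is a placeholder)
./my_gpu_program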
 

Show information about the queues

scontrol show partition

Revision 22 (2017-05-04) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 91 to 91
 #SBATCH --nodes=1

# set the number of CPU cores per node

Changed:
<
<
#SBATCH --ntasks 8
>
>
#SBATCH --ntasks-per-node 8
  # set a partition
  #SBATCH --partition u0dawin

Revision 21 (2017-03-08) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 8 to 8
 palma3 is the login node to a newer part of the PALMA system. It has various queues/partitions for different purposes:
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
Changed:
<
<
  • normal: 29 nodes with 32 Broadwell CPU cores each and 128 GB RAM.
>
>
  • normal: 29 nodes with 32 Broadwell CPU cores (64 threads) each and 128 GB RAM.
 
  • smp: An AMD Opteron machine with 64 cores and 512 GB RAM (The former ZIVSMP)
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator

Revision 20 (2017-03-02) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 21 to 21
 
  • p0kapp: 1 node for exclusive usage
  • hims: 4 nodes for exclusive usage
Added:
>
>
 

Software/The module concept

The software on palma-ng can be accessed via modules. These are small scripts that set environment variables (like PATH and LD_LIBRARY_PATH) pointing to the locations where the software is installed (mostly on network drives, so that the software is available on every node in the cluster). The module system we use here is LMOD (1). In contrast to the older environment modules used on PALMA I and NWZPHI, there is a new command, "module spider". Please find more information on this below.

Line: 48 to 48
 
  • goolfc/2016.10 Only on the k20gpu nodes for CUDA

If you want to use the Intel compiler, you can type for example the following:

Deleted:
<
<
 
module add intel/2016b
Changed:
<
<
module av
>
>
module av
  and you will see the software that has been compiled with this version. Alternatively you can use the "module spider" command.
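
For example, to look up a specific package (the name OpenFOAM is only an illustration; use the name of the software you are interested in):

module spider OpenFOAM

As described in the command table above, this shows the available versions and gives a hint which modules (for example one of the toolchains) have to be loaded first to make the package available.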
Line: 126 to 124
 

Starting jobs with MPI-parallel codes

mpirun will get all necessary information from SLURM if the job is submitted appropriately. If you want, for example, to start 128 MPI ranks distributed over two nodes, you could do it the following way:

Deleted:
<
<
 
srun -p normal --nodes=2 --ntasks=128 --ntasks-per-node=64 --pty bash
Changed:
<
<
mpirun ./program
>
>
mpirun ./program
  or, for a non-interactive run, put those parameters in the batch script.
Line: 135 to 131
or, for a non-interactive run, put those parameters in the batch script.
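
As a sketch, an equivalent non-interactive submit script could contain the following (the program name is a placeholder):

#!/bin/bash
#SBATCH --partition normal
#SBATCH --nodes=2
#SBATCH --ntasks=128
#SBATCH --ntasks-per-node=64
#SBATCH --time=24:00:00

# mpirun picks up the number of ranks and the host list from SLURM
mpirun ./program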

For starting hybrid jobs (i.e. jobs that use MPI and OpenMP parallelization at the same time), you can use the --cpus-per-task switch.

Deleted:
<
<
 
srun -p normal --nodes=2 --ntasks=64 --ntasks-per-node=32 --cpus-per-task=2 --pty bash
Changed:
<
<
OMP_NUM_THREADS=2 mpirun ./program
>
>
OMP_NUM_THREADS=2 mpirun ./program
 

Show information about the queues

scontrol show partition
Line: 205 to 215
 
scontrol requeue <jobid>

-- Holger Angenent - 2016-08-22

Added:
>
>
 
META FILEATTACHMENT attachment="llview.png" attr="" comment="" date="1481027490" name="llview.png" path="llview.png" size="50475" user="h_5fzimm01" version="2"
Added:
>
>
META FILEATTACHMENT attachment="palma-ng_batchsystem.png" attr="" comment="" date="1488459211" name="palma-ng_batchsystem.png" path="palma-ng_batchsystem.png" size="24143" user="h_5fzimm01" version="2"

Revision 19 (2017-03-02) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 9 to 9
 
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 29 nodes with 32 Broadwell CPU cores each and 128 GB RAM.
Changed:
<
<
  • zivsmp: An AMD Opteron machine with 64 cores and 512 GB RAM (The former ZIVSMP)
>
>
  • smp: An AMD Opteron machine with 64 cores and 512 GB RAM (The former ZIVSMP)
 
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator
  • requeue: Jobs in this queue will run on the above-mentioned 18 nodes. If your job is running on one of the exclusive nodes while jobs are submitted there, it will be terminated and requeued, so use with care.

Revision 18 (2017-03-01) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 27 to 27
  The most important difference between Palma I and palma-ng is the newly introduced hierarchical module naming scheme (2).
Added:
>
>
(1) https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

(2) https://hpcugent.github.io/easybuild/files/hust14_paper.pdf

 
Command (Short- and Long-form) Meaning
module av[ailable] Lists all currently available modules
module spider Lists all available modules with their descriptions
Line: 37 to 41
 
module rm modul1 modul2 ... Deletes modules from the current environment
module purge Deletes all modules from the current environment
Changed:
<
<
(1) https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
>
>
The hierarchical module naming scheme means that you do not see all modules at the same time. You have to load a toolchain or compiler first to see the software that has been compiled with it. At the moment there are the following toolchains:
 
Changed:
<
<
(2) https://hpcugent.github.io/easybuild/files/hust14_paper.pdf
>
>
  • foss/2016b and foss/2017a GCC with OpenMPI
  • intel/2016b and intel/2017a Intel Compiler with Intel MPI
  • goolfc/2016.10 Only on the k20gpu nodes for CUDA

If you want to use the Intel compiler, you can type for example the following:

module add intel/2016b
module av

and you will see the software that has been compiled with this version. Alternatively you can use the "module spider" command.

 

Using the module command in submit scripts

Revision 17 (2017-03-01) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 21 to 21
 
  • p0kapp: 1 node for exclusive usage
  • hims: 4 nodes for exclusive usage
Changed:
<
<

The module concept

>
>

Software/The module concept

 
Changed:
<
<
Environment variables (like PATH, LD_LIBRARY_PATH) for compilers and libraries can be set by modules:
>
>
The software on palma-ng can be accessed via modules. These are small scripts that set environment variables (like PATH and LD_LIBRARY_PATH) pointing to the locations where the software is installed (mostly on network drives, so that the software is available on every node in the cluster). The module system we use here is LMOD (1). In contrast to the older environment modules used on PALMA I and NWZPHI, there is a new command, "module spider". Please find more information on this below.

The most important difference between Palma I and palma-ng is the newly introduced hierarchical module naming scheme (2).

 
Command (Short- and Long-form) Meaning
Changed:
<
<
module av[ailable] Lists all available modules
>
>
module av[ailable] Lists all currently available modules
module spider Lists all available modules with their descriptions
module spider modulename Shows the description of a module and gives a hint which modules have to be loaded to make it available.
 
 module li[st] Lists all modules in the current environment
Changed:
<
<
module show modulname Lists all changes caused by a module
>
>
module show modulename Lists all changes caused by a module
 
module add modul1 modul2 ... Adds modules to the current environment
module rm modul1 modul2 ... Deletes modules from the current environment
Changed:
<
<
module purge Deletes all modules from czrrent environment
Several environment variables will be set by the modules.
>
>
module purge Deletes all modules from the current environment

(1) https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

(2) https://hpcugent.github.io/easybuild/files/hust14_paper.pdf

 
Deleted:
<
<
When you log in to palma3, some modules are loaded automatically.
 

Using the module command in submit scripts

This is only valid for the u0dawin queue

Revision 16 (2017-02-24) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 75 to 75
 #SBATCH --ntasks 8

# set a partition

Changed:
<
<
#SBATCH -p u0dawin
>
>
#SBATCH --partition u0dawin
  # set max wallclock time
  #SBATCH --time=24:00:00
Line: 87 to 87
 #SBATCH --mail-type=ALL

# set an output file

Changed:
<
<
#SBATCH -o output.dat
>
>
#SBATCH --output output.dat
  # send mail to this address
  #SBATCH --mail-user=your_account@uni-muenster.de
Line: 102 to 102
  A detailed description can be found here: http://slurm.schedmd.com/sbatch.html
Added:
>
>

Starting jobs with MPI-parallel codes

mpirun will get all necessary information from SLURM if the job is submitted appropriately. If you want, for example, to start 128 MPI ranks distributed over two nodes, you could do it the following way:

srun -p normal --nodes=2 --ntasks=128 --ntasks-per-node=64 --pty bash
mpirun ./program

or, for a non-interactive run, put those parameters in the batch script.

For starting hybrid jobs (i.e. jobs that use MPI and OpenMP parallelization at the same time), you can use the --cpus-per-task switch.

srun -p normal --nodes=2 --ntasks=64 --ntasks-per-node=32 --cpus-per-task=2 --pty bash
OMP_NUM_THREADS=2 mpirun ./program
 

Show information about the queues

scontrol show partition
Line: 111 to 129
 

Running interactive jobs with SLURM

Use for example the following command:

Changed:
<
<
srun -p u0dawin --nodes 1 --ntasks-per-node=8 --pty bash
>
>
srun --partition u0dawin --nodes 1 --ntasks-per-node=8 --pty bash
  This starts a job in the u0dawin queue/partition on one node with eight cores.

Revision 15 (2017-02-24) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 111 to 111
 

Running interactive jobs with SLURM

Use for example the following command:

Changed:
<
<
srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash
>
>
srun -p u0dawin --nodes 1 --ntasks-per-node=8 --pty bash
  This starts a job in the u0dawin queue/partition on one node with eight cores.
Added:
>
>
 

Information on jobs

List all current jobs for a user:

Revision 14 (2017-02-24) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 72 to 72
 #SBATCH --nodes=1

# set the number of CPU cores per node

Changed:
<
<
#SBATCH -n 8
>
>
#SBATCH --ntasks 8
  # set a partition
  #SBATCH -p u0dawin

Revision 13 (2017-02-16) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 9 to 9
 
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 29 nodes with 32 Broadwell CPU cores each and 128 GB RAM.
Changed:
<
<
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (not available yet)
>
>
  • zivsmp: An AMD Opteron machine with 64 cores and 512 GB RAM (The former ZIVSMP)
 
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator
Changed:
<
<
  • requeue: Job in this queue will run on the nodes of the above mentioned nodes. If your jobs are running on one of the exclusive nodes while jobs are put in there, your job will be terminated and requeued, so use with care.
>
>
  • requeue: Jobs in this queue will run on the above-mentioned 18 nodes. If your job is running on one of the exclusive nodes while jobs are submitted there, it will be terminated and requeued, so use with care.
  There are some special queues that are only available to certain groups (these are also Broadwell nodes like in the normal queue):
  • p0fuchs: 8 nodes for exclusive usage

Revision 12 (2017-02-15) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 37 to 37
 When you log in to palma3, some modules are loaded automatically.

Using the module command in submit scripts

Added:
>
>
This is only valid for the u0dawin queue
 If you want to use the module command in submit scripts, the line
source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh

Revision 11 (2017-02-13) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 8 to 8
 palma3 is the login node to a newer part of the PALMA system. It has various queues/partitions for different purposes:
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
Changed:
<
<
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
>
>
  • normal: 29 nodes with 32 Broadwell CPU cores each and 128 GB RAM.
 
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (not available yet)
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator
Added:
>
>
  • requeue: Jobs in this queue will run on the above-mentioned nodes. If your job is running on one of the exclusive nodes while jobs are submitted there, it will be terminated and requeued, so use with care.

There are some special queues that are only available to certain groups (these are also Broadwell nodes like in the normal queue):

  • p0fuchs: 8 nodes for exclusive usage
  • p0kulesz: 4 nodes for exclusive usage
  • p0klasen: 1 node for exclusive usage
  • p0kapp: 1 node for exclusive usage
  • hims: 4 nodes for exclusive usage
 

The module concept

Revision 10 (2016-12-06) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 10 to 10
 
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (not available yet)
Changed:
<
<
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (not available yet).
>
>
  • phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each. (not available yet).
  • knl: Four nodes with a Xeon Phi Knights Landing accelerator
 

The module concept

Line: 38 to 39
  Ganglia
Added:
>
>
If you have X forwarding enabled, you can use llview (Just type "llview" at the command line).

llview.png

 

The batch system

The batch system on PALMA3 is SLURM, but there is a wrapper for PBS installed, so most of your scripts should still work. If you want to switch to SLURM, this document might help you: https://slurm.schedmd.com/rosetta.pdf

Line: 124 to 128
  Show priorities for waiting jobs:
Changed:
<
<
sprio
>
>
sprio
 

Controlling jobs

Line: 151 to 154
 
scontrol requeue <jobid>

-- Holger Angenent - 2016-08-22

Added:
>
>
META FILEATTACHMENT attachment="llview.png" attr="" comment="" date="1481027490" name="llview.png" path="llview.png" size="50475" user="h_5fzimm01" version="2"

Revision 9 (2016-11-28) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 12 to 12
 
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (not available yet)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (not available yet).
Added:
>
>

The module concept

Environment variables (like PATH, LD_LIBRARY_PATH) for compilers and libraries can be set by modules:

Command (Short- and Long-form) Meaning
module av[ailable] Lists all available modules
module li[st] Lists all modules in the current environment
module show modulename Lists all changes caused by a module
module add modul1 modul2 ... Adds modules to the current environment
module rm modul1 modul2 ... Deletes modules from the current environment
module purge Deletes all modules from the current environment
Several environment variables will be set by the modules.

When you log in to palma3, some modules are loaded automatically.

Using the module command in submit scripts

If you want to use the module command in submit scripts, the line

source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh

has to be added beforehand. Otherwise, just put the "module add" commands in your .bashrc (which can be found in your home directory).
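
A minimal sketch of a submit script that loads a module this way (intel/2016b is just the toolchain example used elsewhere on this page; the program name is a placeholder):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks 8
#SBATCH --partition u0dawin

# make the module command available in the job environment
source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh

# load the software environment and run the application
module add intel/2016b
./program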

 

Monitoring

Ganglia

Revision 8 (2016-11-25) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 12 to 12
 
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (not available yet)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (not available yet).
Added:
>
>

Monitoring

Ganglia

 

The batch system

The batch system on PALMA3 is SLURM, but there is a wrapper for PBS installed, so most of your scripts should still work. If you want to switch to SLURM, this document might help you: https://slurm.schedmd.com/rosetta.pdf

Line: 61 to 65
 You can send your submission to the batch system with the command "sbatch submit.cmd"

A detailed description can be found here: http://slurm.schedmd.com/sbatch.html

Deleted:
<
<

Show running jobs

  • squeue
  • qstat
  • showq
 

Show information about the queues

scontrol show partition
Added:
>
>

Show information about the nodes

sinfo
 

Running interactive jobs with SLURM

Use for example the following command:

srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash

This starts a job in the u0dawin queue/partition on one node with eight cores.

Added:
>
>

Information on jobs

List all current jobs for a user:

squeue -u <username>

List all running jobs for a user:

squeue -u <username> -t RUNNING

List all pending jobs for a user:

squeue -u <username> -t PENDING

List all current jobs in the normal partition for a user:

squeue -u <username> -p normal

List detailed information for a job (useful for troubleshooting):

scontrol show jobid -dd <jobid>

Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.

To get statistics on completed jobs by jobID:

sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

To view the same information for all jobs of a user:

sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed

Show priorities for waiting jobs:

sprio

Controlling jobs

To cancel one job:

scancel <jobid>

To cancel all the jobs for a user:

scancel -u <username>

To cancel all the pending jobs for a user:

scancel -t PENDING -u <username>

To cancel one or more jobs by name:

scancel --name myJobName

To pause a particular job:

scontrol hold <jobid>

To resume a particular job:

scontrol resume <jobid>

To requeue (cancel and rerun) a particular job:

scontrol requeue <jobid>
  -- Holger Angenent - 2016-08-22

Revision 7 (2016-09-07) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA-NG

Line: 52 to 52
 # send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de
Added:
>
>
# In the u0dawin queue, you will need the following line
source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh
 # run the application
./program

Revision 6 (2016-09-06) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"
Changed:
<
<

PALMA3

>
>

PALMA-NG

 

Overview

Line: 52 to 53
 #SBATCH --mail-user=your_account@uni-muenster.de

# run the application

Changed:
<
<
./program
>
>
./program
  You can send your submission to the batch system with the command "sbatch submit.cmd"

Revision 5 (2016-09-05) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA3

Line: 9 to 9
 
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
Changed:
<
<
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (At the moment still part of ZIVHPC.)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (A part of ZIVHPC at the moment).
>
>
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (not available yet)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (not available yet).
 

The batch system

The batch system on PALMA3 is SLURM, but there is a wrapper for PBS installed, so most of your scripts should still work. If you want to switch to SLURM, this document might help you: https://slurm.schedmd.com/rosetta.pdf

Line: 22 to 23
 
  • For using the "module add" command, you will have to source some scripts first: "source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh"

Submit a job

Added:
>
>
Create a file, for example called submit.cmd:
#!/bin/bash

# set the number of nodes
#SBATCH --nodes=1

# set the number of CPU cores per node
#SBATCH -n 8

# set a partition
#SBATCH -p u0dawin

# set max wallclock time
#SBATCH --time=24:00:00

# set name of job
#SBATCH --job-name=test123

# mail alert at start, end and abortion of execution
#SBATCH --mail-type=ALL

# set an output file
#SBATCH -o output.dat

# send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de

# run the application
./program

You can send your submission to the batch system with the command "sbatch submit.cmd"

A detailed description can be found here: http://slurm.schedmd.com/sbatch.html

 

Show running jobs

  • squeue
  • qstat

Revision 4 (2016-09-05) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA3

Changed:
<
<
palma3 is the login node to a newer part of the PALMA system. It has various queues for different purposes:
>
>

Overview

palma3 is the login node to a newer part of the PALMA system. It has various queues/partitions for different purposes:

 
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
Line: 14 to 17
  When using PBS scripts, there are some differences to PALMA:
  • The first line of the submit script has to be #!/bin/bash
Added:
>
>
  • A queue is called partition in terms of SLURM. These terms will be used synonymous here.
 
  • The variable $PBS_O_WORKDIR will not be set. Instead you will start in the directory in which the script resides.
  • For using the "module add" command, you will have to source some scripts first: "source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh"
Added:
>
>

Submit a job

Show running jobs

  • squeue
  • qstat
  • showq

Show information about the queues

scontrol show partition
 

Running interactive jobs with SLURM

Use for example the following command:

Deleted:
<
<
srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash
 
Changed:
<
<
-- Holger Angenent - 2016-08-22
>
>
srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash
 
Changed:
<
<

Comments

>
>
This starts a job in the u0dawin queue/partition on one node with eight cores.
 
Changed:
<
<
<--/commentPlugin-->
>
>
-- Holger Angenent - 2016-08-22

Revision 3 (2016-09-02) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA3

Line: 8 to 8
 
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (At the moment still part of ZIVHPC.)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (A part of ZIVHPC at the moment).
Added:
>
>

The batch system

The batch system on PALMA3 is SLURM, but there is a wrapper for PBS installed, so most of your scripts should still work. If you want to switch to SLURM, this document might help you: https://slurm.schedmd.com/rosetta.pdf

When using PBS scripts, there are some differences to PALMA:

  • The first line of the submit script has to be #!/bin/bash
  • The variable $PBS_O_WORKDIR will not be set. Instead you will start in the directory in which the script resides.
  • For using the "module add" command, you will have to source some scripts first: "source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh"
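
If you want to translate an existing PBS script by hand, a few common directive mappings look roughly like this (a sketch, not an exhaustive list; see the rosetta document linked above for the full table):

#PBS -l nodes=1:ppn=8        ->  #SBATCH --nodes=1 --ntasks-per-node=8
#PBS -l walltime=24:00:00    ->  #SBATCH --time=24:00:00
#PBS -N test123              ->  #SBATCH --job-name=test123
#PBS -q u0dawin              ->  #SBATCH --partition u0dawin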

Running interactive jobs with SLURM

Use for example the following command:

srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash
 -- Holger Angenent - 2016-08-22

Comments

Revision 2 (2016-08-23) - HolgerAngenent

Line: 1 to 1
 
META TOPICPARENT name="HPC"

PALMA3

Changed:
<
<
palma3 is the login node to a newer part of the PALMA system. It has various queues for different purposes:
  • u0dawin: A queue for general purpose. It is usable for everyone, even without being a member of the groups that have submittet a proposal for PALMA. It replaces the old ZIVHPC cluster
>
>
palma3 is the login node to a newer part of the PALMA system. It has various queues for different purposes:
  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
 
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
Changed:
<
<
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (At the moment still part of ZIVHPC.)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (A part of ZIVHPC at the moment).
>
>
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (At the moment still part of ZIVHPC.)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (A part of ZIVHPC at the moment).
 -- Holger Angenent - 2016-08-22

Comments

Revision 1 (2016-08-22) - HolgerAngenent

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="HPC"

PALMA3

palma3 is the login node to a newer part of the PALMA system. It has various queues for different purposes:

  • u0dawin: A general-purpose queue. It is usable for everyone, even without being a member of the groups that have submitted a proposal for PALMA. It replaces the old ZIVHPC cluster.
  • k20gpu: Four nodes equipped with three NVIDIA Tesla K20 accelerators each
  • normal: 44 nodes with 32 Broadwell CPU cores each. (Not fully installed yet)
  • zivsmp: A SMP machine with 512 GB RAM. The old login node of ZIVHPC. (At the moment still part of ZIVHPC.)
  • phi: 2 Nodes with 4 Intel Xeon Phi accelerators each. (A part of ZIVHPC at the moment).
-- Holger Angenent - 2016-08-22

Comments

<--/commentPlugin-->
 