---+ NWZPHI, the cluster of the IVV 4

NWZPHI is a cluster equipped with 98 Xeon Phi cards. These are PCIe-based accelerators similar to GPUs, but they can be programmed in regular programming languages.

Update: *[[phicusCentos7][New Centos 7 Installation]]*

%TOC{title="Content of this page"}%

---++ Hardware and software overview

   * 2 development and debugging servers (24 CPU cores at 2.4 GHz, 64 GB RAM, 1 Xeon Phi 5110P each)
   * 12 accelerator nodes (24 CPU cores at 2.4 GHz, 128 GB RAM, 8 Xeon Phi 5110P each)
   * 1 SMP node (32 CPU cores, 1.5 TB RAM)
   * 88 TB storage (with !FhGFS) for home and scratch
   * FDR !InfiniBand as interconnect
   * The operating system is !RedHat Enterprise Linux 6

---++ NWZPHI for the impatient reader

The name of the login server is NWZPHI. Access is granted to all users who are members of the group u0clstr and of at least one group whose name starts with p0, q0 or r0. In addition, every user with access to PALMA may use NWZPHI. You can register for u0clstr at [[http://www.uni-muenster.de/MeinZIV/][MeinZIV]] (go to “Username (account) and group memberships” / „Nutzerkennung und Gruppenmitgliedschaften“).

The batch and module systems work very similarly to those on PALMA.

---+++ Differences to PALMA

If you are familiar with PALMA, starting jobs on NWZPHI is quite easy. The differences are:

   * In the submit file, you do not need the switch "-A"
   * One node has 24 CPU cores
   * The node names and properties are different
   * The operating system version is different, so you have to recompile your code
   * To use the Xeon Phi accelerators, more work is necessary (see below)

---++ Starting jobs on NWZPHI

   * Choose your software environment and (optionally) compile your code
   * Submit your job via the batch system

---+++ Environment Modules

Environment variables (like PATH, LD_LIBRARY_PATH) for compilers and libraries can be set by modules:

| *Command (short and long form)* | *Meaning* |
| module av[ailable] | Lists all available modules |
| module li[st] | Lists all modules loaded in the current environment |
| module show modulename | Shows all changes a module makes to the environment |
| module add module1 module2 ... | Adds modules to the current environment |
| module rm module1 module2 ... | Removes modules from the current environment |
| module purge | Removes all modules from the current environment |
To use the same modules at every login, put the commands in your $HOME/.bashrc. The recommended default module is

<verbatim>
module add intel/2016a
</verbatim>
This is a toolchain that loads other modules like (Intel-) MPI and the MKL.
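
A minimal sketch of what such an entry in =$HOME/.bashrc= could look like. The variable name =nwzphi_default_module= and the guard around =module= are assumptions made for illustration; the guard only keeps the file usable on machines where the module system is not installed.

```shell
# Hypothetical variable name; intel/2016a is the default recommended above.
nwzphi_default_module="intel/2016a"

# Call "module" only where the command (or shell function) actually exists.
if command -v module >/dev/null 2>&1; then
    module add "$nwzphi_default_module"
fi
```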

---++ Batch system

The batch system Torque and the scheduler Moab are used to submit jobs. Starting jobs manually is not allowed. Batch jobs should only be submitted from the server mn02.

---+++ Creating submit files

Example of a submit file for an MPI job:

<verbatim>#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=2:ppn=24
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $PBS_O_WORKDIR
mpdboot  -n 2 -f $PBS_NODEFILE  -v
mpirun -machinefile $PBS_NODEFILE -np 48 ./executable</verbatim>

This starts an MPI job with 48 processes (2 nodes with 24 cores each).

Further information:
   * *username*: Replace with your own username
   * *job_directory*: The directory containing the executable. The example changes to =$PBS_O_WORKDIR=, the directory from which the job was submitted
   * *executable*: Replace with the name of your executable
   * *walltime*: The time needed for a whole run. At the moment, a maximum of 48 hours is possible in the default queue
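
The process count in the submit file has to match the =nodes= and =ppn= values. As a sketch, it could be derived from the node file instead of being hard-coded. This assumes the usual Torque behaviour that =$PBS_NODEFILE= contains one line per allocated core; the function names =count_slots= and =count_nodes= are made up for this example.

```shell
# Number of MPI slots: one line per allocated core in the node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Number of distinct nodes listed in the node file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a job, Torque sets PBS_NODEFILE; outside a job nothing is printed.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "nodes: $(count_nodes "$PBS_NODEFILE"), slots: $(count_slots "$PBS_NODEFILE")"
fi
```

In a submit file, =-np 48= could then be written as =-np "$(count_slots "$PBS_NODEFILE")"=.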

When no MPI is needed, the submit file can be simpler.

Example for a job using !OpenMP:

<verbatim>#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=1:ppn=24
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=12
./executable</verbatim>
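
The thread count does not have to be hard-coded. The following sketch takes it from the =ppn= value of the job, assuming that the Torque installation exports it as =PBS_NUM_PPN= (this has not been verified for NWZPHI). The helper name =threads_for_job= is hypothetical, and the fallback of 12 is the value from the example above.

```shell
# Hypothetical helper: use the ppn value of the job if Torque exports it
# as PBS_NUM_PPN (an assumption about this installation), else a fallback.
threads_for_job() {
    echo "${PBS_NUM_PPN:-$1}"
}

OMP_NUM_THREADS=$(threads_for_job 12)
export OMP_NUM_THREADS
echo "Using $OMP_NUM_THREADS OpenMP threads"
```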

---+++ Choosing the nodes

The cluster consists of the following nodes:

| *Name* | *Hardware* | *Queue* |  *Annotations* | *Max Walltime* |
| sl250-01, sl250-02 | 24 cores, 64 GB RAM, 1 Xeon Phi accelerator | debug | Debugging nodes with a short maximum walltime, so waiting times are shorter | 4 hours |
| sl270-01 to sl270-12 | 24 cores, 128 GB RAM, 8 Xeon Phi accelerators | default | Production nodes | 48 hours |
| dl560-01 | 32 cores, 1.5 TB RAM | bigsmp | For large !OpenMP computations with very high memory demands | 200 hours |
| sl230-01 to sl230-03 | 24 cores, 64 GB RAM | p0doltsi | Reserved for the Doltsinis group | 160 hours |
To select a node type, submit your job to the corresponding queue. For example, if you want to run a large computation that needs more than 128 GB of RAM for a single process, the dl560 is the right choice. In this case, you have to use the bigsmp queue:

<verbatim>
#PBS -q bigsmp</verbatim>
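
The table can also be read as a small decision rule. The sketch below is only an illustration: the function =suggest_queue= is hypothetical, it ignores the reserved p0doltsi queue, and it treats 1.5 TB as 1536 GB.

```shell
# Hypothetical helper: suggest a queue based on the node table.
# $1: memory needed on one node in GB, $2: walltime in hours (integers).
suggest_queue() {
    sq_mem_gb=$1
    sq_hours=$2
    if [ "$sq_mem_gb" -gt 128 ]; then
        # Only the SMP node has more than 128 GB (1.5 TB taken as 1536 GB).
        if [ "$sq_mem_gb" -le 1536 ] && [ "$sq_hours" -le 200 ]; then
            echo bigsmp
        else
            echo none
            return 1
        fi
    elif [ "$sq_mem_gb" -le 64 ] && [ "$sq_hours" -le 4 ]; then
        echo debug      # short test runs
    elif [ "$sq_hours" -le 48 ]; then
        echo default    # production nodes
    else
        echo none
        return 1
    fi
}

suggest_queue 500 100   # prints "bigsmp"
```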

---+++ Submitting jobs / Managing the queue

A job is submitted by entering the command

<verbatim> qsub submit.cmd </verbatim>

Here, _submit.cmd_ is the name of the submit file.

Further commands:
   * qstat: Shows the current queue
   * qstat -a: As above, but with the number of requested cores
   * qstat -n: Shows in detail which nodes are used
   * qdel job_number: Deletes the job with this number from the queue
   * showbf: Shows the number of free cores
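
As a sketch of how these commands fit together: =qsub= prints the identifier of the new job in the form =number.server=, and the number is what =qdel= expects. The helper =job_number= is hypothetical, and the block only submits something if =qsub= and a file named =submit.cmd= are actually present.

```shell
# qsub prints the job identifier as "<number>.<server>"; keep the number.
job_number() {
    echo "${1%%.*}"
}

# Submit only where qsub exists (that is, on the cluster).
if command -v qsub >/dev/null 2>&1 && [ -f submit.cmd ]; then
    job_id=$(qsub submit.cmd)
    echo "Submitted job $(job_number "$job_id")"
    qstat -n "$job_id"
fi
```

A job submitted this way could later be removed with =qdel "$(job_number "$job_id")"=.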

---+++ Monitoring jobs

There are different tools for monitoring:

   * =qstat -a=: Shows the queues with running and waiting jobs
   * =pbstop=: Similar to qstat but with a text-based graphical output
   * [[http://phicus-a.uni-muenster.de:81/ganglia/?m=load_one&r=hour&s=descending&c=PHICUS+Linux+Cluster&h=&sh=1&hc=4&z=small][Ganglia]]: Shows detailed information of every node including memory and CPU usage

---++ Storage

There is an 88 TB partition for /home and /scratch using the [[http://www.fhgfs.com/cms/][BeeGFS]] filesystem (formerly known as !FhGFS). Store your data as on PALMA: put your programs in /home and your data in /scratch. Due to the amount of data, there is no backup at the moment.

---++ Using the Xeon Phi accelerators

If you want to use the accelerators, you always have to reserve three times as many CPU cores as accelerators, for example:
<verbatim>
qsub -I -q default -l nodes=1:ppn=9:mics=3
</verbatim>
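
The rule of three CPU cores per accelerator can be written down as a small helper. This is a sketch only: the function =mic_resources= is hypothetical, and the upper limit of 8 cards is taken from the accelerator nodes in the node table.

```shell
# Hypothetical helper: build the -l resource string for one node.
# $1: number of Xeon Phi cards (1 to 8 on the accelerator nodes).
mic_resources() {
    mr_mics=$1
    if [ "$mr_mics" -lt 1 ] || [ "$mr_mics" -gt 8 ]; then
        echo "mics must be between 1 and 8" >&2
        return 1
    fi
    # Three CPU cores have to be reserved per accelerator.
    echo "nodes=1:ppn=$((mr_mics * 3)):mics=$mr_mics"
}

mic_resources 3   # prints "nodes=1:ppn=9:mics=3"
```

An interactive job could then be requested with =qsub -I -q default -l "$(mic_resources 3)"=.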

You have to recompile your code with the Intel compiler and the "-mmic" flag, so you need separate builds for the host and for the accelerator. For example:
<verbatim>
mpiicpc code.c -mmic -o program.mic
</verbatim>

To use the accelerators that have been reserved for you, you can use the script "allocated-mics.pl", which is in your PATH. The host names of the cards are no longer mic0, mic1, ..., but include the name of their host, for example sl270-01-mic0, sl270-01-mic1 and so on. This is necessary to set up the communication between the accelerators.

An example of how to use the accelerators with MPI could look like this:
<verbatim>
allocated-mics.pl > ${HOME}/mics.list
mpirun -n 120 -hostfile ${HOME}/mics.list ./program.mic
</verbatim>

Each card has 60 cores and supports up to four threads per core, so up to 240 threads per card can be created. In my experience, however, it can be better to use only 120 threads per card.
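
The numbers above as a short calculation. Reading "120 threads per card" as two threads on each of the 60 cores is an assumption of this sketch; the variable names are made up.

```shell
cores_per_card=60
max_threads_per_core=4

# Hardware limit: 60 cores x 4 threads per core.
max_threads=$((cores_per_card * max_threads_per_core))

# Assumed reading of the recommendation: 2 threads per core.
used_threads=$((cores_per_card * 2))

echo "maximum: $max_threads threads, suggested: $used_threads threads per card"
```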

---+++ Some useful links

   * [[https://software.intel.com/de-de/mic-developer]]
   * [[http://www.drdobbs.com/parallel/programming-intels-xeon-phi-a-jumpstart/240144160?pgno=4]]
   * [[https://software.intel.com/en-us/articles/using-the-intel-mpi-library-on-intel-xeon-phi-coprocessor-systems]]
   * [[http://www.prace-ri.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML?lang=en]]

---++ Known Issues

For temporary problems please read the login messages.

Other problems are:
   * <strike>No communication between Xeon Phi accelerators of different nodes is possible at the moment</strike> This was fixed by the reinstallation with !CentOS7.

---++ Support

In case of questions, please ask Holger Angenent or Martin Leweling via hpc@uni-muenster.de.

-- Main.HolgerAngenent - 2014-07-23
Topic revision: r11 - 2017-12-13 - HolgerAngenent