---+ PALMA-NG

%TOC{title="Content"}%

---++ Overview

palma3 is the login node to a newer part of the !PALMA system. It has various queues/partitions for different purposes:

   * u0dawin: A general-purpose queue. It is usable by everyone, even without being a member of one of the groups that have submitted a proposal for !PALMA. It replaces the old !ZIVHPC cluster.
   * k20gpu: Four nodes equipped with 3 K20 nVidia Tesla accelerators each
   * normal: 29 nodes with 32 Broadwell CPU cores each and 128 GB RAM
   * zivsmp: An SMP machine with 512 GB RAM. The old login node of !ZIVHPC. (not available yet)
   * phi: Two nodes with 4 Intel Xeon Phi Knights Corner accelerators each (not available yet)
   * knl: Four nodes with a Xeon Phi Knights Landing accelerator
   * requeue: Jobs in this queue run on the nodes of the queues mentioned above. If your job is running on one of the exclusive nodes when jobs are submitted there by the owning group, your job will be terminated and requeued, so use with care.

There are some special queues that are only available to certain groups (these are also Broadwell nodes like those in the normal queue):

   * p0fuchs: 8 nodes for exclusive usage
   * p0kulesz: 4 nodes for exclusive usage
   * p0klasen: 1 node for exclusive usage
   * p0kapp: 1 node for exclusive usage
   * hims: 4 nodes for exclusive usage

---+++ The module concept

Environment variables (like PATH, LD_LIBRARY_PATH) for compilers and libraries can be set by modules:

| *Command (short and long form)* | *Meaning* |
| module av[ailable] | Lists all available modules |
| module li[st] | Lists all modules in the current environment |
| module show modulename | Lists all changes caused by a module |
| module add module1 module2 ... | Adds modules to the current environment |
| module rm module1 module2 ... | Removes modules from the current environment |
| module purge | Removes all modules from the current environment |

Several environment variables will be set by the modules. When you log in to palma3, some modules are loaded automatically.
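The commands from the table combine into a typical workflow. A sketch of such a session (the module name intel/2016 is a placeholder; run "module av" on palma3 to see what is actually installed):

```shell
module purge              # remove all modules from the current environment
module av                 # list what is available
module add intel/2016     # load a module (placeholder name)
module li                 # verify which modules are now loaded
module show intel/2016    # inspect the environment changes the module makes
```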
---++++ Using the module command in submit scripts

If you want to use the _module_ command in submit scripts, the line

<verbatim>source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh</verbatim>

has to be added first. Alternatively, just put the "module add" commands in your .bashrc (which can be found in your home directory).

---++ Monitoring

[[http://palma3.uni-muenster.de/ganglia/?c=palma-ng&m=load_one&r=hour&s=by%20name&hc=4&mc=2][Ganglia]]

If you have X forwarding enabled, you can use llview (just type "llview" at the command line).

<img alt="llview.png" height="401" src="%ATTACHURLPATH%/llview.png" width="452" />

---++ The batch system

The batch system on PALMA3 is SLURM, but a wrapper for PBS is installed, so most of your scripts should still work. If you want to switch to SLURM, this document might help you: [[https://slurm.schedmd.com/rosetta.pdf]]

When using PBS scripts, there are some differences to PALMA:

   * The first line of the submit script has to be #!/bin/bash
   * A queue is called a partition in SLURM terms; the two terms are used synonymously here.
   * The variable $PBS_O_WORKDIR will not be set. Instead, you will start in the directory in which the script resides.
   * To use the "module add" command, you have to source some scripts first: "source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh"

---+++ Submit a job

Create a file called, for example, submit.cmd:

<verbatim>
#!/bin/bash

# set the number of nodes
#SBATCH --nodes=1

# set the number of tasks (CPU cores)
#SBATCH -n 8

# set a partition
#SBATCH -p u0dawin

# set max wallclock time
#SBATCH --time=24:00:00

# set name of job
#SBATCH --job-name=test123

# mail alert at start, end and abortion of execution
#SBATCH --mail-type=ALL

# set an output file
#SBATCH -o output.dat

# send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de

# In the u0dawin queue, you will need the following line
source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh

# run the application
./program
</verbatim>

You can send your job to the batch system with the command "sbatch submit.cmd". A detailed description can be found here: [[http://slurm.schedmd.com/sbatch.html]]

---+++ Show information about the queues

<verbatim>scontrol show partition</verbatim>

---+++ Show information about the nodes

<verbatim>sinfo</verbatim>

---+++ Running interactive jobs with SLURM

Use, for example, the following command:

<verbatim>srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash</verbatim>

This starts a job in the u0dawin queue/partition on one node with eight cores.
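As noted above, $PBS_O_WORKDIR is not set under SLURM's PBS wrapper. SLURM instead sets $SLURM_SUBMIT_DIR to the directory from which sbatch was invoked, so a script can change into it explicitly. A minimal sketch (the fallback to $PWD is only there so the snippet also runs outside of a job):

```shell
#!/bin/bash
# Change into the directory the job was submitted from.
# $SLURM_SUBMIT_DIR is set by sbatch; outside SLURM, fall back to the current directory.
cd "${SLURM_SUBMIT_DIR:-$PWD}"
echo "running in: $PWD"
```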
---+++ Information on jobs

List all current jobs for a user:

<verbatim>squeue -u <username></verbatim>

List all running jobs for a user:

<verbatim>squeue -u <username> -t RUNNING</verbatim>

List all pending jobs for a user:

<verbatim>squeue -u <username> -t PENDING</verbatim>

List all current jobs in the normal partition for a user:

<verbatim>squeue -u <username> -p normal</verbatim>

List detailed information for a job (useful for troubleshooting):

<verbatim>scontrol show jobid -dd <jobid></verbatim>

Once your job has completed, you can get additional information that was not available during the run, such as run time and memory used.

To get statistics on completed jobs by job ID:

<verbatim>sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed</verbatim>

To view the same information for all jobs of a user:

<verbatim>sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed</verbatim>

Show priorities for waiting jobs:

<verbatim>sprio</verbatim>

---+++ Controlling jobs

To cancel one job:

<verbatim>scancel <jobid></verbatim>

To cancel all jobs of a user:

<verbatim>scancel -u <username></verbatim>

To cancel all pending jobs of a user:

<verbatim>scancel -t PENDING -u <username></verbatim>

To cancel one or more jobs by name:

<verbatim>scancel --name myJobName</verbatim>

To hold a particular job:

<verbatim>scontrol hold <jobid></verbatim>

To release a held job:

<verbatim>scontrol release <jobid></verbatim>

To requeue (cancel and rerun) a particular job:

<verbatim>scontrol requeue <jobid></verbatim>

-- %USERSIG{HolgerAngenent - 2016-08-22}%
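The sacct output above can be post-processed with standard shell tools. As a sketch, the snippet below extracts the peak MaxRSS over all steps of a job; the sample output is hard-coded (and made up) because sacct only works on a live SLURM cluster, and --parsable2 is the sacct option that produces the pipe-separated fields assumed here:

```shell
# Made-up sample of what "sacct -j <jobid> --format=JobID,MaxRSS --parsable2"
# might print; hard-coded because sacct needs a live SLURM cluster.
sample='JobID|MaxRSS
4711.batch|1024K
4711.0|2048K'

# Print the largest MaxRSS (in K) over all steps of the job.
echo "$sample" | awk -F'|' 'NR > 1 { gsub(/K/, "", $2); if ($2 + 0 > max) max = $2 + 0 } END { print max "K" }'
```

For the sample above this prints 2048K, the memory high-water mark of the whole job.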
Topic revision: r11 - 2017-02-13 - HolgerAngenent