---+ PALMA-NG

%TOC{title="Content"}%

---++ Overview

palma3 is the login node to a newer part of the !PALMA system. It has various queues/partitions for different purposes:

   * u0dawin: A general purpose queue. It is usable for everyone, even without being a member of one of the groups that have submitted a proposal for !PALMA. It replaces the old !ZIVHPC cluster.
   * k20gpu: Four nodes equipped with 3 K20 nVidia Tesla accelerators each.
   * normal: 44 nodes with 32 Broadwell CPU cores each (not fully installed yet).
   * zivsmp: An SMP machine with 512 GB RAM, the old login node of !ZIVHPC (not available yet).
   * phi: 2 nodes with 4 Intel Xeon Phi accelerators each (not available yet).

---+++ The module concept

Environment variables (like PATH, LD_LIBRARY_PATH) for compilers and libraries can be set by modules:

| *Command (short and long form)* | *Meaning* |
| module av[ailable] | Lists all available modules |
| module li[st] | Lists all modules in the current environment |
| module show modulename | Lists all changes caused by a module |
| module add module1 module2 ... | Adds modules to the current environment |
| module rm module1 module2 ... | Removes modules from the current environment |
| module purge | Removes all modules from the current environment |

Several environment variables will be set by the modules. When you log in to palma3, some modules are loaded automatically.
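For example, a short interactive session could look like the following sketch. The module name used here (intel/2016b) is only a placeholder; check the output of "module av" to see which modules actually exist on palma3:

<verbatim>
# list all modules available on the system
module av

# load a module (the name is just an example, pick one from "module av")
module add intel/2016b

# show which modules are currently loaded
module li

# remove the module again, or clear the whole environment
module rm intel/2016b
module purge
</verbatim>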
---++++ Using the module command in submit scripts

If you want to use the _module_ command in submit scripts, the line
<verbatim>
source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh
</verbatim>
has to be added before the first module command. Alternatively, just put the "module add" commands in your .bashrc (which can be found in your home directory).

---++ Monitoring

[[http://palma3.uni-muenster.de/ganglia/?c=palma-ng&m=load_one&r=hour&s=by%20name&hc=4&mc=2][Ganglia]]

---++ The batch system

The batch system on PALMA3 is SLURM, but a wrapper for PBS is installed, so most of your scripts should still work. If you want to switch to SLURM, this document might help you: [[https://slurm.schedmd.com/rosetta.pdf]]

When using a PBS script, there are some differences to PALMA:

   * The first line of the submit script has to be #!/bin/bash
   * A queue is called a partition in SLURM terminology. Both terms are used synonymously here.
   * The variable $PBS_O_WORKDIR will not be set. Instead you will start in the directory in which the script resides.
   * To use the "module add" command, you have to source some scripts first: "source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh"

---+++ Submit a job

Create a file, for example called submit.cmd:

<verbatim>
#!/bin/bash

# set the number of nodes
#SBATCH --nodes=1

# set the number of CPU cores per node
#SBATCH -n 8

# set a partition
#SBATCH -p u0dawin

# set max wallclock time
#SBATCH --time=24:00:00

# set name of job
#SBATCH --job-name=test123

# mail alert at start, end and abortion of execution
#SBATCH --mail-type=ALL

# set an output file
#SBATCH -o output.dat

# send mail to this address
#SBATCH --mail-user=your_account@uni-muenster.de

# In the u0dawin queue, you will need the following line
source /etc/profile.d/modules.sh; source /etc/profile.d/modules_local.sh

# run the application
./program
</verbatim>

You can send your submission to the batch system with the command "sbatch submit.cmd". A detailed description can be found here: [[http://slurm.schedmd.com/sbatch.html]]

---+++ Show information about the queues

<verbatim>scontrol show partition</verbatim>

---+++ Show information about the nodes

<verbatim>sinfo</verbatim>

---+++ Running interactive jobs with SLURM

Use for example the following command:

<verbatim>srun -p u0dawin -N 1 --ntasks-per-node=8 --pty bash</verbatim>

This starts a job in the u0dawin queue/partition on one node with eight cores.

---+++ Information on jobs

List all current jobs for a user:

<verbatim>squeue -u <username></verbatim>

List all running jobs for a user:

<verbatim>squeue -u <username> -t RUNNING</verbatim>

List all pending jobs for a user:

<verbatim>squeue -u <username> -t PENDING</verbatim>

List all current jobs in the normal partition for a user:

<verbatim>squeue -u <username> -p normal</verbatim>

List detailed information for a job (useful for troubleshooting):

<verbatim>scontrol show jobid -dd <jobid></verbatim>

Once your job has completed, you can get additional information that was not available during the run, such as run time and memory used.

To get statistics on completed jobs by job ID:

<verbatim>sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed</verbatim>

To view the same information for all jobs of a user:

<verbatim>sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed</verbatim>

Show priorities for waiting jobs:

<verbatim>sprio</verbatim>

---+++ Controlling jobs

To cancel one job:

<verbatim>scancel <jobid></verbatim>

To cancel all the jobs of a user:

<verbatim>scancel -u <username></verbatim>

To cancel all the pending jobs of a user:

<verbatim>scancel -t PENDING -u <username></verbatim>

To cancel one or more jobs by name:

<verbatim>scancel --name myJobName</verbatim>

To pause a particular job:

<verbatim>scontrol hold <jobid></verbatim>

To resume a particular job:

<verbatim>scontrol resume <jobid></verbatim>

To requeue (cancel and rerun) a particular job:

<verbatim>scontrol requeue <jobid></verbatim>

-- %USERSIG{HolgerAngenent - 2016-08-22}%