Running Vorpal with MPI, either directly or through a parallel queuing system, requires different shell scripts to invoke the Vorpal executable, as outlined below. In this section we discuss Linux queuing systems. For running Vorpal through Windows HPC Cluster Pack see Running Vorpal on a Windows HPC Cluster.
Queuing systems, such as PBS, LoadLeveler, LSF, SGE, and Slurm, require the submission of a shell script with embedded comments that act as commands that the queuing system interprets. Below we show some of the more common embedded comments. Discussion of all the embedded comments is beyond the scope of this document. Furthermore, the command for submitting the job can vary. Below we provide a common command, but you should contact your system administrator to ensure that you have the correct job submission command.
Many supercomputers use the Lustre file system, which has multiple Object Storage Targets (OSTs). Using more of these improves output speed for the large files produced by Vorpal in large-scale computing. The number to be used is set by a stripe command, such as

lfs setstripe -c C .

which sets the number of OSTs to C in the current directory. As noted at https://www.nersc.gov/users/storage-and-file-systems/i-o-resources-for-scientific-applications/optimizing-io-performance-for-lustre/, one should set C according to
File size | C
---|---
1-10GB | 8
10-100GB | 24
100GB+ | 72
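As an illustration (the appropriate stripe counts can vary by system, so check with your administrator), a run directory expected to produce files in the 10-100GB range could be configured and verified with the standard Lustre `lfs` utility:

```shell
# Set the stripe count to 24 OSTs for files subsequently
# created in the current directory
lfs setstripe -c 24 .

# Verify the stripe settings of the directory
lfs getstripe -d .
```

Note that striping applies only to files created after the command is issued; existing files keep their original layout.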
Here is an example of a basic shell script for a PBS-based system.
#PBS -N vaclaunch
#PBS -l nodes=2:ppn=2
export VSIM_DIR=$HOME/VSim-10.0
source $VSIM_DIR/VSimComposer.sh
cd /directory/containing/your/input/file
mpiexec -np 4 vorpal -i vaclaunch.pre -n 250 -d 50
The -l commands relate to the resource requirements of the job. This file explicitly specifies the number of nodes and the number of cores per node, giving a total of four MPI ranks on which to execute the job, which is mirrored in the mpiexec -np argument.
If the contents of the above file are in vaclaunch.pbs, then the job would commonly be submitted by
qsub vaclaunch.pbs
although some MOAB based systems might use msub.
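Once submitted, the job can be monitored and, if necessary, removed with the standard PBS commands (the job id is printed by qsub at submission time):

```shell
# List your jobs in the queue
qstat -u $USER

# Delete a job by its id
qdel JOBID
```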
Here is another example, this time for an SGE (Sun Grid Engine) job.
#$ -cwd -V
#$ -l h_rt=0:10:00
#$ -l np=16
#$ -N magnetron2D
export VSIM_DIR=$HOME/VSim-10.0
source $VSIM_DIR/VSimComposer.sh
mpiexec -np 16 vorpal -i magnetron2D.pre
This time the -cwd -V tells the queue system to use the current working directory for the job and to import the current environment, making it available to the script.
In this case, the queue system calculates the configuration based on the choice of 16 cores. On this cluster the -l command is also used to specify the run duration, which, in this example, is set to 10 minutes.
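Note that the -l np=16 request is specific to this cluster. On many SGE installations, parallel slots are instead requested through a parallel environment with the -pe option; the environment name (here mpi, a placeholder) varies by site:

```shell
#$ -pe mpi 16
```

Your system administrator can tell you which parallel environments are configured on your cluster.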
If the contents of the above file are in magnetron2D.qsub, then the job would commonly be submitted by
qsub magnetron2D.qsub
Here is a third example, for the Platform LSF system:
#BSUB -o EBDP-VSim10.0.out
#BSUB -e EBDP-VSim10.0.err
#BSUB -R "span[ptile=16]"
#BSUB -n 32
#BSUB -J testVSim10.0
#BSUB -W 45
cd $HOME/electronBeamDrivenPlasma
export VSIM_DIR=$HOME/VSim-10.0
source $VSIM_DIR/VSimComposer.sh
export MYJOB=electronBeamDrivenPlasma.pre
mpiexec -np 32 vorpal -i ${MYJOB}
With this submission system and job scheduler, -W denotes the wall time for the job in minutes, and -J sets the job name.
Sometimes one must reference a specific project for accounting purposes in the job submission file. For this you may use the -A option.
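For instance, to charge the job to a project (the name myproject is a placeholder for your site's project identifier), one would add a line such as:

```shell
#BSUB -A myproject
```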
If the above commands are in the file electronBeamDrivenPlasma.lsf, then the job is commonly submitted by
bsub < electronBeamDrivenPlasma.lsf
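Under LSF, job status is queried with bjobs, and a running job can be terminated with bkill (the job id is printed by bsub at submission time):

```shell
# List your pending and running jobs
bjobs

# Kill a job by its id
bkill JOBID
```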
Here is an example from a Cray system with a custom Vorpal build running from a directory /project/nnnnn/gnu-5.2.40.
#!/bin/bash
#SBATCH --account=nnnnn
#SBATCH --job-name=lpa
#SBATCH --output=lpa.out
#SBATCH --error=lpa.err
#SBATCH --nodes=2
#SBATCH --time=00:05:00
srun --ntasks=32 --hint=nomultithread --ntasks-per-node=16 /project/nnnnn/gnu-5.2.40/vorpal-exported/bin/vorpal -i laserPlasmaAccel.pre -n 100 -d 20
If the file containing the above is named laserPlasmaAccel.slm, then this job is submitted with
sbatch laserPlasmaAccel.slm
You can check on your job with
squeue -u $USER
and you can stop the job with
scancel JOBID
where JOBID is the job id returned by squeue.