DomainDecomp
Determines the domain decomposition and periodicity. The DomainDecomp block specifies how the simulation is broken up into domains for parallel processing and which directions are periodic.
If the domain boundaries imply a number of processors that differs from the number found at runtime, Vorpal will reconfigure the domain decomposition to match the runtime number of processors. Likewise, if you do not specify a decomposition, Vorpal will try to calculate an appropriate domain decomposition for you.
Vorpal is designed to run on a variety of systems, from a personal computer to a supercomputing cluster with hundreds of compute nodes. To accommodate the latter, Vorpal uses a hierarchical decomposition in which the domain is first split equally across compute nodes (if running on a cluster) and then split across the processors on each node. The DomainDecomp block determines the inter-node decomposition, and the IntraNodeDecomp sub-block determines the decomposition within nodes. If running on a single compute node or a personal computer, the IntraNodeDecomp sub-block can be omitted unless a manual decomposition is desired.
The regular decomposition algorithm used by Vorpal proceeds by first computing a prime factorization of the number of processors. It then divides the dimensions one at a time, applying the largest remaining factor to the dimension with the largest number of cells.
For example, for a simulation using 30 processors on a single compute node and a domain size of 200 x 100 x 100, Vorpal performs the following calculations:
30 = 5*3*2
The x direction has the largest number of cells (200) and 5 is the largest factor of the number of processors, so x is divided into 5 regions, each
40 x 100 x 100
With the x direction done, y is now the direction with the largest number of cells (100), so Vorpal uses the next largest factor, 3, to divide it into 3 sections:
40 x {33 or 34} x 100
Finally, Vorpal partitions z with the remaining factor, 2, dividing it into 2 pieces:
40 x {33 or 34} x 50
That is the size of the final domains.
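The same greedy procedure can be sketched in a few lines of Python. This is only an illustration of the algorithm described above, not Vorpal's actual implementation, and the function names are hypothetical.

def prime_factors(n):
    """Return the prime factors of n, largest first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return sorted(factors, reverse=True)

def regular_decomp(cells, num_procs):
    """Assign the largest remaining prime factor to the dimension
    that currently has the most cells per domain."""
    cells = list(cells)           # cells per domain along each dimension
    splits = [1] * len(cells)     # number of domains along each dimension
    for f in prime_factors(num_procs):
        d = max(range(len(cells)), key=lambda i: cells[i])
        splits[d] *= f
        cells[d] //= f            # remainders make some domains one cell larger
    return splits, cells

# 200 x 100 x 100 grid on 30 processors -> 5 x 3 x 2 split, ~40 x 33 x 50 cells each
print(regular_decomp([200, 100, 100], 30))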
In practice, for parallelism, Vorpal simulations should have approximately 20 - 40 cells per domain in each dimension. Otherwise, the amount of messaging per unit of computation increases and the benefit of using multiple processors is outweighed by the cost of the communication between them. The 20 - 40 guideline depends on the complexity of the calculations being done; with a large number of particles per cell (20 or more), approximately 20 cells in each direction may be sufficient.
The decomposition may affect which features can be used; for example, guard cells require a domain of at least four cells in every direction. If the chosen number of processors and simulation dimensions cause this requirement to be violated, you will see errors in the simulation.
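As a quick illustrative check (the four-cell minimum is the figure quoted above; the helper below is hypothetical and not part of Vorpal), a candidate split can be tested before running:

def decomposition_ok(cells, splits, min_cells=4):
    """True if every domain keeps at least min_cells cells in each direction.
    cells and splits are per-dimension lists; splits[i] domains along dimension i."""
    return all(n // s >= min_cells for n, s in zip(cells, splits))

print(decomposition_ok([200, 100, 100], [5, 3, 2]))   # True: domains of ~40 x 33 x 50 cells
print(decomposition_ok([200, 100, 100], [5, 3, 40]))  # False: only 2 cells per domain in z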
DomainDecomp Parameters
- kind (string)
The kind of decomposition to perform: regular or manual.
- periodicDirs (integer vector, optional, default = [])
Directions to have periodic boundary conditions.
- allowedDirs (integer vector, optional, default = [0 1 2])
Directions along which to split between compute nodes. If no IntraNodeDecomp block is present, also sets the allowed splitting for the domain on each node.
- IntraNodeDecomp (block, optional)
Specifies the decomposition between processors on the compute node or PC. If omitted, a regular decomposition with the above allowedDirs is performed. See IntraNodeDecomp.
- ComputeNode (block, optional)
Specifies a manual decomposition for each compute node (corresponding to a hardware node). kind = manual is required for these blocks to be used. Each ComputeNode block must specify lowerBounds and upperBounds (integer vector) attributes. ComputeNode blocks also have an optional intraNodeDecomp (string) attribute that names the IntraNodeDecomp block to use for decomposing the node further. If this is not specified, a default intra-node decomposition is used.
Example of Default Decomposition Generation
In this example, Vorpal uses the aforementioned algorithm (prime factorization) to automatically generate a decomposition for a parallel simulation. The y- and z-directions will be periodic as indicated by the use of periodicDirs.
<DomainDecomp decomp>
kind = regular
periodicDirs = [1 2]
</DomainDecomp>
Example of User-specified Decomposition
In this example, assuming there is a 200 x 200 mesh grid as described in the section on Grid cell specification, the user requests domain boundaries in the x-direction at cells 50, 100, and 150, and in the y-direction at cell 100 (expressed as fractions of the indicated domain length). This produces 8 domains, requiring that 8 processors be used for the simulation. If the simulation is performed in 3D and 16 processors are used, Vorpal will further subdivide the domain in half along the z-direction unless this is specifically disabled by setting allowedDirs = [0 1] in the IntraNodeDecomp block. Otherwise, if the actual number of processors used is not compatible with the requested boundaries, Vorpal will default to the auto-generated (prime factorization) decomposition as discussed in the general description of DomainDecomp, above.
<DomainDecomp decomp>
kind = regular
periodicDirs = [1 2]
<IntraNodeDecomp xyDecomp>
kind = sliced
cpuXFracs = [0.25 0.5 0.75]
cpuYFracs = [0.5]
</IntraNodeDecomp>
</DomainDecomp>
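As a rough illustration of the arithmetic above (the fractions and the 200 x 200 grid are taken from this example; the snippet itself is not part of Vorpal), the fractions convert to cell boundaries as follows:

# Convert the sliced-decomposition fractions above into cell indices
# on a 200 x 200 grid.
nx, ny = 200, 200
cpu_x_fracs = [0.25, 0.5, 0.75]
cpu_y_fracs = [0.5]

x_bounds = [round(f * nx) for f in cpu_x_fracs]   # [50, 100, 150]
y_bounds = [round(f * ny) for f in cpu_y_fracs]   # [100]

# 3 interior x boundaries give 4 x slabs; 1 interior y boundary gives 2 y slabs
print(x_bounds, y_bounds, (len(x_bounds) + 1) * (len(y_bounds) + 1))   # 8 domains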
Example of Full Manual Decomposition
In this example, we assume there is a 240 x 240 mesh grid. Due to computational complexity in the middle of the domain (e.g., a high density of particles at the center), the user wants more processors focused there and fewer around the edges. The decomposition can be manually specified, with each core assigned a slab via upper and lower bounds. The figure below shows the desired decomposition, and the block below shows the DomainDecomp block required to perform this decomposition on a single compute node.
<DomainDecomp decomp>
kind = manual
periodicDirs = [0 1]
<ComputeNode node0>
lowerBounds = [ 0 0 ]
upperBounds = [ 240 240 ]
intraNodeDecomp = node0decomp
</ComputeNode>
<IntraNodeDecomp node0decomp>
kind = manual
<NodeCpu cpu0>
lowerBounds = [ 0 0 ]
upperBounds = [ 60 120 ]
</NodeCpu>
<NodeCpu cpu1>
lowerBounds = [ 0 120 ]
upperBounds = [ 60 240 ]
</NodeCpu>
<NodeCpu cpu2>
lowerBounds = [ 60 0 ]
upperBounds = [ 180 60 ]
</NodeCpu>
<NodeCpu cpu3>
lowerBounds = [ 180 0 ]
upperBounds = [ 240 120 ]
</NodeCpu>
<NodeCpu cpu4>
lowerBounds = [ 180 120 ]
upperBounds = [ 240 240 ]
</NodeCpu>
<NodeCpu cpu5>
lowerBounds = [ 60 180 ]
upperBounds = [ 180 240 ]
</NodeCpu>
<NodeCpu cpu6>
lowerBounds = [ 60 140 ]
upperBounds = [ 100 180 ]
</NodeCpu>
<NodeCpu cpu7>
lowerBounds = [ 100 140 ]
upperBounds = [ 140 180 ]
</NodeCpu>
<NodeCpu cpu8>
lowerBounds = [ 140 140 ]
upperBounds = [ 180 180 ]
</NodeCpu>
<NodeCpu cpu9>
lowerBounds = [ 60 100 ]
upperBounds = [ 100 140 ]
</NodeCpu>
<NodeCpu cpu10>
lowerBounds = [ 100 100 ]
upperBounds = [ 140 140 ]
</NodeCpu>
<NodeCpu cpu11>
lowerBounds = [ 140 100 ]
upperBounds = [ 180 140 ]
</NodeCpu>
<NodeCpu cpu12>
lowerBounds = [ 60 60 ]
upperBounds = [ 100 100 ]
</NodeCpu>
<NodeCpu cpu13>
lowerBounds = [ 100 60 ]
upperBounds = [ 140 100 ]
</NodeCpu>
<NodeCpu cpu14>
lowerBounds = [ 140 60 ]
upperBounds = [ 180 100 ]
</NodeCpu>
</IntraNodeDecomp>
</DomainDecomp>
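A manual decomposition like this one can be sanity-checked by confirming that the NodeCpu slabs tile the grid with no gaps or overlaps. The Python snippet below is only an illustration, not part of Vorpal; the bounds are copied from the example above, and it assumes lower bounds are inclusive and upper bounds exclusive, as the adjoining slabs suggest.

# Check that the 15 NodeCpu slabs above cover every cell of the
# 240 x 240 grid exactly once (lower bound inclusive, upper exclusive).
slabs = [
    ((0, 0), (60, 120)),     ((0, 120), (60, 240)),    ((60, 0), (180, 60)),
    ((180, 0), (240, 120)),  ((180, 120), (240, 240)), ((60, 180), (180, 240)),
    ((60, 140), (100, 180)), ((100, 140), (140, 180)), ((140, 140), (180, 180)),
    ((60, 100), (100, 140)), ((100, 100), (140, 140)), ((140, 100), (180, 140)),
    ((60, 60), (100, 100)),  ((100, 60), (140, 100)),  ((140, 60), (180, 100)),
]

covered = [[0] * 240 for _ in range(240)]
for (x0, y0), (x1, y1) in slabs:
    for i in range(x0, x1):
        for j in range(y0, y1):
            covered[i][j] += 1

counts = {c for row in covered for c in row}
print("gaps:", 0 in counts, "overlaps:", any(c > 1 for c in counts))
# Expected: gaps: False overlaps: False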