Troubleshooting Performance
XSim is designed to make optimal use of the computational hardware you have available, whether a laptop or a leadership-class supercomputing facility.
It achieves this through the use of advanced algorithms, but this does not guarantee that any given simulation will run as fast as possible. This document outlines some simple checks that may help speed up a simulation.
There are different types of algorithms for field solves, particle movers and Monte Carlo interactions. These may scale in slightly different ways and require different amounts of inter-processor communication in large parallel simulations. Having many histories may also impact performance.
First, it helps to understand which parts of the simulation take the most time. The best way to do this is to remove elements of the simulation one at a time and assess the difference in speed. Having measured the performance of field solves, particle pushes (if applicable), Monte Carlo interactions (if applicable) and history objects, it may be possible to simplify some of these, for example by adjusting the number of physical particles per macroparticle.
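The remove-one-element-at-a-time approach above can be turned into a rough cost breakdown. The following is a minimal sketch, not part of XSim: the function name, the component names and all timings are hypothetical, and it assumes you have recorded the wall-clock time of the full run and of each run with exactly one element removed.

```python
def component_costs(full_time, ablated_times):
    """Estimate the seconds attributable to each simulation element.

    full_time: wall-clock seconds for the complete simulation.
    ablated_times: maps an element name to the wall-clock seconds
    measured with only that element removed.
    """
    costs = {name: full_time - t for name, t in ablated_times.items()}
    # Sort so that the most expensive element comes first.
    return dict(sorted(costs.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical timings in seconds, for illustration only.
full_run = 120.0
timings = {"particles": 70.0, "monte_carlo": 110.0, "histories": 95.0}

costs = component_costs(full_run, timings)
for name, seconds in costs.items():
    print(f"{name}: {seconds:.1f} s ({100 * seconds / full_run:.0f}% of run)")
```

The differences are only estimates, since removing one element can change the cost of the others, but the ranking usually shows where simplification is most worthwhile.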
In general, do not dump the fields and particles more often than necessary. Frequent dumps slow down visualisation, and the simulation itself slows while the data is written.
Electromagnetic Solves
Electromagnetic solves tend to be bound by memory access and by the ability to pass boundary data across the network. As a rule of thumb, performance tends not to scale well once the domain on each processor is smaller than 40x40x40 cells, but this limit will depend on the relative performance of your network fabric and CPU. Also, cells outside a perfect electrical conductor take longer to update than cells inside one, so it can sometimes be worth adjusting the domain decomposition strategy to ensure the load is balanced equally. Minimize the regions over which any MAL or PML boundary conditions are applied, as these are slow compared with a normal cell update.
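The 40x40x40 rule of thumb above can be checked before submitting a job. This is a minimal sketch under stated assumptions: the function names are hypothetical, the grid is assumed to be split evenly along each axis, and "smaller than 40x40x40" is interpreted as a per-processor cell count below 40 cubed (64000 cells).

```python
import math


def cells_per_rank(grid, decomposition):
    """Return the largest per-processor subdomain, in cells, along each axis."""
    return tuple(math.ceil(n / p) for n, p in zip(grid, decomposition))


def is_too_small(grid, decomposition, min_cells=40):
    """True if the per-processor cell count falls below min_cells cubed."""
    sub = cells_per_rank(grid, decomposition)
    return math.prod(sub) < min_cells ** 3


# Hypothetical 256x256x256 grid, for illustration only.
grid = (256, 256, 256)
for decomposition in [(4, 4, 4), (8, 8, 8)]:
    sub = cells_per_rank(grid, decomposition)
    verdict = "too small" if is_too_small(grid, decomposition) else "ok"
    print(f"{decomposition}: {sub} cells per processor, {verdict}")
```

With these numbers, 64 processors leave 64x64x64 cells each, while 512 processors leave only 32x32x32 cells each, which is below the threshold. The threshold itself should be adjusted to your network fabric and CPU.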
Histories
Histories store their data in RAM in between data dumps and can write very large datasets. Some histories need to do non-trivial amounts of computation each time step.
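Because history data stays in RAM between dumps, a back-of-envelope estimate of the buffer size can show whether a history is likely to be a problem. This is a rough sketch only: the function is hypothetical, and it assumes each history records a fixed number of 8-byte values every time step, which may not match how a particular history stores its data.

```python
def history_buffer_bytes(values_per_step, steps_between_dumps, bytes_per_value=8):
    """Estimate the RAM held by one history between data dumps.

    Assumes a fixed number of values per time step, each stored in
    bytes_per_value bytes (8 for a double-precision number).
    """
    return values_per_step * steps_between_dumps * bytes_per_value


# Hypothetical history: 1000 values per step, dumped every 100000 steps.
n_bytes = history_buffer_bytes(1000, 100_000)
print(f"approximately {n_bytes / 1e9:.2f} GB held between dumps")
```

If the estimate is a significant fraction of the memory available per node, recording fewer values or dumping more often reduces the buffer, at the cost of more frequent writes.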
Configuration Issues
The installer Tech-X provides can be expected to work well out of the box on desktop and high performance computing systems.
HPC systems often have high performance parallel file systems. These are commonly set up differently from your home area, and you will need to ensure that your simulation writes its output data to a specific partition. Check the cluster documentation for more information.
HPC systems are sometimes configured with a different MPI (from XSim's required MPICH MPI) in the environment set up by the queue system. In rare cases the MPI installation provided by XSim can pick up the wrong network card or fail to use the correct InfiniBand driver (normally where this has been heavily customized on those clusters). This will likely manifest as very poor parallel performance. For example, a simulation using sixteen cores on one node may run much faster than the same simulation using thirty-two cores on two nodes (subject to the scaling advice above). In these cases we recommend you contact Tech-X support at support@txcorp.com for advice. Modification of the environment is non-trivial and may have unexpected consequences.
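When reporting poor multi-node performance, it helps to quantify it. The following minimal sketch computes parallel efficiency from two timed runs of the same simulation. The function name and all timings are hypothetical, and the one-node run is taken as the baseline.

```python
def parallel_efficiency(base_cores, base_time, cores, time):
    """Return the measured speedup divided by the ideal speedup.

    A value near 1.0 means near-ideal scaling. A value below
    base_cores / cores means the larger run is actually slower.
    """
    speedup = base_time / time
    ideal = cores / base_cores
    return speedup / ideal


# Hypothetical timings: 100 s on 16 cores (one node),
# 180 s on 32 cores (two nodes).
eff = parallel_efficiency(16, 100.0, 32, 180.0)
print(f"parallel efficiency: {eff:.2f}")
```

In this illustration the two-node run is slower than the one-node run, so the efficiency falls below 0.5. A result like this, on a problem large enough to satisfy the scaling advice above, points to an MPI or network configuration issue rather than to the simulation itself.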