Optimizing Simulation Performance with VSim: Dr. Daniel Murray's Guide to Domain Decomposition

Join us for an insightful session at TWSS2023 with Dr. Daniel Murray, an astrophysicist and application engineer at Tech-X Corporation, as he delves into the intricacies of domain decomposition and its pivotal role in enhancing simulation performance. This presentation is a must-watch for anyone looking to optimize their computational resources and achieve faster, more efficient simulation results with VSim.

Key Takeaways:
Penning High-Intensity Ion Source: Explore the workings of a Penning High-Intensity Ion Source and understand why it serves as an excellent example for computational improvements.
Domain Decomposition Explained: Grasp the fundamental concepts of domain decomposition and learn how it impacts simulation performance.

Practical Workflow: Discover a practical workflow for improved load balancing, including a live demonstration on how to implement custom domain decomposition in VSim.
Results and Future Developments: See the tangible benefits of domain decomposition in action and get a sneak peek into the future improvements and optimizations planned for VSim.

Speaker: Dr. Dan Murray
Institution: Tech-X Corporation
Event: TWSS2023

Target Audience:
Computational physicists, simulation and modeling experts, high-performance computing professionals, and users of VSim and other Tech-X software.

Evaluate for 30 days

We Understand That Complex Problems Require Comprehensive Solutions.

Start a risk-free 30-day evaluation and speak with one of our Ph.D. simulation experts.  

[00:00:00] Welcome back to TWSS2023. My name is Colleen Dunn and I am the sales coordinator here at Tech-X Corporation. We’re excited to have you join us. Some of you have been with us this morning; if you’re just joining us now, this is our annual user conference, where we get our user community together and talk about all things VSim, XSim, which you’ll hear about soon, RSim, and all the Tech-X products.

We have now Dr. Dan Murray presenting for us. He is an astrophysicist with over nine years of experience with experimental design, predictive modeling, magnetohydrodynamics, and high-performance computing. Prior to joining Tech-X, he worked on star formation in the early and local universe, as well as data processing pipelines for minor planet and [00:01:00] supernova observations.

So welcome, Dan. Thank you very much for having me. My name’s Daniel Murray. I’m an application engineer here at Tech-X, and I’m going to be talking about domain decomposition and improving simulation performance. I’m going to be walking through the evolution of the VSim Penning high-intensity ion source.

A brief outline of this talk: I’m going to give a quick overview of a Penning high-intensity ion source and explain what an ion source is. I’ll explain why it’s a useful example for computational improvements. I’ll then talk specifically about computational load balancing and what we mean when we say that.

I’ll discuss the results of our new domain decomposition, and then I’ll touch briefly upon future improvements that will be coming down the pipeline. [00:02:00] So what is a Penning ion source? An ion source is a device that generates charged particle beams of whatever particular particle you want.

Maybe that’s negative hydrogen, maybe it’s positively charged molecules. In the case of the VSim Penning source example, it is singly ionized, positively charged argon ions. The general layout is that you have a cathode plate, or in this case two cathode plates, that are set to a negative potential, in this case minus 60 volts.

These cathode plates in this design emit electrons into the central plasma chamber, which is filled with a neutral argon gas. You can also attract these electrons into that plasma chamber by either a neutral or positively [00:03:00] charged anode; in the case of the VSim example, it’s neutral. Secondary electrons are also included in this example, from electron interactions with the anode walls.

These electrons, both the primary and secondary electrons, interact with the neutral argon, and this forms a plasma due to impact ionization from the electron-neutral collisions. This bath of argon ions is then extracted from the plasma source via two extraction plates, forming an ion beam. These extraction plates are over here at the center of the image.

And so what one ends up with is something a little bit like this. This is the same top-down image of that geometry. We have our cathodes on either side of the plasma chamber, our anode here in the center, and our extraction plates. The red dots are the argon ions that are being [00:04:00] pulled out in this ion beam here, and the green dots are the electrons, both primary and secondary.

And so this example demonstrates the electrostatic confinement of particles, and it’s also successful in showing the large density gradients that can result from this trapping.

I’m going to switch a little bit here to a two-dimensional beam test simulation, and this was provided to us courtesy of Professors Andre Smoliakov and Linnaeus Koudel, along with some of their students and colleagues at the University of Saskatchewan. You can imagine this 2D simulation as looking down the beam of that Penning ion source.

So we’re looking at that collimated region. And Linnaeus already walked us through a little bit of the setup in one of his slides earlier. But the key bit is that you load [00:05:00] electrons and ions uniformly within a centrally located area.

This large density gradient can also result in an uneven distribution of computational load. On the right here I have what is essentially the ion density; this is the ion particles per cell. And so we can see that they are tightly collimated into the center. This is a lineout of the previous image: they are tightly centered and have a distinct gradient at the outer edges.

And it’s important to be able to capture this, because lab or experimental plasmas have these density gradients, and ion sources especially tend to have large ion gradients.

So there are a number of ways in which one [00:06:00] could start to tackle this problem of improving the simulation wall-clock time and improving your efficiency. It could be that you improve the reactions framework, so that your collisions take less time somehow, if that is possible. If you have some low-hanging fruit, you could optimize this specific problem by using combiners and splitters of particles.

You could move from using a uniform grid to a variable grid, so that you have more information in those high-density regions, and I’ll touch briefly upon that. Or the most generally applicable one is to improve the computational load balancing, and that is the one that we tried to strike first.

So what do we mean by [00:07:00] computational load balancing? For electrostatic particle-in-cell, or ES PIC, one can think of a simple performance model: the computational work per domain, or the amount of work that a CPU has to do for its domain, goes like some factor A times the number of cells in that domain, plus another factor B times the number of particles in that domain. Or, perhaps a little more succinctly, the work per domain goes like (A plus B times the number of particles per cell) times the number of cells per domain. This then indicates that one has to take into account the number of particles per cell when devising a domain decomposition.
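
Written out as a formula (a reconstruction from the spoken description, with A and B left as unspecified machine- and algorithm-dependent constants):

$$
W_{\mathrm{domain}} \;\approx\; A\,N_{\mathrm{cells}} + B\,N_{\mathrm{particles}}
\;=\; \left(A + B\,\bar{n}_{\mathrm{ppc}}\right) N_{\mathrm{cells}},
\qquad
\bar{n}_{\mathrm{ppc}} = \frac{N_{\mathrm{particles}}}{N_{\mathrm{cells}}},
$$

where all quantities are counted per domain; balancing the load means choosing subdomains so that this work is roughly equal on every core.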

And to have the greatest efficiency for your computational resources, you would like this work per [00:08:00] domain to be consistent, or somewhat close to identical, across all domains. Another thing to note is that this ratio of B over A is usually around 3 to 5 for ES PIC, which indicates that the simulation time is typically dominated by particles rather than by the field solve.

As of VSim 12.2, particles per cell can now be written out. And so this is that previous image that I showed of the two-dimensional ion beam, only this time we are now looking at the number of particles per cell. And so we can see that several of these center cells have more than 1500 particles, or in fact just over 1700 particles per cell, while the vast majority of this simulation volume has fewer than two particles per cell. [00:09:00]
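
As a rough illustration of what one can do with that new output, the sketch below reads a dumped particles-per-cell field and summarizes how skewed it is. The file name and the way the dataset is picked out are assumptions for illustration; the actual naming and layout of VSim’s HDF5 dumps may differ.

```python
# Minimal sketch: summarize a dumped particles-per-cell field.
# The file name below is hypothetical, not the exact VSim output convention.
import h5py
import numpy as np

with h5py.File("penningSource_ptclsPerCell_3.h5", "r") as f:   # hypothetical name
    # grab the first array-valued dataset in the file
    dset = next(v for v in f.values() if isinstance(v, h5py.Dataset))
    ppc = dset[...]

print("max particles/cell:   ", ppc.max())        # ~1700 in the collimated core
print("median particles/cell:", np.median(ppc))   # < 2 over most of the volume
```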

So again, as I mentioned earlier, we could use particle combiners for this specific problem, and that would significantly improve performance in that central region. But we’re going to try not to in this case; we want to see how far we can get with just doing domain decomposition improvements. So scaling these kinds of simulations with our default domain decomposition was failing, because it inherently assumed that there were uniform particles per cell.

When Vorpal first starts a simulation at t equals zero, it takes the information that is available to it and decides upon a domain decomp. And that typically looks something like this: in this case, I’ve chosen eight subdomains, running this on eight cores, and the subdomains look something like the image here on the left.

Everybody has the same number of cells, which would be roughly even for [00:10:00] particles if they were uniformly distributed. But what ends up happening is that as you run the simulation forward in time, your particles collimate and you end up with that dense central region. And so at t equals roughly 25 nanoseconds, all of our particles are in that central region.

And so the four central CPUs are doing all of the heavy lifting, while the four CPUs that are sitting on the outside portion of the simulation are basically just twiddling their thumbs, waiting for the heavy lifters to catch up.
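
To make that imbalance concrete, here is a small sketch of the work estimate per domain for an equal-cell-count decomposition, using the simple cost model from earlier. The A and B values, the 4x2 block layout, and the toy density profile are illustrative assumptions, not numbers from the actual simulation.

```python
# Sketch: per-domain work for an equal-cell-count decomposition,
# using W_d = A * N_cells,d + B * N_particles,d with illustrative A and B.
import numpy as np

def work_per_domain(ppc, splits=(4, 2), A=1.0, B=4.0):
    """ppc: 2D particles-per-cell array; returns one work estimate per block."""
    work = []
    for rows in np.array_split(np.arange(ppc.shape[0]), splits[0]):
        for cols in np.array_split(np.arange(ppc.shape[1]), splits[1]):
            block = ppc[np.ix_(rows, cols)]
            work.append(A * block.size + B * block.sum())
    return np.array(work)

# Toy stand-in for the collimated beam: all particles in a small central patch.
ppc = np.zeros((128, 128))
ppc[48:80, 48:80] = 1500.0
w = work_per_domain(ppc)
print(w.max() / w.mean())   # >> 1: the central blocks carry nearly all the work
```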

Noticing that, we came up with two methods for domain decomposition. The first one is sliced, in which either the user, or an analyzer that we also have that can write this out, specifies where on an axis they would like their domains to be [00:11:00] broken up. The second is what we refer to as the recursive model, and the recursive model looks to break up domains such that they have an even number of particles per domain.

And so, using these two methods on our 2D ion beam image, these are what those subdomains would look like.
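
As a concrete, simplified picture of those two strategies, here is a minimal two-dimensional sketch of each. This is illustrative code written for this summary, not VSim’s actual analyzer; the cost-model constants A and B, and the assumption that the domain count is a power of two, are illustrative choices only.

```python
# Sketch of the two decomposition strategies described above (illustrative only,
# not the VSim implementation). Both use the work model W = A*cells + B*particles.
import numpy as np

def slice_decomp(ppc, n_domains, A=1.0, B=4.0):
    """'Sliced': cut axis 0 of a 2D particles-per-cell array into slabs of ~equal work.
    Returns the row indices at which each slab ends."""
    work_per_row = A * ppc.shape[1] + B * ppc.sum(axis=1)
    cum = np.cumsum(work_per_row)
    targets = cum[-1] * np.arange(1, n_domains) / n_domains
    return np.searchsorted(cum, targets) + 1

def recursive_decomp(ppc, n_domains, A=1.0, B=4.0):
    """'Recursive': bisect along the longer axis so each half carries ~half the work.
    Returns (row_slice, col_slice) boxes; assumes n_domains is a power of two."""
    if n_domains == 1:
        return [(slice(0, ppc.shape[0]), slice(0, ppc.shape[1]))]
    axis = 0 if ppc.shape[0] >= ppc.shape[1] else 1            # cut the longer axis
    work_per_line = A * ppc.shape[1 - axis] + B * ppc.sum(axis=1 - axis)
    cum = np.cumsum(work_per_line)
    cut = int(np.searchsorted(cum, cum[-1] / 2.0)) + 1          # split the work in half
    halves = (ppc[:cut], ppc[cut:]) if axis == 0 else (ppc[:, :cut], ppc[:, cut:])
    boxes = []
    for half, offset in zip(halves, (0, cut)):
        for r, c in recursive_decomp(half, n_domains // 2, A, B):
            if axis == 0:
                boxes.append((slice(r.start + offset, r.stop + offset), c))
            else:
                boxes.append((r, slice(c.start + offset, c.stop + offset)))
    return boxes
```

For example, slice_decomp(ppc, 8) returns the seven cut rows of an eight-way slab decomposition, while recursive_decomp(ppc, 8) returns eight boxes, each carrying roughly the same estimated work.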

So the workflow for using this improved load balancing relies on the fact that VSim can now write out the number of particles per cell for each species. It then relies on the user running an analyzer that creates a new domain decomposition using this information. And part of this requires that the simulation has run long enough to go from a uniformly distributed state into a more confined state.

Once the analyzer is [00:12:00] run and a new domain decomp has been created, the simulation can either be restarted from t equals zero with that new decomp, or, if you’re one of our power users and this simulation has been going for months, you can just restart from your latest restart dump file.

So what does this workflow look like in person?

So I already have VSim 12.2 open, and inside of it I have our 3D Penning high-intensity ion source model open and running. One of the key things I had mentioned is that VSim 12.2 can now dump the particles per cell. In order to turn on that flag, you have to have particles in the simulation. You then go to kinetic particles, and there’s a flag right here that you can [00:13:00] double click and set to yes or no. If you’re not looking to do a domain decomp at some point, you may decide that it’s not worth the slight additional I/O of dumping an extra file for the particles per cell.

But this is where you turn it on and off. So with that, the user would run their simulation. You can see here, I have it already going in parallel on eight cores. I’ll go ahead and dump and stop for now. Let’s take a look: dump number three is the last one, so we’ll run with that. So I’ll go over to my Analyze tab. It is under compute [00:14:00] load balance decomp, right here; I already have it open. In order to run this analyzer, you need to provide the simulation name, which should already auto-populate. You’ll have to provide the names of the species of particles that you wish to run this decomp on.

And that basically just uses the species names that the user has input here, so in this case, electrons and AR1. This is a comma-separated, no-spaces list. You then tell the analyzer how many nodes you wish to run this simulation on, and then how many cores on those nodes you wish to run. So in this case, one node, eight cores; eight times one is eight.

So it’s just eight cores total. If I was running this on two nodes, eight cores apiece, that’s 16 cores total. Just so the user is aware, I can list the number of [00:15:00] periodic directions, and I can choose which dump number I wish to do this decomposition on. If I want to use the slice version for the decomposition, I can click intra slice if it’s just on the one node, or inter slice if it’s across nodes.
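
Put together, the inputs used for this particular demo look roughly like the following. The field names here are paraphrased from the spoken walkthrough rather than the exact labels in the analyzer panel, and the dump number is whichever dump you want to base the decomposition on.

```text
simulation name:      (auto-populated from the open simulation)
species names:        electrons,AR1      (comma-separated, no spaces)
number of nodes:      1
cores per node:       8                  (1 node x 8 cores = 8 domains)
periodic directions:  (as appropriate for the simulation)
dump number:          (the dump to decompose on)
intra slice / inter slice:  check one only if you want the sliced decomposition
                            (intra = single node, inter = across nodes)
```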

And then all I have to do is hit analyze.

Sorry, might’ve misjudged the dump number.

And so the analyzer tells us the bounds for each of the subdomains on each of the nodes. And it also conveniently writes it out into a file that I can then just import directly. So it says here, it wrote it to [00:16:00] penningSourceDecomp2.py. I will copy that. And then, if I want to use this decomposition in my simulation, I come back to the setup.

I go to Basic Settings, right here. Instead of a default decomposition, I can choose to use a manual decomposition and then input the file name that my analyzer just produced.

I can then go back to the run tab, save my changes,

and I can then choose to either restart from t equals zero, or [00:17:00] I can choose to restart at dump number two and run from there. And that’s the basic workflow for creating a custom domain decomposition, then re-importing that into your simulation, and then basically getting started back up and off to the races.

What are the results of doing this kind of a modification? We actually see a reasonable speedup in the wall-clock time using either of these new domain decompositions. So for that two-dimensional test problem, with all of the electrons and argon ions collimated in the center, using VSim’s original standard decomposition is in red here, and this is plotting the wall-clock time versus the number of cores [00:18:00] in use.

Simulations using the slice decomposition are in yellow, and simulations using the recursive decomposition are in green. Really, the main takeaway is that the slice decomposition is very good up until you move from one node to being inter-node, although I still have to test the inter-node portion, and the recursive gives great results even moving up to more than one node.

I did also say that I wanted to make one comment about variable grids versus uniform grids, and that is that there are some costs associated with variable grids, in addition to the benefits that they provide. And the key bit here is that when moving from a uniform grid to a variable grid, the particle lookups tend to be slower.
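
To illustrate why the lookups get slower (a generic illustration, not VSim’s internals): on a uniform grid, the cell containing a particle follows from a single division, while on a variable grid the irregular cell edges typically force a search per particle.

```python
# Generic illustration of particle-to-cell lookup cost; not VSim internals.
import numpy as np

def cell_index_uniform(x, x0, dx):
    """Uniform grid: one divide per particle, O(1)."""
    return int((x - x0) // dx)

def cell_index_variable(x, edges):
    """Variable grid: binary search over the sorted cell edges, O(log N) per particle."""
    return int(np.searchsorted(edges, x, side="right") - 1)
```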

So users should be aware [00:19:00] that there is a cost. Having said that, if you have the right kind of problem, power users have seen massive gains in going to a variable grid, even with this slight additional cost. Now, we will be looking into this in the future as well. So there are several improvements that we are looking to include in VSim 13 or VSim 14, depending upon technical issues.

And this is, again, the optimization of combiners and splitters, along with better documentation for users explaining when they would or wouldn’t want to use combiners and splitters, their effects on particle phase space, et cetera. We also plan on having additional resources on the benefits and costs of moving to variable grids, to assist users in deciding when they may want to make [00:20:00] that switch from a uniform to a variable grid.

And we’re also investigating some additional improvements to the reactions framework. So our work here on the custom domain decomposition really is just the start of a number of optimizations that we plan on making to VSim 13.

And that is it. Thank you. I’m happy to take any questions.

And if anyone has questions, you can type them into the Q&A box at the bottom of the screen. [00:21:00]

So we got asked, where do we get our value of B over A? That one was actually provided to me by a couple of our senior developers; I don’t actually know the underlying secret sauce of what goes into the factors A and B, and I’m not entirely sure if some of it is proprietary as well. I really just wanted to show that, using a simple model, you could see that there was a dependence upon the number of particles as well.

I have another question: do we plan to include magnetic fields in future development? Yes, absolutely. A big part of this is to start with the simplest problem, tackle what’s in front of you there, and then add additional physics to get yourself closer to both a realistic simulation and realistic, physically based physics. [00:22:00]

Why not use cylindrical domains to better capture particles? In the case of our Penning 3D example, the geometry itself would not lend itself to a cylindrical domain. That can certainly be applied to a number of other simulations or geometries. Really, our main thought was to work on how we can better improve the usage of the computational resources that are already available to an end user.

And this particular path also had the widest-ranging effects, in that any user in any physical regime can use this domain decomposition workflow, regardless of whether they’re in 3D Cartesian [00:23:00] versus 2D cylindrical or 3D cylindrical.

 
