Complex Remote P3HT Workflow Example

The Chemistream framework is built on the Dask Python project and is designed to integrate different types of simulation steps. These steps can be arranged in a directed acyclic graph (DAG) that specifies how each workflow step depends on the others. Workflow steps can include (but are not limited to) setting up job inputs, submitting simulation jobs (e.g., running NWChem and/or LAMMPS), analyzing output to produce input for the next workflow step, and analyzing output for final simulation results. Visualization capability is also being added to the Chemistream framework. Furthermore, integrating the organic electronics example into Chemistream means that this complex workflow can be run on diverse remote computing platforms, including cloud computing resources (e.g., AWS, Azure) as well as private clusters.
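As a rough illustration of how a DAG encodes step dependencies, the sketch below uses Python's standard-library graphlib module rather than Dask or Chemistream. The step names are hypothetical and chosen only to mirror the kinds of steps described above. It groups steps into stages so that steps in the same stage do not depend on each other and could, in principle, run concurrently.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on.
# Step names are hypothetical, for illustration only.
dependencies = {
    "setup_inputs": set(),
    "run_nwchem": {"setup_inputs"},
    "run_lammps": {"setup_inputs"},
    "analyze_outputs": {"run_nwchem", "run_lammps"},
    "final_results": {"analyze_outputs"},
}


def execution_stages(graph):
    """Group steps into stages; steps within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Here the two simulation steps land in the same stage because neither depends on the other, while the analysis step waits for both.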

Setup Remote Chemistream

The tutorial Create Cloud HPC Cluster (5:39) demonstrates how Chemistream is used to set up and configure a multi-node HPC cluster in the cloud. The Docker image that is downloaded onto each of the remote AWS compute instances has Chemistream, STREAMM, JupyterLab, NWChem, LAMMPS, Slurm and networking tools pre-installed and configured. A shared NFS directory is created for the efficient I/O operations needed by HPC applications.

The tutorial Start Remote JupyterLab Server on Cloud HPC Cluster (4:48) shows how to create a remote Chemistream JupyterLab server in the cloud and connect to it through a local Chemistream session. This provides a JupyterLab web interface to HPC cloud resources in an environment where Chemistream (and all of its dependencies) is already enabled.

Calculate Electronic Coupling Between P3HT Molecules in a Thin Film

The tutorial Complex Workflow Modeling Electronic Coupling Between Molecules in a P3HT Thin-Film using HPC Cloud Resources (12:46) shows how Chemistream is used to manage a complex workflow, using Dask and Slurm to direct NWChem, LAMMPS and STREAMM calculations. The JupyterLab notebooks shown in the tutorial demonstrate how to use Dask to construct an arbitrary directed acyclic graph (DAG) that represents the P3HT workflow.
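For readers unfamiliar with how Dask builds a task graph, here is a minimal sketch using dask.delayed. It is not the Chemistream API: the three step functions are hypothetical placeholders that only pass dictionaries around, where real steps would call STREAMM, NWChem, or LAMMPS. It shows two independent branches that merge in a join step.

```python
from dask import delayed


# Hypothetical placeholder steps; they stand in for real simulation calls.
@delayed
def build_structure(molecule):
    return {"name": molecule, "state": "initial"}


@delayed
def optimize_geometry(structure):
    return {**structure, "state": "optimized"}


@delayed
def join_structures(first, second, joined_name):
    return {
        "name": joined_name,
        "parts": [first["name"], second["name"]],
        "state": "joined",
    }


# Calling delayed functions only records the graph; nothing runs yet.
hexane = optimize_geometry(build_structure("hexane"))
thiophene = optimize_geometry(build_structure("thiophene"))
monomer = join_structures(hexane, thiophene, "3HT")

# compute() executes the graph; the two independent branches may run in parallel.
result = monomer.compute()
print(result)
```

Because the graph is built lazily, the scheduler sees the whole dependency structure before executing anything, which is what lets it run independent branches at the same time.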

The tutorial elaborates on the following P3HT workflow steps:

  • create atomic coordinates and types for a hexane molecule. These coordinates are deliberately approximate and unoptimized in order to illustrate the flexibility of the framework. This Chemistream function is composed of STREAMM library calls.

  • set up an NWChem input file for optimizing the hexane geometry. This Chemistream function is composed of STREAMM library calls.

  • Chemistream function that submits an NWChem simulation (through Slurm) to optimize the coordinates and monitors the simulation until it completes.

  • Chemistream function that analyzes the NWChem output log and extracts the optimized coordinates. This Chemistream function is composed of STREAMM library calls.

  • Chemistream function that submits an NWChem simulation to calculate ESP partial charges and monitors the simulation until it completes.

  • Chemistream function that analyzes the NWChem output log, extracts the partial charges, and prepares data for creating a LAMMPS input file. This Chemistream function is composed of STREAMM library calls.

  • set up a LAMMPS input file with the optimized coordinates and partial charges for hexane, and set up OPLS interaction parameters. This Chemistream function is composed of STREAMM library calls.

  • Chemistream function that submits a LAMMPS simulation for equilibrating a single hexane molecule with the parameters determined by the previous workflow steps

  • repeat the above steps for thiophene

  • Chemistream function to join hexane and thiophene into a 3HT molecule using STREAMM library functions

  • run short LAMMPS simulation to minimize energy of 3HT molecule

  • Chemistream function to join five 3HT molecules into P3HT using STREAMM library functions

  • run short LAMMPS simulations to minimize energy and run an NVT relaxation step

  • Chemistream function to replicate 20 P3HT molecules in a large box (checking for overlaps) using STREAMM library functions

  • run LAMMPS simulation to equilibrate the P3HT thin film

  • Chemistream function to identify P3HT pairs and set up NWChem simulation directories using STREAMM library functions

  • run multiple NWChem simulations to calculate electronic coupling of identified pairs

  • Chemistream functions to collect results and visualize P3HT pairs.
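To make the input-setup and job-submission steps above more concrete, the following sketch generates an NWChem geometry-optimization input and a Slurm batch script as plain text. It does not use STREAMM or Chemistream. The function names, the basis set and functional defaults, the resource settings, and the srun command line are all assumptions for illustration, and the coordinates are a two-atom placeholder rather than a real hexane geometry.

```python
def nwchem_optimize_input(label, atoms, basis="6-31G*", functional="b3lyp"):
    """Return NWChem input text for a DFT geometry optimization.

    atoms is a list of (element, x, y, z) tuples in angstroms.
    The basis and functional defaults are illustrative assumptions.
    """
    lines = [
        f"start {label}",
        f'title "{label} geometry optimization"',
        "geometry units angstroms",
    ]
    lines += [f"  {element} {x:.6f} {y:.6f} {z:.6f}" for element, x, y, z in atoms]
    lines += [
        "end",
        "basis",
        f"  * library {basis}",
        "end",
        "dft",
        f"  xc {functional}",
        "end",
        "task dft optimize",
    ]
    return "\n".join(lines) + "\n"


def slurm_script(job_name, command, nodes=1, tasks_per_node=4, time_limit="01:00:00"):
    """Return Slurm batch script text; resource defaults are assumptions."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_name}.%j.out",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Placeholder coordinates only, not a real hexane geometry.
placeholder_atoms = [("C", 0.0, 0.0, 0.0), ("C", 1.54, 0.0, 0.0)]
input_text = nwchem_optimize_input("hexane", placeholder_atoms)

# The command line below is an assumption about how NWChem is launched on the cluster.
script = slurm_script("hexane_opt", "srun nwchem hexane.nw > hexane.log")

print(input_text)
print(script)
```

In a real workflow the generated script would be handed to sbatch and the job polled until it finishes; that part depends on the cluster setup and is left out here.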

These workflows can be constructed from suitable definitions of Chemistream functions, which are designed to wrap Dask task management functionality in a flexible way. These functions are documented in the Chemistream API documentation. Tech-X will also work with customers to construct Jupyter notebooks for their own industry simulation workflows, similar to the notebook shown in this tutorial that implements the NREL P3HT simulation workflow.