Chemistream API documentation

Chemistream Driver

ChemistreamDriver

class chemistream.driver.ChemistreamDriver(user=None)

Bases: makalii.makalii_driver.MakaliiDriver

Derived class from base class MakaliiDriver to augment and add functionality specific to Chemistream

Constructor args:

user (str): user name (to select file settings)

password (str): Password to be set in remote containers

dumpConnectScript()

Utility to be used in Chemistream Jupyter notebooks to dump a convenience script to connect to the remote cluster (Use in a terminal)

Wrapper for opening remote html ssh tunnel. Checks for running remote JupyterLab server, then runs open_ssh_html_tunnel

Args:

local_port (int): port on local machine for ssh tunnel

remote_port (int): port on remote machine running server

start_jsme()

Start local JSME session in a separate tab

start_jupyterlab_server(remote_port)

Start a Jupyter server on the given port of the remote machine and run the setup steps related to server start. If a server is already running on that port, this method returns without affecting it.

NOTE: jupyter etc. must be in PATH on the remote machine

Args:

remote_port (int): Port on which the remote Jupyter server runs
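The "return if a server is already running" behavior described above can be sketched as a port probe that runs before the launch step. This is a minimal sketch, not the driver's code: the helper names port_in_use and start_if_free are hypothetical, and the probe assumes it runs on the same machine as the server, whereas the driver works against a remote host by a mechanism this page does not describe.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def start_if_free(port, launch):
    """Call launch(port) only when nothing is listening on the port.

    Returns True if launch was called, False if a server was already there.
    Both this helper and the launch callback are hypothetical.
    """
    if port_in_use(port):
        return False
    launch(port)
    return True
```

Calling start_if_free twice with the same port is harmless as long as the first launch has begun listening, which is the property the method description asks for.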

Chemistream Analysis

ChemiVis

class chemistream.analysis.ChemiVis

Bases: object

Base class defining the interface to an object that wraps visualization components

aseToView(grid)

Wrapper for aseLatticeToXYZ to generate string and then create image from ASE lattice object using NGLView

Args:

grid - ASE lattice

Returns:

displayed NGL view object

nwchemInputToImage(input_file, targetIndex=0)

Parse an NWChem input file and render an NGLView image.

NOTE: looks for atoms between ‘geometry’ and ‘end’

Args:

input_file: full path to input file

targetIndex: (default 0) index of molecule in file to image [0, 1,…]

Return:

None
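The "atoms between ‘geometry’ and ‘end’" rule can be illustrated with a small parser that turns each geometry block into an XYZ-format string. This is a sketch under assumptions: the function name nwchem_geometries_to_xyz is hypothetical, lines inside a block that do not look like "symbol x y z" are treated as directives and skipped, and the rendering step is left out.

```python
def nwchem_geometries_to_xyz(text):
    """Return one XYZ-format string per geometry block found in text."""
    blocks, atoms, inside = [], [], False
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        keyword = tokens[0].lower()
        if not inside:
            if keyword == "geometry":
                inside, atoms = True, []
            continue
        if keyword == "end":
            # XYZ layout: atom count, comment line (left blank), atom lines.
            blocks.append(f"{len(atoms)}\n\n" + "\n".join(atoms) + "\n")
            inside = False
            continue
        if len(tokens) >= 4:
            try:
                x, y, z = (float(t) for t in tokens[1:4])
            except ValueError:
                continue  # a directive inside the block, not an atom line
            atoms.append(f"{tokens[0]} {x:.6f} {y:.6f} {z:.6f}")
    return blocks
```

Indexing the returned list plays the role that targetIndex plays in the method above: element 0 is the first molecule in the file.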

setSparksDumpInfo(spFile)

Set a list of header dictionaries holding the frame info for a SPPARKS dump file. The method records the line numbers where the search string ‘ITEM: TIMESTEP’ occurs; each occurrence marks the beginning of a frame of data at a particular time step

Args:

spFile: path to SPPARKS dump file
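The frame search can be sketched as a single pass over the lines of the dump. The function name frame_headers and the dictionary keys start_line and timestep are hypothetical; the line following the marker is kept as raw text rather than parsed, since this page does not specify its layout.

```python
def frame_headers(lines, marker="ITEM: TIMESTEP"):
    """Return one dict per frame with its starting line index (0-based)."""
    headers = []
    for i, line in enumerate(lines):
        if line.strip() == marker:
            # The line after the marker carries the time-step value(s).
            value = lines[i + 1].strip() if i + 1 < len(lines) else ""
            headers.append({"start_line": i, "timestep": value})
    return headers
```

With the start lines known, frame k spans from its own start_line up to the start_line of frame k+1, or to the end of the file for the last frame.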

sparksDumpToImage(iframe=0, xslice=None, yslice=None, zslice=None)

Convert frame of SPPARKS dump file to an image

Args:

iframe (int): index of the frame in the SPPARKS dump file to image

xslice: maximum x-edge of the simulation box to visualize

yslice: maximum y-edge of the simulation box to visualize

zslice: maximum z-edge of the simulation box to visualize

Return:

nglview object

sparksDumpToImages()

Driver for sparksDumpToImage. Displays menus for choosing the number of atoms to display and the frame index of the loaded file to display

visBBFromJSON(bb_file_list, ftype='xyz', bond_guess=1.5, numCols=2, sizePx=400)

Import STREAMM building block JSON file and visualize using ChemiVis object.

Args:

bb_file_list (str): Name of the JSON file (full path from project dir)

ftype (str): Format type string ‘xyz’, ‘sdf’

bond_guess (float): Length to consider a bond (for sdf type)

numCols (int): #-of-columns in grid

sizePx (int): Size of grid (in px units)

xyzToGridView(xyzStrList, numCols=2, sizePx=200)

Driver for the xyzToNglView method to display a grid of NGL view objects. The number of columns can be set; the number of rows follows automatically

Args:

xyzStrList (list): list of XYZ format strings

numCols (int): number of columns for grid (default 2)

sizePx (int): Size of each frame in grid [px]

Returns:

ipywidgets grid object
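The "rows are automatic" rule amounts to filling the grid row by row with numCols entries per row. A minimal sketch of that layout step, with the hypothetical helper name rows_for_grid and without any widget code:

```python
def rows_for_grid(items, num_cols=2):
    """Split items into consecutive rows of at most num_cols entries."""
    if num_cols < 1:
        raise ValueError("num_cols must be at least 1")
    return [items[i:i + num_cols] for i in range(0, len(items), num_cols)]
```

Five XYZ strings with num_cols=2 therefore give three rows, the last one holding a single view.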

xyzToNglView(xyzStr, sizeStr='300px')

Convert XYZ string directly to an nglview object

Args:

xyzStr (str): XYZ string

sizeStr (str): size string in ‘px’

Return:

nglview ‘view’ object

RDKitBase

class chemistream.rdkitbase.RDKitBase(turnOffLogging)

Bases: object

Specific wrapper class for RDKit encapsulating basic RDKit functionality utilized across Chemistream applications

checkMolImplHsMaxPerAtom(mol)

Check molecule and return the largest number of implicit hydrogens on any atom in the molecule

create_check_dir(dirstring)

Utility to create a directory recursively if it does not exist

Args:

dirstring (str): full directory path to create
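Recursive creation that tolerates an existing directory maps onto a single standard-library call. A minimal sketch, using the hypothetical name ensure_dir to keep it distinct from the method documented here:

```python
import os


def ensure_dir(dirstring):
    """Create dirstring and any missing parents; do nothing if it exists."""
    os.makedirs(dirstring, exist_ok=True)
    return dirstring
```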

generateAtomOrder(mol)

Generate a SMILES string from a mol-object and use to re-order atoms within mol-object

molStrToObj(mol_str)

Convert a mol-format string to a mol object. NOTE: hydrogens are NOT removed by default, so the mol string is read raw

Args:

mol_str(str): mol string

Return: mol object

molToImageAndFile(mol, full_file_name='mol.png', highlightAtoms=[], label_string='')

Output mol-object as a PNG file. Exposes option to highlight a list of atoms in mol-object and label image

Args:

mol: RDKit mol-object

full_file_name (str): full path to image file

highlightAtoms (list): list of atom indices in mol-object to highlight

label_string (str): embedded label for image

Returns: a PIL image object

reorderMolObj(mol)

Generate a SMILES string from a mol-object and use to re-order atoms within mol-object and return

Args:

mol – RDKit mol-object

Return

mol-object reordered with SMILES ordering

smilesToImage(smiles, imageDir='.', skipDisplay=False)

Construct an image from a SMILES string, display it, and dump it to a PNG file named after the SMILES string

Args:

smiles (str): SMILES string

imageDir (str): Directory to dump corresponding PNG image

skipDisplay (bool): Flag to skip showing the image on screen

Chemistream Workflow

CalculationBase

class chemistream.workflow.CalculationBase(project_dir='./')

Bases: object

Base class for a calculation. Holds project directory info and basic setup and project-management methods

clearSimJobs(fileTypes=['.log'])

Remove job output files in order to force re-submission

Args:

fileTypes (list): list of file suffixes to search for
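Removing output files by suffix so that jobs look unfinished can be sketched as follows. The function name clear_job_outputs is hypothetical, and the recursive walk over the whole project directory is an assumption; the search scope of the actual method is not stated on this page.

```python
import os


def clear_job_outputs(project_dir, file_types=(".log",)):
    """Delete files under project_dir whose names end in one of file_types.

    Returns the sorted list of removed paths.
    """
    suffixes = tuple(file_types)
    removed = []
    for root, _dirs, files in os.walk(project_dir):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```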

ScriptCalc

class chemistream.workflow.ScriptCalc(project_dir='./', comp_string='Total times', submitType='BASH', submitArgs='')

Bases: chemistream.workflow.CalculationBase

Defines calculations as simulations launched by scripts that dump results to a file, all within a dedicated script directory

NOTE: the main run functions can NOT depend on class data.

Once the class data is set for delayed functions it cannot be changed

Constructor args:

project_dir (str): Top directory where all job submission directories are located

comp_string (str): String to check in log file for completion

submitType (str): BASH, PBS or SLURM where

BASH – “bash [args] JOB_SCRIPT_FULL </dev/null &>/dev/null &”

PBS – “qsub [args] JOB_SCRIPT_FULL”

SLURM – “sbatch [args] --chdir=JOB_DIR_FULL JOB_SCRIPT”

JOB_SCRIPT_FULL and JOB_SCRIPT are filled in by class methods.

submitArgs (str): [args] to be sent to the submission command, e.g. for “sbatch [args]”, [args] could be “--ntasks-per-node=2”

Usage, e.g.:

test0 = ScriptCalc(proj_dir)
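The three submission templates listed under submitType can be sketched as format strings filled from a job directory and a script name. The function name build_submit_command and its parameters are hypothetical; joining the job directory and the script name to obtain JOB_SCRIPT_FULL is also an assumption about how the class fills in that placeholder.

```python
import posixpath

# Templates transcribed from the submitType description.
TEMPLATES = {
    "BASH": "bash {args} {script_full} </dev/null &>/dev/null &",
    "PBS": "qsub {args} {script_full}",
    "SLURM": "sbatch {args} --chdir={job_dir_full} {script}",
}


def build_submit_command(submit_type, job_dir_full, job_script, submit_args=""):
    """Return the shell command string for one job submission."""
    template = TEMPLATES[submit_type.upper()]
    cmd = template.format(
        args=submit_args,
        script_full=posixpath.join(job_dir_full, job_script),
        job_dir_full=job_dir_full,
        script=job_script,
    )
    # Collapse the double space left behind when submit_args is empty.
    return " ".join(cmd.split())
```

For example, build_submit_command("SLURM", "/proj/job1", "run.sh", "--ntasks-per-node=2") gives "sbatch --ntasks-per-node=2 --chdir=/proj/job1 run.sh". The whitespace collapse is a simplification that would also alter paths containing repeated spaces.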

runScript(*depJobArgs, job_dir='./', **kJobArgs)

The job_script must be able to be called from a different directory than where it will be running, such that it can access all job submission related files within the project dir

This method must perform three basic functions:
  1. make a ‘non-blocking call’ submitting the simulation

  2. make a ‘blocking call’ that waits until the simulation is done

  3. return this job_dir value for other functions

Args:

depJobArgs: Dask delayed functions that this function depends upon

job_dir: Name of job directory where calculation will take place

kJobArgs: keyword args for running this function, e.g.

job_script: Name of job submission script (.sh or .pbs)

job_output: Name of output file to check for completion

completion_tag: search string for denoting completion

Returns:

(str) this job directory
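The submit, wait, and return sequence can be sketched with a background process and a polling loop on the output file. This is a sketch under assumptions, not the class's implementation: the names log_has_tag, wait_for_tag and run_script, the poll_s and timeout_s parameters, and passing the submission command as a ready-made string are all hypothetical.

```python
import os
import subprocess
import time


def log_has_tag(log_path, tag):
    """Return True if the output file exists and contains the tag."""
    if not os.path.exists(log_path):
        return False
    with open(log_path, errors="replace") as fh:
        return any(tag in line for line in fh)


def wait_for_tag(log_path, tag, poll_s=1.0, timeout_s=None):
    """Blocking call: poll the output file until the tag appears."""
    start = time.monotonic()
    while not log_has_tag(log_path, tag):
        if timeout_s is not None and time.monotonic() - start > timeout_s:
            raise TimeoutError(f"{tag!r} not found in {log_path}")
        time.sleep(poll_s)


def run_script(job_dir, submit_cmd, job_output, completion_tag,
               poll_s=1.0, timeout_s=None):
    """Submit without blocking, wait for completion, return job_dir."""
    subprocess.Popen(submit_cmd, shell=True, cwd=job_dir)  # non-blocking
    wait_for_tag(os.path.join(job_dir, job_output), completion_tag,
                 poll_s, timeout_s)
    return job_dir
```

Returning job_dir is what lets a downstream job receive the directories of the jobs it depends on as plain positional arguments.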

DaskManager

class chemistream.workflow.DaskManager(threadsPerWorker=5, nWorkers=1, processFlag=True)

Bases: object

Class to manage client threads and monitor task submission

computeGraph(*args)

For a function that returns a final result, compute this answer and return

Args:

args: (eg args[0]: input Dask delayed function (default to class data))

Returns:

value of computation of task graph

runTasks(*args)

Run the tasks in the graph associated with the terminal node stored in this class object

Args:

args: variable length args (eg args[0]: input Dask delayed function (default to class data))

Returns:

iframe: progress bars

setTerminalNode(dnode)

Set the dask delayed function at the ‘terminal’ node of a task graph

Args:

dnode: Dask delayed function

showDashboard()

Show the client object dashboard info

showTasks(*args, rankDir='LR')

Show the task graph associated with the terminal node set in class object data

Args:

args: variable length args (eg args[0]: input Dask delayed function (default to class data))

DaskDelayedCalc

class chemistream.workflow.DaskDelayedCalc

Bases: object

Creates Dask delayed simulation objects based on run methods in derived classes from CalculationBase. Collection of dask task graph calculations

getDelayedCalc(nodeName, calcFunc, *depJobArgs, **kJobArgs)

Return a dask.delayed version of a run-calculation function, e.g. a run method of an object derived from CalculationBase

Args:

nodeName: Name of the job graph node

calcFunc: Name of calculation function (eg classObj.runScript)

depJobArgs: Dependent delayed functions (read by dask.delayed)

kJobArgs: Keyword arguments for running a single calc

Returns:

dask delayed function object
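How dependent delayed functions feed a downstream calculation can be shown with a toy lazy-evaluation stand-in written in plain Python. It is deliberately not Dask: the class LazyCall, the function run_job and the list RUN_ORDER are hypothetical, nothing runs in parallel, and a node shared by two downstream nodes would be computed twice because results are not cached.

```python
RUN_ORDER = []  # records (job_dir, dependency dirs) in execution order


class LazyCall:
    """Toy stand-in for a delayed call: stores a function and its inputs."""

    def __init__(self, name, func, *deps, **kwargs):
        self.name = name
        self.func = func
        self.deps = deps
        self.kwargs = kwargs

    def compute(self):
        # Upstream nodes run first; their results become positional args.
        upstream = [dep.compute() for dep in self.deps]
        return self.func(*upstream, **self.kwargs)


def run_job(*dep_job_dirs, job_dir="./"):
    """Pretend calculation that returns its own job directory."""
    RUN_ORDER.append((job_dir, list(dep_job_dirs)))
    return job_dir
```

Building a = LazyCall("a", run_job, job_dir="a") and b = LazyCall("b", run_job, a, job_dir="b") runs nothing until b.compute() is called, at which point job "a" runs before job "b" and "b" receives "a" as its dependency directory.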

getDelayedFuncName(delayedFunc)

Return the node name of a Dask delayed function

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

string of node name

getDepNodeDirs(delayedFunc)

Get names of directories on which this function depends

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

list of strings of directory names

getDepNodeNames(delayedFunc)

Get node names of all dependent functions. NOTE: uses the stored terminal node for the entire graph

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

list of strings of node names

getInputJobDir(delayedFunc)

Get input job directory for the input delayed dask function

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

string name of directory

getNodeName(delayedFunc)

Get node name for the input delayed dask function

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

string name of job node