Chemistream API documentation

Chemistream Driver

ChemistreamDriver

class chemistream.driver.ChemistreamDriver(user=None)

Bases: makalii.makalii_driver.MakaliiDriver

Subclass of MakaliiDriver that adds functionality specific to Chemistream

Constructor args:

user (str): user name (to select file settings)

password (str): Password to be set in remote containers

dumpConnectScript()

Utility for Chemistream Jupyter notebooks that dumps a convenience script for connecting to the remote cluster. Run the dumped script in a terminal.

Wrapper for opening a remote HTML ssh tunnel. Checks for a running remote JupyterLab server, then runs open_ssh_html_tunnel

Args:

local_port (int): port on local machine for ssh tunnel

remote_port (int): port on remote machine running server

Wrapper for opening a remote HTML ssh tunnel. Checks for a running remote JSME server, then runs open_ssh_html_tunnel

Args:

local_port (int): port on local machine for ssh tunnel

remote_port (int): port on remote machine running server
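The two tunnel wrappers above forward a local port to a port on the remote machine. As a rough illustration only, the sketch below builds the kind of ssh port-forwarding command such a tunnel typically relies on. The helper name build_tunnel_command, the user name and the host name are hypothetical and are not part of the Chemistream or Makalii API.

```python
def build_tunnel_command(user, host, local_port, remote_port):
    """Return an ssh argument list forwarding local_port to remote_port on host.

    Hypothetical helper for illustration; not a Chemistream function.
    """
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


# Example: forward local port 8888 to port 8889 on the remote machine.
print(" ".join(build_tunnel_command("alice", "cluster.example.org", 8888, 8889)))
```

Returning a list rather than a single string keeps the command safe to pass to subprocess without shell quoting.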

start_jsme_http_server(remote_port)

Start up a remote http server for the JSME molecular editor. If a server is already running on the remote port, this method returns without affecting the remote server. NOTE: the python module http.server and JSME must be on the remote machine container

Args:

remote_port (int): Port for remote JSME http server to be running
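The "return without touching a server that is already running" behaviour described above can be sketched with the standard library. This is a minimal local illustration, assuming a plain TCP connection test is an acceptable way to detect a running server; port_in_use and start_server_if_needed are hypothetical names, and the real method performs its check on the remote machine.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return sock.connect_ex((host, port)) == 0


def start_server_if_needed(port, start):
    """Call start(port) only when nothing listens on port.

    Returns True if start was called, False if a server was already running.
    """
    if port_in_use(port):
        return False
    start(port)
    return True
```

Here start is any callable that launches the server, for example one that runs python -m http.server on the chosen port.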

start_jupyterlab_server(remote_port)

Start up a remote jupyter server on the remote port and run the setup steps related to server start. If a server is already running on the remote port, this method returns without affecting the remote server. NOTE: jupyter etc must be in PATH on the remote machine

Args:

remote_port (int): Port for remote jupyter server to be running

Chemistream Analysis

ChemiVis

class chemistream.analysis.ChemiVis

Bases: object

Base class defining the interface to an object that wraps visualization components

aseLatticeToXYZ(grid)

Take an ASE lattice object and convert its positions and symbols to an XYZ string (NOTE: Ar is the ‘empty’)

Args:

grid - ASE lattice

Returns:

XYZ format string
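For orientation, an XYZ string consists of an atom count, a comment line, and one "symbol x y z" row per atom. The sketch below builds such a string from plain lists; it is not the ChemiVis implementation, and to_xyz_string is a hypothetical name. With an ASE object, the symbols and positions would typically come from get_chemical_symbols() and get_positions().

```python
def to_xyz_string(symbols, positions, comment=""):
    """Build an XYZ-format string from element symbols and (x, y, z) positions."""
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, positions):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


# Two-site example with made-up coordinates.
print(to_xyz_string(["Ar", "Cu"], [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]))
```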

aseToView(grid)

Wrapper for aseLatticeToXYZ: generates the XYZ string, then creates a py3Dmol image from the ASE lattice object

Args:

grid - ASE lattice

bbToImage(*bb)

Convert a tuple of STREAMM building blocks to an array of images. Driver for _bbToXYZString, _bbToSDFString and strucStringsToView

Args:

bb: tuple of Buildingblock objects

formatType: xyz or sdf string

nwchemInputToImage(input_file, targetIndex=0)

Parse an NWChem input file and render a py3Dmol image. NOTE: looks for atoms between ‘geometry’ and ‘end’

Args:

input_file: full path to input file

targetIndex: (default 0) index of molecule in file to image [0, 1,…]

Return: None
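The "atoms between ‘geometry’ and ‘end’" rule can be sketched as a small parser. This is an illustrative approximation rather than the ChemiVis code: parse_nwchem_geometries is a hypothetical name, and the sketch assumes each atom row has the form "symbol x y z" and that other lines inside the block (such as symmetry directives) can be skipped.

```python
def parse_nwchem_geometries(text):
    """Return a list of geometries, each a list of (symbol, x, y, z) tuples."""
    geometries, current = [], None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        keyword = tokens[0].lower()
        if current is None:
            if keyword == "geometry":
                current = []  # start collecting atoms for a new block
        elif keyword == "end":
            geometries.append(current)
            current = None
        elif len(tokens) >= 4:
            try:
                x, y, z = (float(t) for t in tokens[1:4])
            except ValueError:
                continue  # not an atom row, e.g. a directive
            current.append((tokens[0], x, y, z))
    return geometries


sample = """start h2o
geometry units angstrom
  symmetry c1
  O 0.000 0.000 0.117
  H 0.000 0.757 -0.470
  H 0.000 -0.757 -0.470
end
task scf energy
"""
molecules = parse_nwchem_geometries(sample)
print(molecules[0])  # index 0 plays the role of targetIndex=0
```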

setSparksDumpInfo(spFile)

Set a list of header dictionaries holding the frame info for a SPPARKS dump file. This method searches for the line numbers of the string ‘ITEM: TIMESTEP’, which marks the beginning of a frame of data at a particular time step

Args:

spFile: path to SPPARKS dump file
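The frame search described above amounts to recording every line number where the marker string appears. A minimal sketch, assuming the marker sits at the start of a line; find_frame_starts and the dictionary keys are hypothetical, and the sample lines are abbreviated, made-up content rather than a real dump file.

```python
def find_frame_starts(lines, marker="ITEM: TIMESTEP"):
    """Return one header dict per frame with the frame index and its line number."""
    headers = []
    for line_number, line in enumerate(lines):
        if line.strip().startswith(marker):
            headers.append({"frame": len(headers), "line": line_number})
    return headers


dump_lines = [
    "ITEM: TIMESTEP", "0 0.0",
    "ITEM: NUMBER OF ATOMS", "2",
    "ITEM: TIMESTEP", "10 0.5",
    "ITEM: NUMBER OF ATOMS", "2",
]
frames = find_frame_starts(dump_lines)
print(frames)
```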

sparksDumpToImage(iframe=0, maxAtoms=None)

Convert tuple of dump file names for SPPARKS to an array of images

Args:

maxAtoms: Number of atoms in file to show

Return:

py3Dmol image

sparksDumpToImages()

Driver for sparksDumpToImage. Displays menus for choosing #-of-atoms to display and frame index of loaded file to display

sparksDumpToPNG()

Takes internally rendered py3Dmol view and generates a PNG image

strucStringsToView(*strucStrings, showBonds=True)

Convert a tuple of structure strings (format determined by formatType) to py3Dmol images for embedding in a Jupyter notebook. For multiple objects, a grid of images is created

Args:

strucStrings: Buildingblock/Structure object(s)

showBonds (bool): Show bonds

Returns:

py3Dmol view

visBBFromJSON(bb_file_list, ftype='xyz', bond_guess=1.5)

Import STREAMM building block JSON file and visualize using ChemiVis object. Elements

Args:

bb_file_list (str): Name of the JSON file (full path from project dir)

ftype (str): Format type string ‘xyz’, ‘sdf’

bond_guess (float): Length to consider a bond (for sdf type)

RDKitBase

class chemistream.rdkitbase.RDKitBase(turnOffLogging)

Bases: object

Specific wrapper class for RDKit encapsulating basic RDKit functionality utilized across Chemistream applications

checkMolImplHsMaxPerAtom(mol)

Check molecule and return the largest #-impl hydrogens for any atom in the molecule

create_check_dir(dirstring)

Utility to create a directory recursively if it does not exist

Args:

dirstring (str): full directory path to create
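Recursive, idempotent directory creation of this kind is usually a single standard-library call. The sketch below shows one way to do it; ensure_dir is a hypothetical stand-in and not the RDKitBase method itself.

```python
import os


def ensure_dir(dirstring):
    """Create dirstring and any missing parent directories; no-op if it exists."""
    os.makedirs(dirstring, exist_ok=True)
    return dirstring
```

Calling ensure_dir twice on the same path is safe because exist_ok=True suppresses the error for an existing directory.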

generateAtomOrder(mol)

Generate a SMILES string from a mol-object and use to re-order atoms within mol-object

molStrToObj(mol_str)

Convert a mol-file string to a mol object. NOTE: by default hydrogens are NOT removed, so the mol string is read raw

Args:

mol_str(str): mol string

Return: mol object

molToImageAndFile(mol, full_file_name='mol.png', highlightAtoms=[], label_string='')

Output mol-object as a PNG file. Exposes options to highlight a list of atoms in the mol-object and to label the image

Args:

mol: RDKit mol-object

full_file_name (str): full path to image file

highlightAtoms (list): list of atom indices in mol-object to highlight

label_string (str): embedded label for image

Returns: a PIL image object

reorderMolObj(mol)

Generate a SMILES string from a mol-object and use to re-order atoms within mol-object and return

Args:

mol – RDKit mol-object

Return:

mol-object reordered with SMILES ordering

smilesToImage(smiles, imageDir='.', skipDisplay=False)

Construct an image from a SMILES string, display it, and dump it to a PNG file named after the SMILES string

Args:

smiles (str): SMILES string

imageDir (str): Directory to dump corresponding PNG image

skipDisplay (bool): Flag for showing image to screen

Chemistream Workflow

CalculationBase

class chemistream.workflow.CalculationBase(project_dir='./')

Bases: object

Base class for a calculation. Holds project directory info and basic setup and project-management methods

clearSimJobs(fileTypes=['.log'])

Remove job output files in order to force re-submission

Args:

fileTypes (list): list of file suffixes to search for
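Removing output files by suffix so that jobs look unfinished can be sketched with pathlib. This is an assumption-laden illustration: clear_job_outputs is a hypothetical name, and the sketch assumes the files live anywhere below the project directory, which the description above does not confirm.

```python
from pathlib import Path


def clear_job_outputs(project_dir, file_types=(".log",)):
    """Delete files under project_dir whose suffix is in file_types.

    Returns the sorted list of removed paths.
    """
    removed = []
    # Materialize the listing first so files are not deleted mid-iteration.
    for path in list(Path(project_dir).rglob("*")):
        if path.is_file() and path.suffix in file_types:
            path.unlink()
            removed.append(path)
    return sorted(removed)
```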

ScriptCalc

class chemistream.workflow.ScriptCalc(project_dir='./', comp_string='Total times', submitType='BASH', submitArgs='')

Bases: chemistream.workflow.CalculationBase

Defines calculations as simulations launched by scripts that dump results to a file, all within a dedicated script directory

NOTE: the main run functions can NOT depend on class data.

Once the class data is set for delayed functions it cannot be changed

Constructor args:

project_dir (str): Top directory where all job submission directories are located

comp_string (str): String to check in log file for completion

submitType (str): BASH, PBS or SLURM where

BASH – “bash [args] JOB_SCRIPT_FULL </dev/null &>/dev/null &”

PBS – “qsub [args] JOB_SCRIPT_FULL”

SLURM – “sbatch [args] --chdir=JOB_DIR_FULL JOB_SCRIPT”

and JOB_SCRIPT_FULL and JOB_SCRIPT get filled in by class methods.

submitArgs (str): [args] to be sent to the submission command, e.g. sbatch [args] where [args] is --ntasks-per-node=2

Usage: eg

test0 = ScriptCalc(proj_dir)
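The three submitType templates listed above can be read as simple string substitutions. The sketch below fills them in under the assumption that JOB_SCRIPT_FULL is the job directory joined with the script name; build_submit_command is a hypothetical helper, not a ScriptCalc method.

```python
import os


def build_submit_command(submit_type, job_dir, job_script, submit_args=""):
    """Return the shell command string for a BASH, PBS or SLURM submission."""
    job_script_full = os.path.join(job_dir, job_script)
    args = f"{submit_args} " if submit_args else ""
    if submit_type == "BASH":
        # Run in the background with stdin and output detached.
        return f"bash {args}{job_script_full} </dev/null &>/dev/null &"
    if submit_type == "PBS":
        return f"qsub {args}{job_script_full}"
    if submit_type == "SLURM":
        return f"sbatch {args}--chdir={job_dir} {job_script}"
    raise ValueError(f"unknown submit type: {submit_type!r}")


print(build_submit_command("SLURM", "/proj/job0", "run.sh", "--ntasks-per-node=2"))
```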

runScript(*depJobArgs, job_dir='./', **kJobArgs)

The job_script must be able to be called from a different directory than where it will be running, such that it can access all job submission related files within the project dir

This method must perform three basic functions:
  1. make a ‘non-blocking call’ submitting the simulation

  2. make a ‘blocking call’ that waits until simulation is done

  3. return this job_dir value for other functions

Args:

depJobArgs: Dask delayed functions that this function depends upon

job_dir: Name of job directory where calculation will take place

kJobArgs: keyword args for running this function, e.g.

job_script: Name of job submission script (.sh or .pbs)

job_output: Name of output file to check for completion

completion_tag: search string for denoting completion

Returns:

(str) this job directory
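The submit, wait and return pattern listed above can be sketched as follows. This is not the ScriptCalc implementation: submit_and_wait, the polling interval and the timeout are assumptions made for illustration, and completion is detected simply by finding the tag in the output file.

```python
import subprocess
import time
from pathlib import Path


def submit_and_wait(command, job_output, completion_tag,
                    poll_seconds=0.2, timeout=30.0):
    """Launch command without blocking, then block until job_output
    contains completion_tag. Returns the directory holding job_output."""
    subprocess.Popen(command)  # step 1: non-blocking submission
    output = Path(job_output)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:  # step 2: blocking wait
        if output.exists() and completion_tag in output.read_text(errors="ignore"):
            return str(output.parent)  # step 3: hand the job dir to dependents
        time.sleep(poll_seconds)
    raise TimeoutError(f"{completion_tag!r} not found in {job_output}")
```

Returning the job directory lets a downstream task receive it as an argument, which is what allows the calls to be chained in a task graph.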

DaskManager

class chemistream.workflow.DaskManager(threadsPerWorker=5, nWorkers=1, processFlag=True)

Bases: object

Class to manage client threads and monitor task submission

computeGraph(*args)

For a function that returns a final result, compute the result and return it

Args:

args: (eg args[0]: input Dask delayed function (default to class data))

Returns:

value of computation of task graph

runTasks(*args)

Run the tasks in the graph associated with the terminal node stored in this class object

Args:

args: variable length args (eg args[0]: input Dask delayed function (default to class data))

Returns:

iframe: progress bars

setTerminalNode(dnode)

Set the dask delayed function at the ‘terminal’ node of a task graph

Args:

dnode: Dask delayed function

showDashboard()

Show the client object dashboard info

showTasks(*args, rankDir='LR')

Show the task graph associated with the terminal node set in class object data

Args:

args: variable length args (eg args[0]: input Dask delayed function (default to class data))

DaskDelayedCalc

class chemistream.workflow.DaskDelayedCalc

Bases: object

Creates Dask delayed simulation objects based on run methods in derived classes from CalculationBase. Collection of dask task graph calculations

getDelayedCalc(nodeName, calcFunc, *depJobArgs, **kJobArgs)

Return a dask.delayed version of a run-calculation function, e.g. a run method of an object derived from CalculationBase

Args:

nodeName: Name of job graph nodes

calcFunc: Name of calculation function (eg classObj.runScript)

depJobArgs: Dependent delayed functions (read by dask.delayed)

kJobArgs: Keyword arguments for running a single calc

Returns:

dask delayed function object
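To show how a named delayed node with dependencies can be built, the sketch below wraps a stand-in run function with dask.delayed and names each node through the dask_key_name keyword. It assumes dask is installed; get_delayed and run_step are hypothetical stand-ins, not the DaskDelayedCalc or ScriptCalc code.

```python
import dask


def run_step(*dep_job_dirs, job_dir="./"):
    """Stand-in for a run method: accept upstream job dirs, return this job dir."""
    return job_dir


def get_delayed(node_name, calc_func, *dep_job_args, **k_job_args):
    """Wrap calc_func as a delayed call whose graph key is node_name."""
    return dask.delayed(calc_func)(
        *dep_job_args, dask_key_name=node_name, **k_job_args
    )


# "freq" depends on "opt" because the delayed object is passed as an argument.
first = get_delayed("opt", run_step, job_dir="jobs/opt")
second = get_delayed("freq", run_step, first, job_dir="jobs/freq")
print(second.key)
print(second.compute())
```

Passing the upstream delayed object as a positional argument is what records the dependency in the task graph; nothing runs until compute is called.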

getDelayedFuncName(delayedFunc)

Return the node name of a Dask delayed function

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

string of node name

getDepNodeDirs(delayedFunc)

Get names of directories on which this function depends

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

list of strings of directory names

getDepNodeNames(delayedFunc)

Get node names of all dependent functions. NOTE: uses the stored terminal node for the entire graph

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

list of strings of node names

getInputJobDir(delayedFunc)

Get input job directory for the input delayed dask function

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

string name of directory

getNodeName(delayedFunc)

Get node name for the input delayed dask function

Args:

delayedFunc: a Dask delayed function created by this module

Returns:

string name of job node