Chemistream API documentation
Chemistream Driver
ChemistreamDriver
-
class
chemistream.driver.
ChemistreamDriver
(user=None)¶ Bases:
makalii.makalii_driver.MakaliiDriver
Derived from the base class MakaliiDriver; augments it with functionality specific to Chemistream
- Constructor args:
user (str): user name (to select file settings)
password (str): Password to be set in remote containers
-
dumpConnectScript
() Utility for Chemistream Jupyter notebooks that dumps a convenience script for connecting to the remote cluster (use the script in a terminal)
-
open_remote_chemistream_link
(local_port, remote_port) Wrapper for opening a remote HTML SSH tunnel. Checks for a running remote JupyterLab server, then runs open_ssh_html_tunnel
- Args:
local_port (int): port on local machine for ssh tunnel
remote_port (int): port on remote machine running server
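The port pairing described above corresponds to a standard SSH local port forward. The following is a minimal sketch of how such a command could be assembled; `build_tunnel_command`, the host name, and the user name are hypothetical and are not part of the Chemistream or Makalii API, and the real `open_ssh_html_tunnel` may work differently.

```python
import shlex


def build_tunnel_command(local_port, remote_port, remote_host, user=None):
    """Return the argv list for an SSH local port forward (hypothetical helper)."""
    for name, port in (("local_port", local_port), ("remote_port", remote_port)):
        if not 0 < int(port) < 65536:
            raise ValueError(f"{name} out of range: {port}")
    target = f"{user}@{remote_host}" if user else remote_host
    # -N: run no remote command; -L: forward local_port to localhost:remote_port
    # as seen from the remote machine.
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", target]


# Assumed example values: host and user are placeholders.
cmd = build_tunnel_command(8888, 8889, "cluster.example.org", user="alice")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess` without shell quoting concerns.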
-
start_jsme
()¶ Start local JSME session in a separate tab
-
start_jupyterlab_server
(remote_port) Start a remote Jupyter server on the remote port and run the setup steps related to server start. If a server is already running on the remote port, this method returns without affecting the remote server. NOTE: jupyter etc. must be in PATH on the remote machine
- Args:
remote_port (int): Port for remote jupyter server to be running
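The "return if already running" behaviour can be illustrated with a simple port probe. This is a sketch under stated assumptions, not the driver's actual check: `port_is_listening` is a hypothetical helper, and it probes a port on the local machine, whereas the driver checks the remote machine.

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Demonstration with a throwaway listener on an OS-assigned free port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    demo_port = server.getsockname()[1]
    open_while_listening = port_is_listening(demo_port)
open_after_close = port_is_listening(demo_port)
```

A start routine built on such a probe would skip the launch step whenever the probe succeeds.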
Chemistream Analysis
ChemiVis
-
class
chemistream.analysis.
ChemiVis
¶ Bases:
object
Base class defining the interface to an object that wraps visualization components
-
aseToView
(grid) Wrapper for aseLatticeToXYZ: generates an XYZ string from an ASE lattice object, then creates an image from it using NGLView
- Args:
grid - ASE lattice
- Returns:
displayed NGL view object
-
nwchemInputToImage
(input_file, targetIndex=0) Parse an NWChem input file and render an NGLView image. NOTE: looks for atoms between ‘geometry’ and ‘end’
- Args:
input_file: full path to input file
targetIndex: (default 0) index of molecule in file to image [0, 1,…]
- Return:
None
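The "atoms between ‘geometry’ and ‘end’" rule can be sketched as a small parser that turns one geometry block into an XYZ string. `nwchem_geometry_to_xyz` is a hypothetical stand-in, not the ChemiVis implementation; it assumes atom lines have the form `symbol x y z` and that geometry blocks contain no nested `end`-terminated sub-blocks.

```python
def nwchem_geometry_to_xyz(text, target_index=0):
    """Extract the target_index-th geometry block and format it as XYZ."""
    blocks, current = [], None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        key = tokens[0].lower()
        if current is None:
            if key == "geometry":
                current = []          # start collecting atoms
        elif key == "end":
            blocks.append(current)    # close the geometry block
            current = None
        elif len(tokens) >= 4:
            try:
                x, y, z = (float(t) for t in tokens[1:4])
            except ValueError:
                continue              # skip directives that are not atom lines
            current.append((tokens[0], x, y, z))
    atoms = blocks[target_index]
    lines = [str(len(atoms)), f"geometry {target_index}"]
    lines += [f"{s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in atoms]
    return "\n".join(lines)


sample = """start water
geometry units angstrom
  symmetry c1
  O 0.0 0.0 0.0
  H 0.0 0.757 0.587
  H 0.0 -0.757 0.587
end
task scf energy
"""
xyz = nwchem_geometry_to_xyz(sample)
print(xyz)
```

The resulting string has the atom count on the first line and a comment on the second, which is the layout XYZ readers expect.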
-
setSparksDumpInfo
(spFile) Set a list of header dictionaries holding the frame info for a SPPARKS dump file. The method searches for the line numbers where the string ‘ITEM: TIMESTEP’ occurs; each occurrence marks the start of a frame of data at a particular time step
- Args:
spFile: path to SPPARKS dump file
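The frame search described above can be sketched as a scan for the marker line. `index_frames` and the dictionary keys are hypothetical; the actual header dictionaries built by `setSparksDumpInfo` may hold different fields.

```python
def index_frames(lines, marker="ITEM: TIMESTEP"):
    """Return one dict per frame with its starting line and length."""
    starts = [i for i, line in enumerate(lines) if line.startswith(marker)]
    # Each frame runs until the next marker, or to the end of the file.
    ends = starts[1:] + [len(lines)]
    return [
        {"frame": n, "start_line": s, "num_lines": e - s}
        for n, (s, e) in enumerate(zip(starts, ends))
    ]


# Tiny made-up dump with two frames; real files carry more header items.
dump_lines = [
    "ITEM: TIMESTEP", "0",
    "ITEM: ATOMS id type x y z", "1 1 0 0 0",
    "ITEM: TIMESTEP", "10",
    "ITEM: ATOMS id type x y z", "1 2 0 0 0", "2 1 1 0 0",
]
frames = index_frames(dump_lines)
print(frames)
```

Storing line offsets once lets a later call jump straight to the requested frame instead of rescanning the file.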
-
sparksDumpToImage
(iframe=0, xslice=None, yslice=None, zslice=None)¶ Convert frame of SPPARKS dump file to an image
- Args:
iframe (int): index of the frame in the SPPARKS dump file to image
xslice: max x-edge of sim box to viz
yslice: max y-edge of sim box to viz
zslice: max z-edge of sim box to viz
- Return:
nglview object
-
sparksDumpToImages
()¶ Driver for sparksDumpToImage. Displays menus for choosing #-of-atoms to display and frame index of loaded file to display
-
visBBFromJSON
(bb_file_list, ftype='xyz', bond_guess=1.5, numCols=2, sizePx=400)¶ Import STREAMM building block JSON file and visualize using ChemiVis object.
- Args:
bb_file_list (str): Name of the JSON file (full path from project dir)
ftype (str): Format type string ‘xyz’, ‘sdf’
bond_guess (float): Length to consider a bond (for sdf type)
numCols (int): #-of-columns in grid
sizePx (int): Size of grid (in px units)
-
xyzToGridView
(xyzStrList, numCols=2, sizePx=200)¶ Driver for xyzToNglView method to display a grid of ngl view objects. #-of-columns can be set and rows are automatic
- Args:
xyzStrList (list): list of XYZ format strings
numCols (int): number of columns for grid (default 2)
sizePx (int): Size of each frame in grid [px]
- Returns:
ipywidgets grid object
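"Rows are automatic" means the row count follows from the number of items and `numCols`. A minimal sketch of that arithmetic, using hypothetical helpers that involve neither nglview nor ipywidgets:

```python
import math


def grid_shape(n_items, num_cols=2):
    """Return (rows, cols) needed to lay out n_items in num_cols columns."""
    if num_cols < 1:
        raise ValueError("num_cols must be at least 1")
    return math.ceil(n_items / num_cols), num_cols


def chunk_rows(items, num_cols=2):
    """Split items into consecutive rows of at most num_cols entries."""
    return [items[i:i + num_cols] for i in range(0, len(items), num_cols)]


shape = grid_shape(5, num_cols=2)
rows = chunk_rows(["a", "b", "c", "d", "e"], num_cols=2)
print(shape, rows)
```

The last row is simply shorter when the item count is not a multiple of the column count.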
-
xyzToNglView
(xyzStr, sizeStr='300px')¶ Convert XYZ string directly to an nglview object
- Args:
xyzStr (str): XYZ string
sizeStr (str): size string in ‘px’
- Return:
nglview ‘view’ object
-
RDKitBase
-
class
chemistream.rdkitbase.
RDKitBase
(turnOffLogging)¶ Bases:
object
Specific wrapper class for RDKit encapsulating basic RDKit functionality utilized across Chemistream applications
-
checkMolImplHsMaxPerAtom
(mol) Check the molecule and return the largest number of implicit hydrogens on any atom in the molecule
-
create_check_dir
(dirstring)¶ Utility to create a directory recursively if it does not exist
- Args:
dirstring (str): full directory path to create
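Recursive creation with an "only if missing" guard maps directly onto the standard library. The following sketch uses a hypothetical `ensure_dir` helper; it is not the RDKitBase method itself.

```python
import os
import tempfile


def ensure_dir(dirstring):
    """Create dirstring and any missing parents; do nothing if it exists."""
    os.makedirs(dirstring, exist_ok=True)
    return dirstring


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "project", "jobs", "run0")
    ensure_dir(target)
    ensure_dir(target)  # second call is a no-op rather than an error
    created = os.path.isdir(target)
```

Using `exist_ok=True` avoids a separate existence check, which would otherwise race with other processes creating the same directory.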
-
generateAtomOrder
(mol)¶ Generate a SMILES string from a mol-object and use to re-order atoms within mol-object
-
molStrToObj
(mol_str) Convert a mol-file string to a mol object. Note: defaults to NOT removing hydrogens, so the mol string is read raw
- Args:
mol_str(str): mol string
Return: mol object
-
molToImageAndFile
(mol, full_file_name='mol.png', highlightAtoms=[], label_string='') Output a mol-object as a PNG file. Exposes options to highlight a list of atoms in the mol-object and to label the image
- Args:
mol: RDKit mol-object
full_file_name (str): full path to image file
highlightAtoms (list): list of atom indices in mol-object to highlight
label_string (str): embedded label for image
Returns: a PIL image object
-
reorderMolObj
(mol)¶ Generate a SMILES string from a mol-object and use to re-order atoms within mol-object and return
- Args:
mol – RDKit mol-object
- Return
mol-object reordered with SMILES ordering
-
smilesToImage
(smiles, imageDir='.', skipDisplay=False) Construct an image from a SMILES string, display it, and dump it to a PNG file named after the SMILES string
- Args:
smiles (str): SMILES string
imageDir (str): Directory to dump corresponding PNG image
skipDisplay (bool): Flag for showing image to screen
-
Chemistream Workflow
CalculationBase
-
class
chemistream.workflow.
CalculationBase
(project_dir='./')¶ Bases:
object
Base class for a calculation. Holds project directory info and basic setup and project-management methods
-
clearSimJobs
(fileTypes=['.log'])¶ Remove job output files in order to force re-submission
- Args:
fileTypes (list): of file suffixes to search for
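Removing output files by suffix can be sketched as follows. `clear_job_outputs` is a hypothetical helper; whether the real `clearSimJobs` searches subdirectories recursively is an assumption made here for illustration.

```python
import tempfile
from pathlib import Path


def clear_job_outputs(project_dir, file_types=(".log",)):
    """Delete files under project_dir whose suffix is in file_types."""
    removed = []
    # sorted() materialises the listing before anything is deleted.
    for path in sorted(Path(project_dir).rglob("*")):
        if path.is_file() and path.suffix in file_types:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "job0").mkdir()
    (root / "a.log").write_text("done")
    (root / "b.out").write_text("keep me")
    (root / "job0" / "c.log").write_text("done")
    removed_names = clear_job_outputs(root)
    remaining = sorted(p.name for p in root.rglob("*") if p.is_file())
```

With the completion logs gone, a later run finds no evidence that the jobs finished and submits them again.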
-
ScriptCalc
-
class
chemistream.workflow.
ScriptCalc
(project_dir='./', comp_string='Total times', submitType='BASH', submitArgs='')¶ Bases:
chemistream.workflow.CalculationBase
Defines calculations as simulations launched by scripts that dump results to a file, all within a dedicated script directory
- NOTE: the main run functions can NOT depend on class data.
Once the class data is set for delayed functions it cannot be changed
- Constructor args:
project_dir (str): Top directory where all job submission directories are located
comp_string (str): String to check in log file for completion
- submitType (str): BASH, PBS or SLURM where
BASH: "bash [args] JOB_SCRIPT_FULL </dev/null &>/dev/null &"
PBS: "qsub [args] JOB_SCRIPT_FULL"
SLURM: "sbatch [args] --chdir=JOB_DIR_FULL JOB_SCRIPT"
and JOB_SCRIPT_FULL and JOB_SCRIPT get filled in by class methods.
- submitArgs (str): [args] to be sent to the submission command, e.g. sbatch [args]
where [args] is --ntasks-per-node=2
- Usage: eg
test0 = ScriptCalc(proj_dir)
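The three submission templates listed for submitType can be sketched as a lookup table plus string formatting. `build_submit_command` and the example paths are hypothetical; the class methods that fill in the placeholders may differ in detail.

```python
import posixpath

# Templates follow the submitType descriptions in the constructor args.
TEMPLATES = {
    "BASH": "bash {args} {script_full} </dev/null &>/dev/null &",
    "PBS": "qsub {args} {script_full}",
    "SLURM": "sbatch {args} --chdir={job_dir_full} {script}",
}


def build_submit_command(submit_type, job_dir_full, job_script, submit_args=""):
    """Fill in the submission template for the given scheduler type."""
    template = TEMPLATES[submit_type.upper()]
    command = template.format(
        args=submit_args,
        script_full=posixpath.join(job_dir_full, job_script),
        job_dir_full=job_dir_full,
        script=job_script,
    )
    # Collapse the double space left behind when submit_args is empty.
    return " ".join(command.split())


slurm_cmd = build_submit_command(
    "SLURM", "/proj/job0", "run.sh", "--ntasks-per-node=2"
)
pbs_cmd = build_submit_command("PBS", "/proj/job0", "run.pbs")
print(slurm_cmd)
print(pbs_cmd)
```

Note that only the SLURM form passes the job directory separately; the BASH and PBS forms receive the full script path instead.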
-
runScript
(*depJobArgs, job_dir='./', **kJobArgs)¶ The job_script must be able to be called from a different directory than where it will be running, such that it can access all job submission related files within the project dir
- This method must do three things:
make ‘non-blocking call’ submitting the simulation
make a ‘blocking call’ that waits until simulation is done
return this job_dir value for other functions
- Args:
depJobArgs: Dask delayed functions that this function depends upon
job_dir: Name of job directory where calculation will take place
- kJobArgs: keyword args for running this function eg
job_script: Name of job submission script (.sh or .pbs)
job_output: Name of output file to check for completion
completion_tag: search string for denoting completion
- Returns:
(str) this job directory
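The "blocking call" half of this contract can be sketched as polling the job output file for the completion string. `wait_for_completion`, the polling interval, and the timeout are hypothetical illustration choices, not the behaviour of `runScript` itself.

```python
import tempfile
import time
from pathlib import Path


def wait_for_completion(log_path, comp_string="Total times",
                        poll_s=1.0, timeout_s=None):
    """Block until comp_string appears in log_path; False on timeout."""
    log_path = Path(log_path)
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        if log_path.is_file() and comp_string in log_path.read_text(errors="replace"):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


with tempfile.TemporaryDirectory() as job_dir:
    finished_log = Path(job_dir) / "finished.log"
    finished_log.write_text("step 100\nTotal times: 12.3 s\n")
    finished = wait_for_completion(finished_log, poll_s=0.01, timeout_s=1.0)
    # A log that never appears times out instead of blocking forever.
    missing = wait_for_completion(Path(job_dir) / "missing.log",
                                  poll_s=0.01, timeout_s=0.05)
```

In a real workflow the non-blocking submission would run first, and the function would return the job directory once the wait succeeds.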
DaskManager
-
class
chemistream.workflow.
DaskManager
(threadsPerWorker=5, nWorkers=1, processFlag=True)¶ Bases:
object
Class to manage client threads and monitor task submission
-
computeGraph
(*args)¶ For a function that returns a final result, compute this answer and return
- Args:
args: (eg args[0]: input Dask delayed function (default to class data))
- Returns:
value of computation of task graph
-
runTasks
(*args)¶ Run the tasks in the graph associated with the terminal node stored in this class object
- Args:
args: variable length args (eg args[0]: input Dask delayed function (default to class data))
- Returns:
iframe: progress bars
-
setTerminalNode
(dnode)¶ Set the dask delayed function at the ‘terminal’ node of a task graph
- Args:
dnode: Dask delayed function
-
showDashboard
()¶ Show the client object dashboard info
-
showTasks
(*args, rankDir='LR')¶ Show the task graph associated with the terminal node set in class object data
- Args:
args: variable length args (eg args[0]: input Dask delayed function (default to class data))
-
DaskDelayedCalc
-
class
chemistream.workflow.
DaskDelayedCalc
¶ Bases:
object
Creates Dask delayed simulation objects based on run methods in derived classes from CalculationBase. Collection of dask task graph calculations
-
getDelayedCalc
(nodeName, calcFunc, *depJobArgs, **kJobArgs)¶ Return a dask.delayed version of run calculation function eg an object derived from CalculationBase
- Args:
nodeName: Name of job graph nodes
calcFunc: Name of calculation function (eg classObj.runScript)
depJobArgs: Dependent delayed functions (read by dask.delayed)
kJobArgs: Keyword arguments for running a single calc
- Returns:
dask delayed function object
-
getDelayedFuncName
(delayedFunc)¶ Return the node name of a Dask delayed function
- Args:
delayedFunc: a Dask delayed function created by this module
- Returns:
string of node name
-
getDepNodeDirs
(delayedFunc)¶ Get names of directories on which this function depends
- Args:
delayedFunc: a Dask delayed function created by this module
- Returns:
list of strings of directory names
-
getDepNodeNames
(delayedFunc)¶ Get node names of all dependent functions. NOTE: uses the stored terminal node for the entire graph
- Args:
delayedFunc: a Dask delayed function created by this module
- Returns:
list of strings of node names
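Collecting the names of all upstream nodes amounts to a graph traversal. The sketch below uses a plain dictionary as a stand-in for the task graph; it does not use Dask, and `upstream_nodes` and the node names are hypothetical.

```python
def upstream_nodes(graph, node):
    """Return the sorted names of every node that `node` depends on."""
    seen = []
    stack = list(graph.get(node, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.append(dep)
            # Follow the dependency's own dependencies as well.
            stack.extend(graph.get(dep, ()))
    return sorted(seen)


# Each key maps a node name to the nodes it directly depends on.
graph = {
    "analysis": ["md_run"],
    "md_run": ["minimize"],
    "minimize": [],
}
deps = upstream_nodes(graph, "analysis")
print(deps)
```

The traversal includes indirect dependencies, so the terminal node reports every job that must finish before it can run.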
-
getInputJobDir
(delayedFunc)¶ Get input job directory for the input delayed dask function
- Args:
delayedFunc: a Dask delayed function created by this module
- Returns:
string name of directory
-
getNodeName
(delayedFunc)¶ Get node name for the input delayed dask function
- Args:
delayedFunc: a Dask delayed function created by this module
- Returns:
string name of job node
-