Basic Examples

Creating Building Blocks

There are a number of ways to create BuildingBlock molecules, but the two most common ways are direct creation through SMILES strings and loading them from molecule structure files.

import stk

bb1 = stk.BuildingBlock('NCCCN')
bb2 = stk.BuildingBlock.init_from_file('path/to/file.mol')

Look at the documentation of BuildingBlock to see other available initialization methods.

Constructing Molecules

To construct molecules, you need to create a new ConstructedMolecule. The required input consists of building block molecules and a topology graph. BuildingBlock molecules which undergo reactions during construction need to have the functional groups which undergo them specified

import stk

# React the amine functional group during construction.
bb1 = stk.BuildingBlock('NCCN', ['amine'])
# React the aldehyde functional group during construction.
bb2 = stk.BuildingBlock('O=CC(C=O)C=O', ['aldehyde'])
# Build a bunch of cages, each has the same building blocks but
# different topology graphs.
cage = stk.ConstructedMolecule(
    building_blocks=[bb1, bb2],
    topology_graph=stk.cage.FourPlusSix()
)
cage2 = stk.ConstructedMolecule(
    building_blocks=[bb1, bb2],
    topology_graph=stk.cage.EightPlusTwelve()
)
cage3 = stk.ConstructedMolecule(
    building_blocks=[bb1, bb2],
    topology_graph=stk.cage.TwentyPlusThirty()
)

Created ConstructedMolecule instances can be used as building blocks for others

guest = stk.BuildingBlock('BrBr')
host_guest_complex = stk.ConstructedMolecule(
    building_blocks=[cage, guest],
    topology_graph=stk.host_guest.Complex()
)

Look at the documentation of the topology graph to see any requirements associated with building ConstructedMolecule instances with it as well as usage examples.

Using Multiple Building Blocks

You can have as many different building blocks in any ConstructedMolecule as there are vertices, however you will often have to assign them manually. This is because there are often multiple equally valid ways to place the building blocks on the vertrices and stk has no way to figure out which one you want.

import stk

bb1 = stk.BuildingBlock('O=CC(C=O)C=O', ['aldehyde'])
bb2 = stk.BuildingBlock('O=CC(Cl)(C=O)C=O', ['aldehyde'])
bb3 = stk.BuildingBlock('NCCN', ['amine'])
bb4 = stk.BuildingBlock('NCC(Cl)N', ['amine'])
bb5 = stk.BuildingBlock('NCCCCN', ['amine'])

tetrahedron = stk.cage.FourPlusSix()
cage = stk.ConstructedMolecule(
    building_blocks=[bb1, bb2, bb3, bb4, bb5],
    topology_graph=tetrahedron,
    building_block_vertices={
        bb1: tetrahedron.vertices[:2],
        bb2: tetrahedron.vertices[2:4],
        bb3: tetrahedron.vertices[4:5],
        bb4: tetrahedron.vertices[5:6],
        bb5: tetrahedron.vertices[6:]
    }
)

Optimizing Molecular Structures

Molecules used by stk can be optimized both before and after construction. Optimization is performed by optimizers, which implement the optimize() method.

import stk

# Optimize with the MMFF force field.
mmff = stk.MMFF()
bb = stk.BuildingBlock('BrCCBr', ['bromine'])
mmff.optimize(bb)

polymer = stk.ConstructedMolecule(
    building_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 15)
)
mmff.optimize(polymer)

# Optimize with UFF.
uff = stk.UFF()
uff.optimize(bb)
uff.optimize(polymer)

All optimizers support the use_cache option, which prevents a second optimization from being run on the same molecule twice

mmff = stk.MMFF()
bb = stk.BuildingBlock('CCCC')
# Runs an optimization.
mmff.optimize(bb)
# Run an optimization again.
mmff.optimize(bb)

caching_mmff = stk.MMFF(use_cache=True)
# Runs an optimization.
caching_mmff.optimize(bb)
# Does not run an optimization, returns immediately.
caching_mmff.optimize(bb)

An important optimizer to take note of is the OptimizerSequence, which allows you to chain multiple optimizers in one go. For example, you may wish to embed a molecule with the ETKDG algorithm before running a UFF optimization

sequence = stk.OptimizerSequence(
    stk.ETKDG(),
    stk.UFF()
)
# Embed with ETKDG then do a UFF optimization.
sequence.optimize(bb)

Calculating Molecular Energy

The energy of molecules can be calculated with energy calculators.

All energy calculators define the get_energy() method, which is used to calculate the energy

import stk

mmff = stk.MMFFEnergy()
bb = stk.BuildingBlock('BrCCBr', ['bromine'])
bb_energy = mmff.get_energy(bb)

polymer = stk.ConstructedMolecule(
    building_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 15)
)
polymer_energy = mmff.get_energy(polymer)

Much like optimizers, energy calculators support a use_cache option. If this is turned on the energy calculator will not calculate the energy of a molecule twice. Instead, if the same molecule is passed a second time, the previous value will be returned from memory

mmff = stk.MMFF()
caching_mmff = stk.MMFFEnergy(use_cache=True)
bb_energy = caching_mmff.get_energy(bb)
mmff.optimize(bb)
# bb_energy2 is equal to bb_energy even though the structure
# changed, since the old value was returned from memory.
bb_energy2 = caching_mmff.get_energy(bb)

Using Multiple Functional Groups

Sometimes you may have a BuildingBlock where multiple functional groups should be used during construction. To allow this you can pass the names of any number of functional groups to the BuildingBlock

import stk

bb = stk.BuildingBlock('ICCBr', ['bromine', 'iodine'])
polymer = stk.ConstructedMolecule(
    bulding_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 10)
)

Removing Extra Functional Groups

Some building blocks may have too many functional groups to be used with a topology graph. For example a Linear topology graph requires two functional groups on a BuildingBlock. If there are more than two it is not clear which of the functional groups should be used to form bonds during construction, as they are all equivalent to stk. However, it is possible to resolve this ambguity by removing the extra functional groups

import stk

# This buildingb block has 5 functional groups.
bb = stk.BuildingBlock('BrCC(Br)CC(Br)CC(Br)CC(Br)CC', ['bromine'])
# Keep only the first 2 functional groups.
bb.func_groups = bb.func_groups[:2]
# bb can now be used to construct linear polymers.
polymer = stk.ConstructedMolecule(
    bulding_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 10)
)

Constructing Molecules in Bulk

If you have a bunch of building blocks and topologies and you want to create all possible ConstructedMolecule instances, use Population.init_all().

If you want to create a random population of a finite size, use Population.init_random().

Optimizing Molecules in Bulk

To optimize a bunch of molecules at the same time, in parallel across your CPU cores, place them into a Population

import stk

bb1 = stk.BuildingBlock('NCCCN')
bb2 = stk.BuildingBlock('BrCCBr', ['bromine'])
polymer = stk.ConstructedMolecule(
    building_blocks=[bb2],
    topology_graph=stk.polymer.Linear('A', 12)
)
pop = stk.Population(bb1, bb2, polymer)

mmff = stk.MMFF()
# Optimization done on all molecules in parallel.
pop.optimize(mmff)

Converting between rdkit Molecules

rdkit is a popular and powerful cheminformatics library which provides many useful tools for molecular analysis. To take advantage of it stk can convert its molecules into those of rdkit

import stk

bb = stk.BuildingBlock('ICCBr', ['bromine', 'iodine'])
bb_rdkit = bb.to_rdkit_mol()

polymer = stk.ConstructedMolecule(
    bulding_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 10)
)
polymer_rdkit = polymer.to_rdkit_mol()

stk also allows you to update the structure of molecules from rdkit molecules

# Update structure of bb to match structure of bb_rdkit.
bb.update_from_rdkit_mol(bb_rdkit)
# Update structure of polymer to match structure of polymer_rdkit.
polymer.update_from_rdkit_mol(polymer_rdkit)

Finally, stk allows initialization of new building blocks from rdkit molecules directly

bb2 = stk.BuildingBlock.init_from_rdkit_mol(bb_rdkit)

Saving Molecules

The simplest way to save molecules is to write them to a file

import stk

bb = stk.BuildingBlock('ICCBr', ['bromine', 'iodine'])
bb.write('bb.mol')

polymer = stk.ConstructedMolecule(
    bulding_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 10)
)
polymer.write('polymer.mol')

However, writing to regular chemical file format is lossy. This is because BuildingBlock and ConstructedMolecule contain data that is not held by these files. BuildingBlock holds information about which functional groups are to be used during construction and ConstructedMolecule holds information about which building blocks molecules and topology graph were used to construct it. While a BuildingBlock can be initialized from chemical file formats a ConstructedMolecule cannot, as there is no way to recover the information regarding building blocks and the topology graph.

If you wish to be able to fully recover BuildingBlock and ConstructedMolecule instances, you can write them as JSON files

bb.dump('bb.dump')
polymer.dump('polymer.dump')

These can then be loaded in a later session

recovered_bb = stk.Molecule.load('bb.dump')
recovered_polymer = stk.Molecule.load('polymer.dump')

Using the Molecule.load() method means you do not have to know if the molecule in the file was a BuildingBlock or a ConstructedMolecule, the correct class be determined for you automatically.

You can write and dump molecules in bulk with a Population

pop = stk.Population(bb, polymer)
pop.write('my_pop')
pop.dump('population.dump')

and load it too

recovered_pop = stk.Population.load('population.dump')

Caching

A powerful but dangerous feature of stk use the molecular cache. Every constructor for a BuildingBlock or a ConstructedMolecule has a use_cache option. If this is set to True and an identical molecule has already been loaded or constructed by stk, then stk will not create a new instance but return the existing one from memory. In the case of BuildingBlock it can mean that the memory footprint is dramatically reduced, while in the case of ConstructedMolecule it means that expensive constructions do not have to be repeated.

import stk

bb = stk.BuildingBlock('BrCCBr', ['bromine'])
# bb and bb2 are separate instances because when bb was created it
# was not added to the cache.
bb2 = stk.BuildingBlock('BrCCBr', ['bromine'], use_cache=True)
# bb2 and bb3 are different names for the same object, as
# both bb2 and bb3 had use_cache set to True.
bb3 = stk.BuildingBlock('BrCCBr', ['bromine'], use_cache=True)
# bb4 is a completely new instance.
bb4 = stk.BuildingBlock('BrCCBr', ['bromine'])


polymer = stk.ConstructedMolecule(
    building_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 5)
)
# polymer2 is a separate instance.
polymer2 = stk.ConstructedMolecule(
    building_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 5),
    use_cache=True
)
# polymer2 and polymer3 are different names for the same object as
# both polymer2 and polymer3 had use_cache set to True.
# No construction was carried out when making polymer3, it was
# returned directly from memory.
polymer3 = stk.ConstructedMolecule(
    building_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 5),
    use_cache=True
)
# polymer4 is a completely new instance, construction was
# done from scratch.
polymer4 = stk.ConstructedMolecule(
    building_blocks=[bb],
    topology_graph=stk.polymer.Linear('A', 5)
)