Building Block

class BuildingBlock(smiles, functional_groups=None, random_seed=4, use_cache=False)

Bases: stk.molecular.molecules.molecule.Molecule

Represents a building block of a ConstructedMolecule.

A BuildingBlock can represent either an entire molecule or a molecular fragment used to construct a ConstructedMolecule. The building block uses func_groups to identify which atoms are modified during construction. Available functional group types are listed in functional_group_types. Additional functional groups can be added at runtime by adding an FGType instance to stk.fg_types. See Adding Functional Groups for an example.

atoms

The atoms of the molecule.

Type

tuple of Atom

bonds

The bonds of the molecule.

Type

tuple of Bond

func_groups

The functional groups present in the molecule. The id of a FunctionalGroup is its index.

Type

tuple of FunctionalGroup

Methods

apply_displacement(self, displacement)

Shift the centroid by displacement.

apply_rotation_about_axis(self, angle, axis, …)

Rotate by angle about axis on the origin.

apply_rotation_between_vectors(self, start, …)

Rotate by a rotation from start to target.

apply_rotation_to_minimize_angle(self, …)

Rotate to minimize the angle between start and target.

clone(self)

Return a clone.

dump(self, path[, include_attrs, …])

Write a dict representation to a file.

get_atom_distance(self, atom1_id, atom2_id)

Return the distance between 2 atoms.

get_atom_positions(self[, atom_ids])

Yield the positions of atoms.

get_bonder_centroids(self[, fg_ids])

Yield the centroids of bonder atoms.

get_bonder_direction_vectors(self[, fg_ids])

Yield the direction vectors between bonder centroids.

get_bonder_distances(self[, fg_ids])

Yield distances between pairs of bonder centroids.

get_bonder_ids(self[, fg_ids])

Yield ids of bonder atoms.

get_bonder_plane(self[, fg_ids])

Return coeffs of the plane formed by the bonder centroids.

get_bonder_plane_normal(self[, fg_ids])

Return the normal to the plane formed by bonder centroids.

get_cached_mol(identity_key[, default])

Get a molecule from the cache.

get_center_of_mass(self[, atom_ids])

Return the centre of mass.

get_centroid(self[, atom_ids])

Return the centroid.

get_centroid_centroid_direction_vector(self[, fg_ids])

Return the direction vector between the 2 molecular centroids.

get_direction(self[, atom_ids])

Return a vector of best fit through the atoms.

get_identity_key(self)

Return the identity key.

get_maximum_diameter(self[, atom_ids])

Return the maximum diameter.

get_plane_normal(self[, atom_ids])

Return the normal to the plane of best fit.

get_position_matrix(self)

Return a matrix holding the atomic positions.

has_cached_mol(identity_key)

True if molecule with identity_key is cached.

init_from_dict(mol_dict[, use_cache])

Initialize from a dict representation.

init_from_file(path[, functional_groups, …])

Initialize from a file.

init_from_molecule(mol[, functional_groups, …])

Initialize from a Molecule.

init_from_random_file(file_glob[, …])

Initialize from a random file in file_glob.

init_from_rdkit_mol(mol[, …])

Initialize from an rdkit molecule.

load(path[, use_cache])

Initialize from a dump file.

set_centroid(self, position[, atom_ids])

Set the centroid to position.

set_position_matrix(self, position_matrix)

Set the coordinates to those in position_matrix.

to_dict(self[, include_attrs, …])

Return a dict representation.

to_rdkit_mol(self)

Return an rdkit representation.

update_cache(self)

Update attributes of the cached molecule.

update_from_file(self, path)

Update the structure from a file.

update_from_rdkit_mol(self, mol)

Update the structure to match mol.

write(self, path[, atom_ids])

Write the structure to a file.

__init__(self, smiles, functional_groups=None, random_seed=4, use_cache=False)

Initialize from a SMILES string.

Notes

The molecule is given 3D coordinates with rdkit.ETKDGv2().

Parameters
  • smiles (str) – A SMILES string of the molecule.

  • functional_groups (iterable of str, optional) – The names of the functional group types which are to be added to func_groups. If None, then no functional groups are added.

  • random_seed (int, optional) – Random seed passed to rdkit.ETKDGv2().

  • use_cache (bool, optional) – If True, a new BuildingBlock will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical BuildingBlock exists yet, the newly created one will be added to the cache.
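The use_cache behaviour described above works like a class-level dictionary keyed by identity key. The following is a minimal sketch of that pattern, assuming a hypothetical CachedMolecule class with create() and has_cached() classmethods and string identity keys; it illustrates the described behaviour and is not stk's implementation.

```python
class CachedMolecule:
    """Hypothetical molecule class illustrating an identity-key cache."""

    # One dict shared by every instance of the class: identity key -> molecule.
    _cache = {}

    def __init__(self, identity_key, positions):
        self.identity_key = identity_key
        self.positions = positions

    @classmethod
    def create(cls, identity_key, positions, use_cache=False):
        # Reuse an identical, already cached molecule when caching is on.
        if use_cache and identity_key in cls._cache:
            return cls._cache[identity_key]
        mol = cls(identity_key, positions)
        # A newly made molecule is only registered when caching is on.
        if use_cache:
            cls._cache[identity_key] = mol
        return mol

    @classmethod
    def has_cached(cls, identity_key):
        return identity_key in cls._cache
```

Because a cache hit returns the existing instance unchanged, the positions passed to a second create() call are ignored, which is consistent with the identity key not taking conformation into account.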

apply_displacement(self, displacement)

Shift the centroid by displacement.

Parameters

displacement (numpy.ndarray) – A displacement vector applied to the molecule.

Returns

The molecule.

Return type

Molecule

apply_rotation_about_axis(self, angle, axis, origin)

Rotate by angle about axis on the origin.

Parameters
  • angle (float) – The size of the rotation in radians.

  • axis (numpy.ndarray) – The axis about which the rotation happens.

  • origin (numpy.ndarray) – The origin about which the rotation happens.

Returns

The molecule.

Return type

Molecule
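Rotating about an axis through an arbitrary origin can be done by moving origin to (0, 0, 0), applying Rodrigues' rotation formula, and moving back. Below is a minimal NumPy sketch, assuming positions is an (n, 3) array like the one returned by get_position_matrix(); rotate_about_axis is a hypothetical helper, not stk's own code.

```python
import numpy as np


def rotate_about_axis(positions, angle, axis, origin):
    """Rotate (n, 3) positions by angle (radians) about axis through origin."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Cross-product (skew-symmetric) matrix K of the unit axis.
    k = np.array([
        [0.0, -axis[2], axis[1]],
        [axis[2], 0.0, -axis[0]],
        [-axis[1], axis[0], 0.0],
    ])
    # Rodrigues' rotation formula: R = I + sin(angle) K + (1 - cos(angle)) K^2.
    rotation = np.eye(3) + np.sin(angle) * k + (1 - np.cos(angle)) * (k @ k)
    origin = np.asarray(origin, dtype=float)
    # Move origin to (0, 0, 0), rotate each row, then move back.
    return (np.asarray(positions, dtype=float) - origin) @ rotation.T + origin
```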

apply_rotation_between_vectors(self, start, target, origin)

Rotate by a rotation from start to target.

Given two direction vectors, start and target, this method applies to the molecule the rotation required to transform start into target. The rotation occurs about the origin.

For example, if the start and target vectors are 45 degrees apart, a 45 degree rotation will be applied to the molecule. The rotation will be along the appropriate direction.

The great thing about this method is that, as long as you can associate a geometric feature of the molecule with a vector, the molecule can be rotated so that this vector is aligned with target. The defined vector can be virtually anything, so any geometric feature of the molecule can easily be aligned with any arbitrary axis.

Parameters
  • start (numpy.ndarray) – A vector which is to be rotated so that it transforms into the target vector.

  • target (numpy.ndarray) – The vector onto which start is rotated.

  • origin (numpy.ndarray) – The point about which the rotation occurs.

Returns

The molecule.

Return type

Molecule
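One standard way to build the start-to-target rotation is Rodrigues' formula with the axis start × target, plus special handling when the two vectors are parallel or antiparallel. This NumPy sketch assumes (n, 3) positions; rotation_between_vectors and rotate_between_vectors are hypothetical helpers and may differ from what stk does internally.

```python
import numpy as np


def _skew(v):
    # Matrix K such that K @ x == np.cross(v, x).
    return np.array([
        [0.0, -v[2], v[1]],
        [v[2], 0.0, -v[0]],
        [-v[1], v[0], 0.0],
    ])


def rotation_between_vectors(start, target):
    """Return the 3x3 rotation matrix taking the direction of start onto target."""
    start = np.asarray(start, dtype=float) / np.linalg.norm(start)
    target = np.asarray(target, dtype=float) / np.linalg.norm(target)
    cross = np.cross(start, target)
    sin, cos = np.linalg.norm(cross), np.dot(start, target)
    if sin < 1e-12:
        if cos > 0:
            return np.eye(3)  # Already aligned.
        # Antiparallel: turn by pi about any axis perpendicular to start.
        if abs(start[0]) > 0.9:
            helper = np.array([0.0, 1.0, 0.0])
        else:
            helper = np.array([1.0, 0.0, 0.0])
        axis = np.cross(start, helper)
        axis /= np.linalg.norm(axis)
        # A rotation by pi about a unit axis a is 2 a a^T - I.
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    # The rotation axis is perpendicular to both vectors.
    k = _skew(cross / sin)
    return np.eye(3) + sin * k + (1.0 - cos) * (k @ k)


def rotate_between_vectors(positions, start, target, origin):
    """Apply the start -> target rotation to (n, 3) positions about origin."""
    rotation = rotation_between_vectors(start, target)
    origin = np.asarray(origin, dtype=float)
    return (np.asarray(positions, dtype=float) - origin) @ rotation.T + origin
```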

apply_rotation_to_minimize_angle(self, start, target, axis, origin)

Rotate to minimize the angle between start and target.

Note that this function will not necessarily overlay the start and target vectors, because the rotation is restricted to being about axis.

Parameters
  • start (numpy.ndarray) – The vector which is rotated.

  • target (numpy.ndarray) – The vector which is stationary.

  • axis (numpy.ndarray) – The vector about which the rotation happens.

  • origin (numpy.ndarray) – The origin about which the rotation happens.

Returns

The molecule.

Return type

Molecule
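The reasoning behind this method: a rotation about axis leaves the component of start parallel to axis unchanged, so the angle to target is smallest when the components of start and target perpendicular to axis point the same way. The sketch below computes that signed angle with arctan2 and applies it with Rodrigues' formula; minimizing_angle and rotate_to_minimize_angle are hypothetical NumPy helpers, not stk's API.

```python
import numpy as np


def minimizing_angle(start, target, axis):
    """Signed angle (radians) about axis that best aligns start with target."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    start = np.asarray(start, dtype=float)
    target = np.asarray(target, dtype=float)
    # Only the components perpendicular to axis change under the rotation.
    s = start - np.dot(start, axis) * axis
    t = target - np.dot(target, axis) * axis
    if np.linalg.norm(s) < 1e-12 or np.linalg.norm(t) < 1e-12:
        return 0.0  # No rotation about axis can change the angle.
    # Right-handed angle about axis that carries s onto the direction of t.
    return float(np.arctan2(np.dot(np.cross(s, t), axis), np.dot(s, t)))


def rotate_to_minimize_angle(positions, start, target, axis, origin):
    """Rotate (n, 3) positions about axis through origin to best align start."""
    angle = minimizing_angle(start, target, axis)
    a = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    k = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    # Rodrigues' rotation formula.
    rotation = np.eye(3) + np.sin(angle) * k + (1 - np.cos(angle)) * (k @ k)
    origin = np.asarray(origin, dtype=float)
    return (np.asarray(positions, dtype=float) - origin) @ rotation.T + origin
```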

clone(self)

Return a clone.

Returns

The clone.

Return type

BuildingBlock

dump(self, path, include_attrs=None, ignore_missing_attrs=False)

Write a dict representation to a file.

Parameters
  • path (str) – The full path to the file to which the dict should be written.

  • include_attrs (list of str, optional) – The names of attributes of the molecule to be added to the representation. Each attribute is saved as a string using repr().

  • ignore_missing_attrs (bool, optional) – If False and an attribute in include_attrs is not held by the Molecule, an error will be raised.

Returns

None

Return type

NoneType

functional_group_types = ['amine', 'primary_amine', 'aldehyde', 'carboxylic_acid', 'amide', 'thioacid', 'alcohol', 'thiol', 'bromine', 'iodine', 'alkyne', 'terminal_alkene', 'boronic_acid', 'amine2', 'secondary_amine', 'diol', 'difluorene', 'dibromine', 'alkyne2', 'ring_amine']

get_atom_distance(self, atom1_id, atom2_id)

Return the distance between 2 atoms.

This method does not account for the van der Waals radius of atoms.

Parameters
  • atom1_id (int) – The id of the first atom.

  • atom2_id (int) – The id of the second atom.

Returns

The distance between the first and second atoms.

Return type

float

get_atom_positions(self, atom_ids=None)

Yield the positions of atoms.

Parameters

atom_ids (iterable of int, optional) – The ids of the atoms whose positions are desired. If None, then the positions of all atoms will be yielded.

Yields

numpy.ndarray – The x, y and z coordinates of an atom.

get_bonder_centroids(self, fg_ids=None)

Yield the centroids of bonder atoms.

A bonder centroid is the centroid of all bonder atoms in a particular functional group.

Parameters

fg_ids (iterable of int) – The ids of functional groups to be used. The bonder centroids will be yielded in this order. If None then all functional groups are used and centroids are yielded in ascending order of functional group id.

Yields

numpy.ndarray – The centroid of a functional group.
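Assuming the bonder atom ids of each functional group are available as a sequence indexed by functional group id (a hypothetical fg_bonder_ids argument standing in for func_groups), the bonder centroids reduce to a per-group mean over rows of the position matrix. This is an illustrative sketch, not stk's code.

```python
import numpy as np


def bonder_centroids(position_matrix, fg_bonder_ids, fg_ids=None):
    """Yield the centroid of the bonder atoms of each requested functional group.

    fg_bonder_ids is a hypothetical sequence: entry i holds the bonder atom
    ids of the functional group with id i.
    """
    positions = np.asarray(position_matrix, dtype=float)
    if fg_ids is None:
        # Default: every functional group, in ascending id order.
        fg_ids = range(len(fg_bonder_ids))
    for fg_id in fg_ids:
        yield positions[list(fg_bonder_ids[fg_id])].mean(axis=0)
```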

get_bonder_direction_vectors(self, fg_ids=None)

Yield the direction vectors between bonder centroids.

A bonder centroid is the centroid of all bonder atoms in a particular functional group.

Parameters

fg_ids (iterable of int) – The ids of functional groups to be used. If None then all functional groups are used.

Yields

tuple – The yielded tuple has the form

(3, 54, np.array([12.2, 43.3, 9.78]))

The first two elements of the tuple are the ids of the start and end functional groups of the vector, respectively. The array is the direction vector running from the bonder centroid of the start functional group to that of the end functional group.
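Pairing the bonder centroids with itertools.combinations gives tuples of the described form. This sketch assumes the centroids are supplied as a hypothetical dict from functional group id to centroid, and that each vector points from the first (start) group to the second (end) one.

```python
import itertools

import numpy as np


def bonder_direction_vectors(centroids):
    """Yield (start_id, end_id, vector) for each pair of bonder centroids.

    centroids is a hypothetical dict mapping functional group id to its
    bonder centroid.
    """
    for (id1, c1), (id2, c2) in itertools.combinations(centroids.items(), 2):
        # Vector from the start functional group to the end one.
        yield id1, id2, np.asarray(c2, dtype=float) - np.asarray(c1, dtype=float)
```

Taking np.linalg.norm of each vector gives the (id, id, distance) tuples described for get_bonder_distances.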

get_bonder_distances(self, fg_ids=None)

Yield distances between pairs of bonder centroids.

A bonder centroid is the centroid of all bonder atoms in a particular functional group.

Parameters

fg_ids (iterable of int) – The ids of functional groups to be used. If None then all functional groups are used.

Yields

tuple – A tuple of the form (3, 54, 12.54). The first two elements are the ids of the involved functional groups and the third element is the distance between them.

get_bonder_ids(self, fg_ids=None)

Yield ids of bonder atoms.

Parameters

fg_ids (iterable of int) – The ids of functional groups whose bonder atoms should be yielded. If None then all bonder atom ids in the BuildingBlock will be yielded.

Yields

int – The id of a bonder atom.

get_bonder_plane(self, fg_ids=None)

Return coeffs of the plane formed by the bonder centroids.

A bonder centroid is the centroid of all bonder atoms in a particular functional group.

A plane is defined by the scalar plane equation:

ax + by + cz = d.

This method returns the a, b, c and d coefficients of this equation for the plane formed by the bonder centroids. The coefficients a, b and c are the components of the normal vector to the plane. The coefficient d is found by substituting a, b and c, together with the coordinates of any point on the plane (for example, the position of one of the bonder centroids) for x, y and z, and solving for d.

Parameters

fg_ids (iterable of int) – The ids of functional groups used to construct the plane. If there are more than three, a plane of best fit through the bonder centroids of the functional groups will be made. If None, all functional groups in the BuildingBlock will be used.

Returns

This array has the form [a, b, c, d] and represents the scalar equation of the plane formed by the bonder centroids.

Return type

numpy.ndarray

References

https://tinyurl.com/okpqv6
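A plane of best fit can be obtained with a singular value decomposition: after subtracting the centroid of the points, the right-singular vector with the smallest singular value is normal to the plane, and d follows by substituting the centroid, which lies on that plane, into ax + by + cz = d. The NumPy sketch below computes [a, b, c, d] this way; it is an assumed approach, not necessarily how stk computes it, and it returns a unit normal.

```python
import numpy as np


def plane_coefficients(points):
    """Return [a, b, c, d] for the best-fit plane ax + by + cz = d."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        raise ValueError('At least 3 points are needed to define a plane.')
    centroid = points.mean(axis=0)
    # The smallest-variance direction of the centered points is the normal.
    _, _, vh = np.linalg.svd(points - centroid)
    normal = vh[-1]
    # The centroid lies on the best-fit plane, so it fixes d.
    d = np.dot(normal, centroid)
    return np.append(normal, d)
```

With exactly three points, the fitted plane passes through all of them.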

get_bonder_plane_normal(self, fg_ids=None)

Return the normal to the plane formed by bonder centroids.

A bonder centroid is the centroid of all bonder atoms in a particular functional group.

The normal of the plane is defined such that it goes in the direction toward the centroid of the molecule.

Parameters

fg_ids (iterable of int, optional) – The ids of functional groups used to construct the plane. If there are more than three, a plane of best fit through the bonder centroids of the functional groups will be made. If None, all functional groups in the BuildingBlock will be used.

Returns

A unit vector which describes the normal to the plane of the bonder centroids.

Return type

numpy.ndarray

Raises

ValueError – If there are not at least 3 functional groups, which is necessary to define a plane.
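Orienting the normal only needs a sign check: if it points away from the molecular centroid, flip it. This hypothetical helper reuses an SVD plane fit and assumes the bonder centroids and the centroid of the molecule are already known; it is a sketch, not stk's code.

```python
import numpy as np


def oriented_plane_normal(bonder_centroids, molecular_centroid):
    """Unit normal of the bonder-centroid plane, pointing toward the molecule."""
    points = np.asarray(bonder_centroids, dtype=float)
    if len(points) < 3:
        raise ValueError('At least 3 functional groups are needed for a plane.')
    center = points.mean(axis=0)
    # The last right-singular vector is normal to the best-fit plane.
    _, _, vh = np.linalg.svd(points - center)
    normal = vh[-1]
    # Flip the normal if it points away from the molecular centroid.
    if np.dot(normal, np.asarray(molecular_centroid, dtype=float) - center) < 0:
        normal = -normal
    return normal
```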

classmethod get_cached_mol(identity_key, default=None)

Get a molecule from the cache.

Parameters
  • identity_key (object) – The identity key of the molecule to return.

  • default (object, optional) – Returned if identity_key is not found in the cache. If None an error will be raised if identity_key is not found in the cache.

Returns

The cached molecule.

Return type

Molecule

get_center_of_mass(self, atom_ids=None)

Return the centre of mass.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which should be used to calculate the center of mass. If None, then all atoms will be used.

Returns

The coordinates of the center of mass.

Return type

numpy.ndarray

References

https://en.wikipedia.org/wiki/Center_of_mass
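The center of mass is the mass-weighted average of the atomic positions, sum(m_i * r_i) / sum(m_i). The sketch below assumes a hypothetical list of element symbols and a small, hypothetical table of standard atomic weights (g/mol) covering only H, C, N and O.

```python
import numpy as np

# Hypothetical lookup of standard atomic weights (g/mol); extend as needed.
ATOMIC_MASSES = {'H': 1.008, 'C': 12.011, 'N': 14.007, 'O': 15.999}


def center_of_mass(elements, position_matrix, atom_ids=None):
    """Mass-weighted mean position of the atoms in atom_ids (all if None)."""
    if atom_ids is None:
        atom_ids = range(len(elements))
    atom_ids = list(atom_ids)
    masses = np.array([ATOMIC_MASSES[elements[i]] for i in atom_ids])
    positions = np.asarray(position_matrix, dtype=float)[atom_ids]
    # sum(m_i * r_i) / sum(m_i), computed as a (k,) @ (k, 3) product.
    return masses @ positions / masses.sum()
```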

get_centroid(self, atom_ids=None)

Return the centroid.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which are used to calculate the centroid. If None, then all atoms will be used.

Returns

The centroid of atoms specified by atom_ids.

Return type

numpy.ndarray

get_centroid_centroid_direction_vector(self, fg_ids=None)

Return the direction vector between the 2 molecular centroids.

The first molecular centroid is the centroid of the entire molecule. The second molecular centroid is the centroid of the bonder atoms in the molecule.

Parameters

fg_ids (iterable of int) – The ids of functional groups to be used for calculating the bonder centroid. If None then all functional groups are used.

Returns

The vector running from the centroid of the bonder atoms to the molecular centroid.

Return type

numpy.ndarray
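Following the description above, the vector is the molecular centroid minus the bonder centroid. This sketch assumes the bonder atom ids are passed directly (a hypothetical bonder_ids argument) and does not normalize the result, since the text does not say whether stk does.

```python
import numpy as np


def centroid_centroid_direction(position_matrix, bonder_ids):
    """Vector from the bonder-atom centroid to the whole-molecule centroid."""
    positions = np.asarray(position_matrix, dtype=float)
    molecular_centroid = positions.mean(axis=0)
    bonder_centroid = positions[list(bonder_ids)].mean(axis=0)
    return molecular_centroid - bonder_centroid
```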

get_direction(self, atom_ids=None)

Return a vector of best fit through the atoms.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which should be used to calculate the vector. If None, then all atoms will be used.

Returns

The vector of best fit.

Return type

numpy.ndarray
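A common way to get a vector of best fit is to center the coordinates and take the first right-singular vector of an SVD, which is the direction of greatest spread (the first principal component). This is an assumed technique for illustration, not a statement of how stk computes it; the sign of the returned vector is arbitrary.

```python
import numpy as np


def best_fit_direction(position_matrix, atom_ids=None):
    """Unit vector along the line of best fit through the selected atoms."""
    positions = np.asarray(position_matrix, dtype=float)
    if atom_ids is not None:
        positions = positions[list(atom_ids)]
    centered = positions - positions.mean(axis=0)
    # Rows of vh are sorted by decreasing singular value; the first is the
    # direction along which the atoms are most spread out.
    _, _, vh = np.linalg.svd(centered)
    return vh[0]
```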

get_identity_key(self)

Return the identity key.

The identity key will be equal for two molecules which stk sees as identical. The identity key does not take the conformation into account, but it does account for isomerism.

Returns

A hashable object which represents the identity of the molecule.

Return type

object

get_maximum_diameter(self, atom_ids=None)

Return the maximum diameter.

This method does not account for the van der Waals radius of atoms.

Parameters

atom_ids (iterable of int) – The ids of atoms which are considered when looking for the maximum diameter. If None then all atoms in the molecule are considered.

Returns

The maximum diameter in the molecule.

Return type

float
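Ignoring van der Waals radii, the maximum diameter is the largest distance between any two of the selected atoms. Below is a minimal NumPy sketch (a hypothetical helper, not stk's code) that uses broadcasting to form every pairwise distance; it needs O(n²) memory, which is acceptable for molecule-sized inputs.

```python
import numpy as np


def maximum_diameter(position_matrix, atom_ids=None):
    """Largest distance between any two selected atoms (no vdW radii)."""
    positions = np.asarray(position_matrix, dtype=float)
    if atom_ids is not None:
        positions = positions[list(atom_ids)]
    # diffs[i, j] = positions[i] - positions[j], shape (n, n, 3).
    diffs = positions[:, None, :] - positions[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())
```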

get_plane_normal(self, atom_ids=None)

Return the normal to the plane of best fit.

Parameters

atom_ids (iterable of int, optional) – The ids of atoms which should be used to calculate the plane. If None, then all atoms will be used.

Returns

A unit vector normal to the plane of best fit through the atoms.

Return type

numpy.ndarray

get_position_matrix(self)

Return a matrix holding the atomic positions.

Returns

The array has the shape (n, 3). Each row holds the x, y and z coordinates of an atom.

Return type

numpy.ndarray

classmethod has_cached_mol(identity_key)

True if molecule with identity_key is cached.

Parameters

identity_key (object) – The identity key of a molecule.

Returns

True if a molecule with identity_key is cached.

Return type

bool

classmethod init_from_dict(mol_dict, use_cache=False)

Initialize from a dict representation.

The Molecule returned has the class specified in mol_dict, not Molecule.

Parameters
  • mol_dict (dict) – A dict holding the dict representation of a molecule, generated by to_dict().

  • use_cache (bool, optional) – If True, a new instance will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical instance exists yet, the newly created one will be added to the cache.

Returns

The molecule represented by mol_dict.

Return type

Molecule

classmethod init_from_file(path, functional_groups=None, use_cache=False)

Initialize from a file.

Parameters
  • path (str) –

    The path to a molecular structure file. Supported file types are:

    1. .mol, .sdf - MDL V3000 MOL file

    2. .pdb - PDB file

  • functional_groups (iterable of str, optional) – The names of the functional group types which are to be added to func_groups. If None, then no functional groups are added.

  • use_cache (bool, optional) – If True, a new BuildingBlock will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical BuildingBlock exists yet, the newly created one will be added to the cache.

Returns

The building block.

Return type

BuildingBlock

Raises

ValueError – If the file type cannot be used for initialization.

classmethod init_from_molecule(mol, functional_groups=None, use_cache=False)

Initialize from a Molecule.

Parameters
  • mol (Molecule) – The molecule to initialize from. This can be any Molecule, such as a BuildingBlock or a ConstructedMolecule.

  • functional_groups (iterable of str, optional) – The names of the functional group types which are to be added to func_groups. If None, then no functional groups are added.

  • use_cache (bool, optional) – If True, a new BuildingBlock will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical BuildingBlock exists yet, the newly created one will be added to the cache.

Returns

The building block. It will have the same atoms, bonds and atomic positions as mol.

Return type

BuildingBlock

classmethod init_from_random_file(file_glob, functional_groups=None, random_seed=None, use_cache=False)

Initialize from a random file in file_glob.

Parameters
  • file_glob (str) – A glob specifying files, one of which is used to initialize a BuildingBlock at random.

  • functional_groups (iterable of str, optional) – The names of the functional group types which are to be added to func_groups. If None, then no functional groups are added.

  • random_seed (int, optional) – The random seed to use.

  • use_cache (bool, optional) – If True, a new BuildingBlock will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical BuildingBlock exists yet, the newly created one will be added to the cache.

Returns

A random molecule from file_glob.

Return type

BuildingBlock

Raises

RuntimeError – If no files in file_glob could be initialized from.
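The described behaviour can be sketched as: list the matching files, shuffle them with a seeded generator, and return the first one that loads, raising RuntimeError if none do. The loader argument is a hypothetical stand-in for stk's file-reading logic, and sorting the glob results first (so that a given seed gives repeatable picks) is a design choice of this sketch, not something the documentation states.

```python
import glob
import random


def init_from_random_file(file_glob, loader, random_seed=None):
    """Return loader(path) for a randomly chosen file matching file_glob.

    loader is a hypothetical callable that builds a molecule from a path
    and raises an exception if the file cannot be used.
    """
    # Sort so that a given seed always produces the same order.
    paths = sorted(glob.glob(file_glob))
    random.Random(random_seed).shuffle(paths)
    for path in paths:
        try:
            return loader(path)
        except Exception:
            # Unusable file: fall through to the next candidate.
            continue
    raise RuntimeError(f'No files in "{file_glob}" could be initialized from.')
```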

classmethod init_from_rdkit_mol(mol, functional_groups=None, use_cache=False)

Initialize from an rdkit molecule.

Parameters
  • mol (rdkit.Mol) – The molecule.

  • functional_groups (iterable of str, optional) – The names of the functional group types which are to be added to func_groups. If None, then no functional groups are added.

  • use_cache (bool, optional) – If True, a new BuildingBlock will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical BuildingBlock exists yet, the newly created one will be added to the cache.

Returns

The molecule.

Return type

BuildingBlock

classmethod load(path, use_cache=False)

Initialize from a dump file.

The Molecule returned has the class specified in the file, not Molecule. You can use this if you don’t know what class the loaded molecule is or should be.

Parameters
  • path (str) – The full path holding a dumped molecule.

  • use_cache (bool, optional) – If True, a new instance will not be made if a cached and identical one already exists; the existing one will be returned instead. If True and no cached, identical instance exists yet, the newly created one will be added to the cache.

Returns

The molecule held in path.

Return type

Molecule

set_centroid(self, position, atom_ids=None)

Set the centroid to position.

Parameters
  • position (numpy.ndarray) – This array holds the position on which the centroid of the molecule is going to be placed.

  • atom_ids (iterable of int) – The ids of atoms which should have their centroid set to position. If None then all atoms are used.

Returns

The molecule.

Return type

Molecule
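Setting the centroid amounts to a rigid translation by position minus the current centroid of the selected atoms. This sketch assumes that the whole molecule is translated, not only the atoms in atom_ids, and, unlike the method, it returns a new array instead of modifying a molecule in place; set_centroid here is a hypothetical NumPy helper.

```python
import numpy as np


def set_centroid(position_matrix, position, atom_ids=None):
    """Return positions translated so the atom_ids centroid sits at position."""
    positions = np.array(position_matrix, dtype=float)
    selected = positions if atom_ids is None else positions[list(atom_ids)]
    # One displacement, applied to every atom, keeps the geometry rigid.
    displacement = np.asarray(position, dtype=float) - selected.mean(axis=0)
    return positions + displacement
```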

set_position_matrix(self, position_matrix)

Set the coordinates to those in position_matrix.

Parameters

position_matrix (numpy.ndarray) – A position matrix of the molecule. The shape of the matrix is (n, 3).

Returns

The molecule.

Return type

Molecule

to_dict(self, include_attrs=None, ignore_missing_attrs=False)

Return a dict representation.

Parameters
  • include_attrs (list of str, optional) – The names of additional attributes of the molecule to be added to the dict. Each attribute is saved as a string using repr().

  • ignore_missing_attrs (bool, optional) – If False and an attribute in include_attrs is not held by the BuildingBlock, an error will be raised.

Returns

A dict which represents the molecule.

Return type

dict

to_rdkit_mol(self)

Return an rdkit representation.

Returns

The molecule in rdkit format.

Return type

rdkit.Mol

update_cache(self)

Update attributes of the cached molecule.

If there is no identical molecule in the cache, then this molecule is added.

When using multiprocessing, modified copies of the original molecules are created. In order to ensure that the cached molecules have their attributes updated to the values of the copies, this method must be run on the copies.

Returns

None

Return type

NoneType

update_from_file(self, path)

Update the structure from a file.

Multiple file types are supported, namely:

  1. .mol, .sdf - MDL V2000 and V3000 files

  2. .xyz - XYZ files

  3. .mae - Schrodinger Maestro files

  4. .coord - Turbomole files

Parameters

path (str) – The path to a molecular structure file holding updated coordinates for the Molecule.

Returns

The molecule.

Return type

Molecule

update_from_rdkit_mol(self, mol)

Update the structure to match mol.

Parameters

mol (rdkit.Mol) – The rdkit molecule to use for the structure update.

Returns

The molecule.

Return type

Molecule

write(self, path, atom_ids=None)

Write the structure to a file.

The file format is chosen based on the extension of path:

  1. .mol, .sdf - MDL V3000 MOL file

  2. .xyz - XYZ file

  3. .pdb - PDB file

Parameters
  • path (str) – The path to which the molecule should be written.

  • atom_ids (iterable of int, optional) – The atom ids of atoms to write. If None then all atoms are written.

Returns

The molecule.

Return type

Molecule
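Of the formats listed, XYZ is the simplest: an atom-count line, a comment line, then one element symbol and x, y, z coordinate triple per atom. The sketch below builds such a block for the atoms in atom_ids; xyz_block and its element-list argument are hypothetical, and the spacing and precision may differ from what stk writes.

```python
def xyz_block(elements, position_matrix, atom_ids=None):
    """Return XYZ-format text for the atoms in atom_ids (all atoms if None)."""
    if atom_ids is None:
        atom_ids = range(len(elements))
    atom_ids = list(atom_ids)
    # Line 1: number of atoms. Line 2: free-form comment (left empty here).
    lines = [str(len(atom_ids)), '']
    for atom_id in atom_ids:
        x, y, z = position_matrix[atom_id]
        lines.append(f'{elements[atom_id]} {x:f} {y:f} {z:f}')
    return '\n'.join(lines) + '\n'
```

To mimic write(), the returned string can be saved to path with an ordinary file write.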