Populations

Population objects are containers specialized to perform operations on groups of stk molecules, often in parallel.

class EAPopulation(*args)

Bases: stk.populations.Population

A population which also stores fitness values of molecules.

direct_members

Held here are direct members of the Population. In other words, these are the molecules not held by any subpopulations. As a result, not all members of a Population are stored in this attribute.

Type

list of Molecule

subpopulations

A list holding the subpopulations.

Type

list of Population

Methods

add_members(self, molecules[, duplicate_key])

Add Molecule instances to the Population.

add_subpopulation(self, population)

Add a clone of population to subpopulations.

clone(self)

Return a clone.

close_process_pool(self)

Close an open process pool.

dump(self, path[, include_attrs, …])

Dump the Population to a file.

get_fitness_values(self)

Return the fitness values of molecules.

init_all(building_blocks, topology_graphs[, …])

Make all possible molecules from groups of building blocks.

init_diverse(building_blocks, …[, …])

Construct a chemically diverse Population.

init_from_list(pop_list[, use_cache])

Initialize a population from a list representation.

init_random(building_blocks, …[, …])

Construct molecules for a random Population.

load(path[, use_cache])

Initialize a Population from one dumped to a file.

open_process_pool(self[, num_processes])

Open a process pool.

optimize(self, optimizer[, num_processes])

Optimize the structures of molecules in the population.

remove_duplicates(self[, …])

Remove duplicates from the population.

remove_members(self, key)

Remove all members where key(member) is True.

set_fitness_values_from_calculators(self, …)

Set the fitness values of molecules.

set_fitness_values_from_dict(self, …)

Set the fitness values of molecules.

set_mol_ids(self, n[, overwrite])

Give each member of the population an id starting from n.

to_list(self[, include_attrs, …])

Convert the population to a list representation.

write(self, path)

Write the .mol files of members to a directory.

__init__(self, *args)

Initialize a Population.

Parameters

*args (Molecule, Population) – A population is initialized with the Molecule and Population instances it should hold.

Examples

import stk

bb1 = stk.BuildingBlock('CCC')
bb2 = stk.BuildingBlock('NCCNCNC')
bb3 = stk.BuildingBlock('[Br]CCC[Br]')
pop1 = stk.Population(bb1, bb2, bb3)

bb4 = stk.BuildingBlock('NNCCCN')
# pop2 has pop1 as a subpopulation and bb4 as a direct
# member.
pop2 = stk.Population(pop1, bb4)

add_members(self, molecules, duplicate_key=None)

Add Molecule instances to the Population.

The added Molecule instances become direct members of the population; they are not placed into any subpopulations.

Parameters
  • molecules (iterable of Molecule) – The molecules to be added as direct members.

  • duplicate_key (callable, optional) – If not None, duplicate_key(mol) is evaluated on each molecule in molecules. If a molecule with the same duplicate_key value is already present in the population, the molecule is not added.

Returns

None

Return type

NoneType
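
The effect of duplicate_key can be pictured with a small stand-in that does not use stk. The helper add_unique and the use of plain strings as members are assumptions made for illustration only, and the sketch checks a flat list rather than a nested population, so stk's own behaviour may differ in detail.

```python
def add_unique(direct_members, molecules, duplicate_key=None):
    """Append molecules to direct_members, skipping duplicates.

    Hypothetical helper, not part of stk. A molecule is skipped when
    duplicate_key gives it the same value as a member already held.
    """
    if duplicate_key is None:
        # No key supplied, so everything is added.
        direct_members.extend(molecules)
        return

    seen = {duplicate_key(member) for member in direct_members}
    for molecule in molecules:
        key = duplicate_key(molecule)
        if key not in seen:
            seen.add(key)
            direct_members.append(molecule)


# Plain strings stand in for Molecule instances.
members = ['CCC', 'NCCN']
add_unique(members, ['nccn', 'CCO', 'CCC'], duplicate_key=str.upper)
# members is now ['CCC', 'NCCN', 'CCO']
```

Only 'CCO' is added here, because the other two candidates share a key value with a member that is already held.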

add_subpopulation(self, population)

Add a clone of population to subpopulations.

Only a clone of the population container is made. The molecules it holds are not copies.

Parameters

population (Population) – The population to be added as a subpopulation.

Returns

None

Return type

NoneType

clone(self)

Return a clone.

The clone shares its Molecule objects with the original; copies of the Molecule objects are not made.

Returns

The clone.

Return type

Population

Examples

import stk

# Make an initial population.
pop = stk.Population(stk.BuildingBlock('NCCN'))
# Make a clone.
clone = pop.clone()

close_process_pool(self)

Close an open process pool.

Returns

The population.

Return type

Population

dump(self, path, include_attrs=None, ignore_missing_attrs=False)

Dump the Population to a file.

Parameters
  • path (str) – The full path of the file to which the Population should be dumped.

  • include_attrs (list of str, optional) – The names of attributes of the molecules to be added to the JSON. Each attribute is saved as a string using repr().

  • ignore_missing_attrs (bool, optional) – If False and an attribute in include_attrs is not held by a Molecule, an error will be raised.

Returns

None

Return type

NoneType

get_fitness_values(self)

Return the fitness values of molecules.

Returns

Maps a Molecule to its fitness value.

Return type

dict

classmethod init_all(building_blocks, topology_graphs, num_processes=None, duplicates=False, use_cache=False)

Make all possible molecules from groups of building blocks.

Parameters
  • building_blocks (list of Molecule) –

    A list holding nested building blocks, for example

    bbs1 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs2 = [
        stk.ConstructedMolecule(...),
        stk.BuildingBlock(...),
        ...,
    ]
    bbs3 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    building_blocks = [bbs1, bbs2, bbs3]
    

    To construct a new ConstructedMolecule, a Molecule is picked from each of the sublists in building_blocks. The picked Molecule instances are then supplied to ConstructedMolecule

    # mol is a new ConstructedMolecule. bb1 is selected
    # from bbs1, bb2 is selected from bbs2 and bb3 is
    # selected from bbs3.
    mol = stk.ConstructedMolecule(
        building_blocks=[bb1, bb2, bb3],
        topology_graph=topology_pick
    )
    

    The order a Molecule instance is given to the ConstructedMolecule is determined by the sublist of building_blocks it was picked from. Note that the number of sublists in building_blocks is not fixed. It merely has to be compatible with the topology_graphs.

  • topology_graphs (list of TopologyGraph) – The topology graphs of the ConstructedMolecule instances being made.

  • num_processes (int, optional) – The number of parallel processes to create when constructing the molecules. If None, creates a process for each core on the computer.

  • duplicates (bool, optional) – If False, duplicate structures are removed from the population.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

A Population holding ConstructedMolecule instances.

Return type

Population
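
The phrase "all possible molecules" amounts to a Cartesian product over the sublists of building_blocks, combined with every topology graph. The sketch below shows only that counting step, with strings standing in for building blocks and topology graphs. The names are made up, and the order in which stk enumerates combinations is an assumption.

```python
import itertools

# Strings stand in for building blocks and topology graphs.
amines = ['amine_1', 'amine_2', 'amine_3']
aldehydes = ['aldehyde_1', 'aldehyde_2', 'aldehyde_3']
topology_graphs = ['four_plus_six']

# One pick from every sublist, paired with every topology graph.
combinations = [
    (building_blocks, topology_graph)
    for building_blocks in itertools.product(amines, aldehydes)
    for topology_graph in topology_graphs
]

print(len(combinations))  # 3 * 3 * 1 = 9
```

Adding a third sublist or a second topology graph multiplies the count accordingly, which is worth checking before constructing a large population.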

Examples

Construct all possible cage molecules from some precursors

import stk

amines = [
    stk.BuildingBlock('NCCCN', ['amine']),
    stk.BuildingBlock('NCCCCCN', ['amine']),
    stk.BuildingBlock('NCCOCCN', ['amine']),
]
aldehydes = [
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=C(C=O)COCC=O', ['aldehyde']),
]
# A total of 9 cages will be created.
cages = stk.Population.init_all(
    building_blocks=[amines, aldehydes],
    topology_graphs=[stk.cage.FourPlusSix()]
)

Use the constructed cages and a new bunch of building blocks to create all possible cage complexes.

encapsulants = [
    stk.BuildingBlock('[Br][Br]'),
    stk.BuildingBlock('[F][F]'),
]

# Every combination of cage and encapsulant.
complexes = stk.Population.init_all(
    building_blocks=[cages, encapsulants],
    topology_graphs=[stk.host_guest_complex.Complex()]
)

classmethod init_diverse(building_blocks, topology_graphs, size, random_seed=None, use_cache=False)

Construct a chemically diverse Population.

All constructed molecules are held in direct_members.

In order to construct a ConstructedMolecule, a random Molecule is selected from each sublist in building_blocks. Once the first construction is complete, the next Molecule selected from each sublist is the one with the most different Morgan fingerprint to the prior one. The third construction uses randomly selected Molecule objects again and so on. This is done until size ConstructedMolecule instances have been constructed.
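
The alternating selection described above can be sketched without stk or a cheminformatics toolkit. In this sketch, fingerprints are plain sets of on-bits and similarity is the Tanimoto coefficient; both are stand-ins chosen for illustration, and pick_diverse is a hypothetical helper covering only the selection step. It does not construct molecules, and it does not guard against repeated picks.

```python
import random


def tanimoto(fp1, fp2):
    """Similarity of two fingerprints given as sets of on-bits."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 1.0


def pick_diverse(sublists, size, random_seed=None):
    """Return `size` tuples holding one pick from each sublist.

    Hypothetical stand-in for the selection step only. Even-numbered
    picks are random; each odd-numbered pick takes, from every
    sublist, the item least similar to the previous pick.
    """
    generator = random.Random(random_seed)
    picks = []
    for i in range(size):
        if i % 2 == 0:
            pick = tuple(generator.choice(sub) for sub in sublists)
        else:
            previous = picks[-1]
            pick = tuple(
                min(sub, key=lambda fp: tanimoto(fp, prev))
                for sub, prev in zip(sublists, previous)
            )
        picks.append(pick)
    return picks


# Made-up fingerprints for two groups of building blocks.
amines = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9})]
aldehydes = [frozenset({1, 5}), frozenset({5, 6}), frozenset({10, 11})]
picks = pick_diverse([amines, aldehydes], size=4, random_seed=7)
```

Passing the same random_seed reproduces the same sequence of picks, which mirrors the purpose of the random_seed parameter.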

Parameters
  • building_blocks (list of Molecule) –

    A list holding nested building blocks, for example

    bbs1 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs2 = [
        stk.ConstructedMolecule(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs3 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    building_blocks = [bbs1, bbs2, bbs3]
    

    To construct a new ConstructedMolecule, a Molecule is picked from each of the sublists in building_blocks. The picked Molecule instances are then supplied to the ConstructedMolecule

    # mol is a new ConstructedMolecule. bb1 is selected
    # from bbs1, bb2 is selected from bbs2 and bb3 is
    # selected from bbs3.
    mol = stk.ConstructedMolecule(
        building_blocks=[bb1, bb2, bb3],
        topology_graph=topology_pick
    )
    

    The order a Molecule instance is given to ConstructedMolecule is determined by the sublist of building_blocks it was picked from. Note that the number of sublists in building_blocks is not fixed. It merely has to be compatible with the topology_graphs.

  • topology_graphs (iterable of TopologyGraph) – An iterable holding topology graphs which should be randomly selected for the construction of a ConstructedMolecule.

  • size (int) – The desired size of the Population.

  • random_seed (int, optional) – Seed for the random number generator to get replicable results.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

A population filled with the constructed molecules.

Return type

Population

Examples

Construct a diverse Population of cage molecules from some precursors

import stk

amines = [
    stk.BuildingBlock('NCCCN', ['amine']),
    stk.BuildingBlock('NCCCCCN', ['amine']),
    stk.BuildingBlock('NCCOCCN', ['amine']),
]
aldehydes = [
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=C(C=O)COCC=O', ['aldehyde']),
]
# A total of 4 cages will be created.
cages = stk.Population.init_diverse(
    building_blocks=[amines, aldehydes],
    topology_graphs=[stk.cage.FourPlusSix()],
    size=4
)

Use the constructed cages and a new bunch of building blocks to create some diverse cage complexes.

encapsulants = [
    stk.BuildingBlock('[Br][Br]'),
    stk.BuildingBlock('[F][F]'),
]

# 4 combinations of cage and encapsulant.
complexes = stk.Population.init_diverse(
    building_blocks=[cages, encapsulants],
    topology_graphs=[stk.host_guest_complex.Complex()],
    size=4
)

classmethod init_from_list(pop_list, use_cache=False)

Initialize a population from a list representation.

Parameters
  • pop_list (list) –

    A list which represents a Population, like the ones created by to_list(). For example, in

    pop_list = [{...}, [{...}], [{...}, {...}], {...}]
    

    pop_list represents the Population, sublists represent its subpopulations and the dict {...} represents the members.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

The population represented by pop_list.

Return type

Population
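
The nesting convention of pop_list can be explored with plain Python before handing the list to init_from_list. The sketch below walks a list of the same shape as the example above. The dict contents are invented, and count_members is a hypothetical helper rather than an stk function.

```python
def count_members(pop_list):
    """Count the members in a nested list representation.

    Dicts are treated as members and lists as subpopulations,
    following the layout described above.
    """
    total = 0
    for item in pop_list:
        if isinstance(item, list):
            total += count_members(item)
        else:
            total += 1
    return total


# Same shape as the example above; the dict contents are made up.
pop_list = [{'id': 0}, [{'id': 1}], [{'id': 2}, {'id': 3}], {'id': 4}]

# Top-level dicts are direct members, top-level lists are
# subpopulations.
direct = [item for item in pop_list if isinstance(item, dict)]
subpopulations = [item for item in pop_list if isinstance(item, list)]

print(count_members(pop_list))  # 5
```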

classmethod init_random(building_blocks, topology_graphs, size, random_seed=None, use_cache=False)

Construct molecules for a random Population.

All molecules are held in direct_members.

From the supplied building blocks a random Molecule is selected from each sublist to form a ConstructedMolecule. This is done until size ConstructedMolecule objects have been constructed.
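
The random selection step can be sketched with the standard library alone. Strings stand in for building blocks and topology graphs, and pick_random is a hypothetical helper: it reproduces the one-pick-per-sublist idea and the role of random_seed, not stk's construction of molecules.

```python
import random


def pick_random(sublists, topology_graphs, size, random_seed=None):
    """Return `size` random (building_blocks, topology_graph) pairs.

    Hypothetical stand-in for the selection step only.
    """
    generator = random.Random(random_seed)
    picks = []
    for _ in range(size):
        # One building block from every sublist, in sublist order.
        building_blocks = [generator.choice(sub) for sub in sublists]
        topology_graph = generator.choice(topology_graphs)
        picks.append((building_blocks, topology_graph))
    return picks


amines = ['amine_1', 'amine_2', 'amine_3']
aldehydes = ['aldehyde_1', 'aldehyde_2', 'aldehyde_3']
picks = pick_random(
    sublists=[amines, aldehydes],
    topology_graphs=['four_plus_six'],
    size=5,
    random_seed=12,
)
```

Because the picks are independent, the same combination can be drawn more than once in this sketch.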

Parameters
  • building_blocks (list of Molecule) –

    A list holding nested building blocks, for example

    bbs1 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs2 = [
        stk.ConstructedMolecule(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs3 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    building_blocks = [bbs1, bbs2, bbs3]
    

    To construct a new ConstructedMolecule, a Molecule is picked from each of the sublists in building_blocks. The picked Molecule instances are then supplied to ConstructedMolecule

    # mol is a new ConstructedMolecule. bb1 is selected
    # from bbs1, bb2 is selected from bbs2 and bb3 is
    # selected from bbs3.
    mol = stk.ConstructedMolecule(
        building_blocks=[bb1, bb2, bb3],
        topology_graph=topology_pick
    )
    

    The order a Molecule instance is given to the ConstructedMolecule is determined by the sublist of building_blocks it was picked from. Note that the number of sublists in building_blocks is not fixed. It merely has to be compatible with the topology_graphs.

  • topology_graphs (iterable of TopologyGraph) – An iterable holding topology graphs which should be randomly selected during initialization of ConstructedMolecule.

  • size (int) – The size of the population to be initialized.

  • random_seed (int, optional) – Seed for the random number generator to get replicable results.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

A population filled with random ConstructedMolecule instances.

Return type

Population

Examples

Construct 5 random cage molecules from some precursors

import stk

amines = [
    stk.BuildingBlock('NCCCN', ['amine']),
    stk.BuildingBlock('NCCCCCN', ['amine']),
    stk.BuildingBlock('NCCOCCN', ['amine']),
]
aldehydes = [
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=C(C=O)COCC=O', ['aldehyde']),
]
# A total of 5 cages will be created.
cages = stk.Population.init_random(
    building_blocks=[amines, aldehydes],
    topology_graphs=[stk.cage.FourPlusSix()],
    size=5
)

Use the constructed cages and a new bunch of building blocks to create some random cage complexes.

encapsulants = [
    stk.BuildingBlock('[Br][Br]'),
    stk.BuildingBlock('[F][F]'),
]

# Random combinations of cage and encapsulant.
complexes = stk.Population.init_random(
    building_blocks=[cages, encapsulants],
    topology_graphs=[stk.host_guest_complex.Complex()],
    size=5
)

classmethod load(path, use_cache=False)

Initialize a Population from one dumped to a file.

Parameters
  • path (str) – The full path of the file holding the dumped population.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

The population stored in the dump file.

Return type

Population

open_process_pool(self, num_processes=None)

Open a process pool.

Parameters

num_processes (int, optional) – The number of processes in the pool. If None, then creates a process for each core on the computer.

Returns

The population.

Return type

Population

Raises

RuntimeError – If a process pool is already open.
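
Returning the population from open_process_pool is what allows it to be used in a "with" statement. The toy class below illustrates that pattern and the RuntimeError guard. It is not stk code: PoolHolder and its method names are invented, and a thread pool from the standard library is used as a lightweight stand-in for a process pool.

```python
from concurrent.futures import ThreadPoolExecutor


class PoolHolder:
    """Toy object that keeps at most one worker pool open."""

    def __init__(self):
        self._pool = None

    def open_pool(self, num_workers=None):
        if self._pool is not None:
            raise RuntimeError('A pool is already open.')
        self._pool = ThreadPoolExecutor(max_workers=num_workers)
        # Returning self lets the call be used in a "with" statement.
        return self

    def close_pool(self):
        self._pool.shutdown()
        self._pool = None
        return self

    def run(self, function, items):
        # Reuse the open pool if there is one, else run serially.
        if self._pool is None:
            return [function(item) for item in items]
        return list(self._pool.map(function, items))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close_pool()
        return False


def square(x):
    return x * x


holder = PoolHolder()
with holder.open_pool(2):
    # Both calls share the same pool.
    first = holder.run(square, [1, 2, 3])
    second = holder.run(square, [4, 5])
# The pool has been closed here.
```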

optimize(self, optimizer, num_processes=None)

Optimize the structures of molecules in the population.

The molecules are optimized serially or in parallel depending on whether num_processes is 1 or more than 1. The serial version may be faster when all molecules have already been optimized and the optimizer will skip them, because creating a process pool then adds unnecessary overhead.

Parameters
  • optimizer (Optimizer) – The optimizer used to carry out the optimizations.

  • num_processes (int, optional) – The number of parallel processes to create. Optimization will run serially if 1. If None, creates a process for each core on the computer. This parameter will be ignored if the population has an open process pool.

Returns

None

Return type

NoneType

remove_duplicates(self, across_subpopulations=True, key=id)

Remove duplicates from the population.

Which molecule is preserved when duplicates are removed depends on the iteration order. Iteration through a population is depth-first, so a rule such as “the molecule in the topmost population is preserved” does not hold. Rather, the first molecule found is preserved.

However, this question is only relevant if duplicates in different subpopulations are being removed. In this case it is assumed that it is more important to have a single instance than to worry about which subpopulation it is in.

If the duplicates are being removed from within subpopulations, each subpopulation will end up with a single instance of all molecules held before. There is no “choice”.

Parameters
  • across_subpopulations (bool, optional) – When False duplicates are only removed from within a given subpopulation. If True, all duplicates are removed, regardless of which subpopulation they are in.

  • key (callable, optional) – Two molecules are considered the same if the values returned by key(molecule) are the same.

Returns

None

Return type

NoneType
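
The difference between the two settings of across_subpopulations can be seen on a toy nested container. ToyPopulation and this remove_duplicates function are illustrative stand-ins, not the stk implementations, and the traversal order (direct members first, then each subpopulation in turn) is assumed from the indexing examples in this documentation.

```python
class ToyPopulation:
    """Minimal nested container; not the stk class."""

    def __init__(self, *args):
        self.direct_members = [
            arg for arg in args if not isinstance(arg, ToyPopulation)
        ]
        self.subpopulations = [
            arg for arg in args if isinstance(arg, ToyPopulation)
        ]


def remove_duplicates(pop, across_subpopulations=True, key=id, seen=None):
    """Keep the first member found for every key value."""
    # A fresh set per population restricts the check to that
    # population; a shared set makes the check global.
    if seen is None or not across_subpopulations:
        seen = set()
    unique = []
    for member in pop.direct_members:
        value = key(member)
        if value not in seen:
            seen.add(value)
            unique.append(member)
    pop.direct_members = unique
    for sub in pop.subpopulations:
        remove_duplicates(sub, across_subpopulations, key, seen)


# Strings stand in for molecules; str is used as the key.
across = ToyPopulation('a', 'b', 'a', ToyPopulation('b', 'c', 'c'))
remove_duplicates(across, across_subpopulations=True, key=str)
# Direct members: ['a', 'b']. Subpopulation: ['c'].

within = ToyPopulation('a', 'b', 'a', ToyPopulation('b', 'c', 'c'))
remove_duplicates(within, across_subpopulations=False, key=str)
# Direct members: ['a', 'b']. Subpopulation: ['b', 'c'].
```

In the first case 'b' survives only where it was found first, so the subpopulation loses it. In the second case each container is cleaned independently.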

remove_members(self, key)

Remove all members where key(member) is True.

Parameters

key (callable) – A callable which takes 1 argument. Each member of the population is passed as the argument to key in turn. If the result is True then the member is removed from the population.

Returns

None

Return type

NoneType

set_fitness_values_from_calculators(self, fitness_calculator, fitness_normalizer=None, num_processes=None)

Set the fitness values of molecules.

Parameters
  • fitness_calculator (FitnessCalculator) – Used to calculate the initial fitness values.

  • fitness_normalizer (FitnessNormalizer, optional) – Used to normalize the fitness values.

  • num_processes (int, optional) – The number of parallel processes to create. Calculations will run serially if 1. If None, creates a process for each core on the computer. This parameter will be ignored if the population has an open process pool.

Returns

The population is returned.

Return type

EAPopulation

set_fitness_values_from_dict(self, fitness_values)

Set the fitness values of molecules.

Parameters

fitness_values (dict) – Maps molecules in the population to their fitness values.

Returns

The population is returned.

Return type

EAPopulation

set_mol_ids(self, n, overwrite=False)

Give each member of the population an id starting from n.

This method adds an id attribute to each Molecule instance held by the population.

Parameters
  • n (int) – A number. Members of this Population are given a unique number as an id, starting from n and incremented by one between members.

  • overwrite (bool, optional) – If True, existing ids are replaced.

Returns

The value of the last id assigned, plus 1.

Return type

int
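
Because the return value is the next unused id, it can be passed straight to a later call so that ids stay unique across several populations. The sketch below imitates this with a hypothetical set_ids function and an empty ToyMolecule class. It assumes that members which already have an id keep it, and do not consume a number, when overwrite is False.

```python
class ToyMolecule:
    """Empty stand-in for a Molecule; not the stk class."""


def set_ids(molecules, n, overwrite=False):
    """Give each molecule an id starting from n.

    Returns the next unused id. Molecules that already have an id
    keep it unless overwrite is True.
    """
    for molecule in molecules:
        if overwrite or not hasattr(molecule, 'id'):
            molecule.id = n
            n += 1
    return n


first_batch = [ToyMolecule() for _ in range(3)]
second_batch = [ToyMolecule() for _ in range(2)]

next_id = set_ids(first_batch, 0)          # ids 0, 1, 2
next_id = set_ids(second_batch, next_id)   # ids 3, 4
```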

to_list(self, include_attrs=None, ignore_missing_attrs=False)

Convert the population to a list representation.

Parameters
  • include_attrs (list of str, optional) – The names of attributes to be added to the molecular representations. Each attribute is saved as a string using repr().

  • ignore_missing_attrs (bool, optional) – If False and an attribute in include_attrs is not held by a Molecule, an error will be raised.

Returns

A list representation of the Population.

Return type

list
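
The handling of include_attrs can be pictured as follows. This is a sketch under stated assumptions, not the stk implementation: to_record, the 'class' key and the choice of AttributeError are all invented for illustration, since the documentation above only says that attributes are stored with repr() and that an error is raised for a missing attribute.

```python
def to_record(obj, include_attrs=None, ignore_missing_attrs=False):
    """Build a dict holding repr() strings of selected attributes."""
    record = {'class': type(obj).__name__}
    for name in include_attrs or []:
        if hasattr(obj, name):
            # Stored as a string, so any attribute type is allowed.
            record[name] = repr(getattr(obj, name))
        elif not ignore_missing_attrs:
            raise AttributeError(f'{obj!r} has no attribute {name!r}')
    return record


class ToyMolecule:
    """Stand-in for a Molecule with one extra attribute."""

    def __init__(self, energy):
        self.energy = energy


mol = ToyMolecule(energy=-12.5)
record = to_record(mol, include_attrs=['energy'])
# record == {'class': 'ToyMolecule', 'energy': '-12.5'}
```

Storing repr() strings keeps the representation serializable, at the cost of needing to convert the strings back when the values are read.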

write(self, path)

Write the .mol files of members to a directory.

Parameters

path (str) – The full path of the directory into which the .mol files are written.

Returns

None

Return type

NoneType

class Population(*args)

Bases: object

A container for Molecule objects.

Population instances can be nested.

In addition to holding Molecule objects, the Population class can be used to create large numbers of these instances through the class methods beginning with “init”.

Molecule instances held by a Population can have their structures optimized in parallel through the optimize() method.

It supports all expected and necessary container operations such as iteration, indexing and membership checks (via the in operator).

direct_members

Held here are direct members of the Population. In other words, these are the molecules not held by any subpopulations. As a result, not all members of a Population are stored in this attribute.

Type

list of Molecule

subpopulations

A list holding the subpopulations.

Type

list of Population

Examples

A Population can be iterated through just like a list

import stk

# Create a population.
pop = stk.Population(
    stk.BuildingBlock(...),
    stk.ConstructedMolecule(...),
    stk.BuildingBlock(...),
    stk.BuildingBlock(...),

    stk.Population(
        stk.BuildingBlock(...),
        stk.ConstructedMolecule(...)
    ),

    stk.ConstructedMolecule(...),
    stk.BuildingBlock(...)

)

for member in pop:
    do_stuff(member)

When iterating through a Population you will also iterate through nested members, that is, members which are held by subpopulations. If you only wish to iterate through direct members, iterate through direct_members instead

for member in pop.direct_members:
    do_stuff(member)

You can also get access to members by using indices. Indices have access to all members in the population

first_member = pop[0]
second_member = pop[1]

Indices will first access direct members of the population and then access members in the subpopulations. Indices access nested members depth-first

pop2 = stk.Population(bb1, bb2, stk.Population(bb3, bb4))
# Get bb1.
pop2[0]
# Get bb2.
pop2[1]
# Get bb3.
pop2[2]
# Get bb4.
pop2[3]
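
The ordering rule is easier to see with more than one level of nesting. The toy class below is a stand-in written for this illustration, not the stk class. It yields direct members first and then descends fully into each subpopulation before moving to the next one, and it treats a slice as a request for a new, flat container.

```python
class ToyPopulation:
    """Minimal nested container; not the stk class."""

    def __init__(self, *args):
        self.direct_members = [
            arg for arg in args if not isinstance(arg, ToyPopulation)
        ]
        self.subpopulations = [
            arg for arg in args if isinstance(arg, ToyPopulation)
        ]

    def __iter__(self):
        # Direct members first, then each subpopulation in full.
        yield from self.direct_members
        for sub in self.subpopulations:
            yield from sub

    def __len__(self):
        return sum(1 for _ in self)

    def __getitem__(self, index):
        members = list(self)
        if isinstance(index, slice):
            # A slice gives a new container with no nesting.
            return ToyPopulation(*members[index])
        return members[index]


# Strings stand in for molecules.
toy = ToyPopulation(
    'bb1',
    ToyPopulation('bb2', ToyPopulation('bb3')),
    ToyPopulation('bb4'),
)

print(list(toy))  # ['bb1', 'bb2', 'bb3', 'bb4']
flat = toy[1:3]   # holds 'bb2' and 'bb3' as direct members
```

Here 'bb3' comes before 'bb4' even though it is nested more deeply, because the first subpopulation is exhausted before the second is visited.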

You can get a subpopulation by taking a slice

# new_pop is a new Population instance and has no nesting.
new_pop = pop[2:4]

You can take the length of a population to get the total number of members

len(pop)

Adding populations creates a new population with both of the added populations as subpopulations

# added has no direct members and two subpopulations, pop and
# pop2.
added = pop + pop2

Subtracting populations creates a new, flat population.

# subbed has all objects in pop except those also found in
# pop2.
subbed = pop - pop2
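
The shapes produced by the two operators can be mimicked with a toy container. This is an illustrative sketch, not the stk class: in particular, it stores the operands of the addition directly, whereas stk may store clones.

```python
class ToyPopulation:
    """Minimal nested container; not the stk class."""

    def __init__(self, *args):
        self.direct_members = [
            arg for arg in args if not isinstance(arg, ToyPopulation)
        ]
        self.subpopulations = [
            arg for arg in args if isinstance(arg, ToyPopulation)
        ]

    def __iter__(self):
        yield from self.direct_members
        for sub in self.subpopulations:
            yield from sub

    def __add__(self, other):
        # Both operands become subpopulations of the result.
        return ToyPopulation(self, other)

    def __sub__(self, other):
        # The result is flat: nesting in self is not preserved.
        excluded = list(other)
        return ToyPopulation(*(m for m in self if m not in excluded))


# Strings stand in for molecules.
toy1 = ToyPopulation('a', 'b', 'c')
toy2 = ToyPopulation('b', ToyPopulation('c', 'd'))

added = toy1 + toy2
subbed = toy1 - toy2

print(len(added.subpopulations))  # 2
print(list(subbed))               # ['a']
```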

You can check if an object is already present in the population.

bb1 = stk.BuildingBlock(...)
bb2 = stk.BuildingBlock(...)
pop3 = stk.Population(bb1)

# Returns True.
bb1 in pop3
# Returns False.
bb2 in pop3
# Returns True.
bb2 not in pop3

If you want to run multiple optimize() calls in a row, use the “with” statement. This keeps a single process pool open, and means you do not create a new one for each optimize() call. It also automatically closes the pool for you when the block exits

population = stk.Population(...)
# Keep a process pool open through the "with" statement.
with population.open_process_pool(8):
    # All optimize calls within this block will use the
    # same process pool.
    population.optimize(stk.UFF())
    population.add_members(...)
    population.optimize(stk.UFF())
# Process pool is automatically cleaned up when the block
# exits.

Methods

add_members(self, molecules[, duplicate_key])

Add Molecule instances to the Population.

add_subpopulation(self, population)

Add a clone of population to subpopulations.

clone(self)

Return a clone.

close_process_pool(self)

Close an open process pool.

dump(self, path[, include_attrs, …])

Dump the Population to a file.

init_all(building_blocks, topology_graphs[, …])

Make all possible molecules from groups of building blocks.

init_diverse(building_blocks, …[, …])

Construct a chemically diverse Population.

init_from_list(pop_list[, use_cache])

Initialize a population from a list representation.

init_random(building_blocks, …[, …])

Construct molecules for a random Population.

load(path[, use_cache])

Initialize a Population from one dumped to a file.

open_process_pool(self[, num_processes])

Open a process pool.

optimize(self, optimizer[, num_processes])

Optimize the structures of molecules in the population.

remove_duplicates(self[, …])

Remove duplicates from the population.

remove_members(self, key)

Remove all members where key(member) is True.

set_mol_ids(self, n[, overwrite])

Give each member of the population an id starting from n.

to_list(self[, include_attrs, …])

Convert the population to a list representation.

write(self, path)

Write the .mol files of members to a directory.

__init__(self, *args)

Initialize a Population.

Parameters

*args (Molecule, Population) – A population is initialized with the Molecule and Population instances it should hold.

Examples

import stk

bb1 = stk.BuildingBlock('CCC')
bb2 = stk.BuildingBlock('NCCNCNC')
bb3 = stk.BuildingBlock('[Br]CCC[Br]')
pop1 = stk.Population(bb1, bb2, bb3)

bb4 = stk.BuildingBlock('NNCCCN')
# pop2 has pop1 as a subpopulation and bb4 as a direct
# member.
pop2 = stk.Population(pop1, bb4)

add_members(self, molecules, duplicate_key=None)

Add Molecule instances to the Population.

The added Molecule instances become direct members of the population; they are not placed into any subpopulations.

Parameters
  • molecules (iterable of Molecule) – The molecules to be added as direct members.

  • duplicate_key (callable, optional) – If not None, duplicate_key(mol) is evaluated on each molecule in molecules. If a molecule with the same duplicate_key value is already present in the population, the molecule is not added.

Returns

None

Return type

NoneType

add_subpopulation(self, population)

Add a clone of population to subpopulations.

Only a clone of the population container is made. The molecules it holds are not copies.

Parameters

population (Population) – The population to be added as a subpopulation.

Returns

None

Return type

NoneType

clone(self)

Return a clone.

The clone shares its Molecule objects with the original; copies of the Molecule objects are not made.

Returns

The clone.

Return type

Population

Examples

import stk

# Make an initial population.
pop = stk.Population(stk.BuildingBlock('NCCN'))
# Make a clone.
clone = pop.clone()

close_process_pool(self)

Close an open process pool.

Returns

The population.

Return type

Population

dump(self, path, include_attrs=None, ignore_missing_attrs=False)

Dump the Population to a file.

Parameters
  • path (str) – The full path of the file to which the Population should be dumped.

  • include_attrs (list of str, optional) – The names of attributes of the molecules to be added to the JSON. Each attribute is saved as a string using repr().

  • ignore_missing_attrs (bool, optional) – If False and an attribute in include_attrs is not held by a Molecule, an error will be raised.

Returns

None

Return type

NoneType

classmethod init_all(building_blocks, topology_graphs, num_processes=None, duplicates=False, use_cache=False)

Make all possible molecules from groups of building blocks.

Parameters
  • building_blocks (list of Molecule) –

    A list holding nested building blocks, for example

    bbs1 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs2 = [
        stk.ConstructedMolecule(...),
        stk.BuildingBlock(...),
        ...,
    ]
    bbs3 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    building_blocks = [bbs1, bbs2, bbs3]
    

    To construct a new ConstructedMolecule, a Molecule is picked from each of the sublists in building_blocks. The picked Molecule instances are then supplied to ConstructedMolecule

    # mol is a new ConstructedMolecule. bb1 is selected
    # from bbs1, bb2 is selected from bbs2 and bb3 is
    # selected from bbs3.
    mol = stk.ConstructedMolecule(
        building_blocks=[bb1, bb2, bb3],
        topology_graph=topology_pick
    )
    

    The order a Molecule instance is given to the ConstructedMolecule is determined by the sublist of building_blocks it was picked from. Note that the number of sublists in building_blocks is not fixed. It merely has to be compatible with the topology_graphs.

  • topology_graphs (list of TopologyGraph) – The topology graphs of the ConstructedMolecule instances being made.

  • num_processes (int, optional) – The number of parallel processes to create when constructing the molecules. If None, creates a process for each core on the computer.

  • duplicates (bool, optional) – If False, duplicate structures are removed from the population.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

A Population holding ConstructedMolecule instances.

Return type

Population

Examples

Construct all possible cage molecules from some precursors

import stk

amines = [
    stk.BuildingBlock('NCCCN', ['amine']),
    stk.BuildingBlock('NCCCCCN', ['amine']),
    stk.BuildingBlock('NCCOCCN', ['amine']),
]
aldehydes = [
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=C(C=O)COCC=O', ['aldehyde']),
]
# A total of 9 cages will be created.
cages = stk.Population.init_all(
    building_blocks=[amines, aldehydes],
    topology_graphs=[stk.cage.FourPlusSix()]
)

Use the constructed cages and a new bunch of building blocks to create all possible cage complexes.

encapsulants = [
    stk.BuildingBlock('[Br][Br]'),
    stk.BuildingBlock('[F][F]'),
]

# Every combination of cage and encapsulant.
complexes = stk.Population.init_all(
    building_blocks=[cages, encapsulants],
    topology_graphs=[stk.host_guest_complex.Complex()]
)

classmethod init_diverse(building_blocks, topology_graphs, size, random_seed=None, use_cache=False)

Construct a chemically diverse Population.

All constructed molecules are held in direct_members.

In order to construct a ConstructedMolecule, a random Molecule is selected from each sublist in building_blocks. Once the first construction is complete, the next Molecule selected from each sublist is the one with the most different Morgan fingerprint to the prior one. The third construction uses randomly selected Molecule objects again and so on. This is done until size ConstructedMolecule instances have been constructed.

Parameters
  • building_blocks (list of Molecule) –

    A list holding nested building blocks, for example

    bbs1 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs2 = [
        stk.ConstructedMolecule(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs3 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    building_blocks = [bbs1, bbs2, bbs3]
    

    To construct a new ConstructedMolecule, a Molecule is picked from each of the sublists in building_blocks. The picked Molecule instances are then supplied to the ConstructedMolecule

    # mol is a new ConstructedMolecule. bb1 is selected
    # from bbs1, bb2 is selected from bbs2 and bb3 is
    # selected from bbs3.
    mol = stk.ConstructedMolecule(
        building_blocks=[bb1, bb2, bb3],
        topology_graph=topology_pick
    )
    

    The order a Molecule instance is given to ConstructedMolecule is determined by the sublist of building_blocks it was picked from. Note that the number of sublists in building_blocks is not fixed. It merely has to be compatible with the topology_graphs.

  • topology_graphs (iterable of TopologyGraph) – An iterable holding topology graphs which should be randomly selected for the construction of a ConstructedMolecule.

  • size (int) – The desired size of the Population.

  • random_seed (int, optional) – Seed for the random number generator to get replicable results.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

A population filled with the constructed molecules.

Return type

Population

Examples

Construct a diverse Population of cage molecules from some precursors

import stk

amines = [
    stk.BuildingBlock('NCCCN', ['amine']),
    stk.BuildingBlock('NCCCCCN', ['amine']),
    stk.BuildingBlock('NCCOCCN', ['amine']),
]
aldehydes = [
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=C(C=O)COCC=O', ['aldehyde']),
]
# A total of 4 cages will be created.
cages = stk.Population.init_diverse(
    building_blocks=[amines, aldehydes],
    topology_graphs=[stk.cage.FourPlusSix()],
    size=4
)

Use the constructed cages and a new bunch of building blocks to create some diverse cage complexes.

encapsulants = [
    stk.BuildingBlock('[Br][Br]'),
    stk.BuildingBlock('[F][F]'),
]

# 4 combinations of cage and encapsulant.
complexes = stk.Population.init_diverse(
    building_blocks=[cages, encapsulants],
    topology_graphs=[stk.host_guest_complex.Complex()],
    size=4
)

classmethod init_from_list(pop_list, use_cache=False)

Initialize a population from a list representation.

Parameters
  • pop_list (list) –

    A list which represents a Population, like the ones created by to_list(). For example, in

    pop_list = [{...}, [{...}], [{...}, {...}], {...}]
    

    pop_list represents the Population, sublists represent its subpopulations and the dict {...} represents the members.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

The population represented by pop_list.

Return type

Population
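
The nesting convention described above can be illustrated without stk. The following is a minimal sketch, not stk's implementation: it walks a list of the shape shown for pop_list, treating each dict as a member and each sublist as a subpopulation. The function name count_members and the placeholder dicts are hypothetical and exist only for this illustration.

```python
def count_members(pop_list):
    """Return (direct, total) member counts for a nested list representation."""
    direct = 0
    total = 0
    for item in pop_list:
        if isinstance(item, dict):
            # A dict stands for a member held directly by this population.
            direct += 1
            total += 1
        elif isinstance(item, list):
            # A sublist stands for a subpopulation, so recurse into it.
            _, sub_total = count_members(item)
            total += sub_total
        else:
            raise TypeError(f'unexpected item: {item!r}')
    return direct, total


# Placeholder dicts stand in for real molecular representations.
pop_list = [
    {'name': 'a'},
    [{'name': 'b'}],
    [{'name': 'c'}, {'name': 'd'}],
    {'name': 'e'},
]
direct, total = count_members(pop_list)
```

Here the outer list holds two direct members and two subpopulations, which matches the layout of the pop_list example in the parameter description.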

classmethod init_random(building_blocks, topology_graphs, size, random_seed=None, use_cache=False)

Construct molecules for a random Population.

All molecules are held in direct_members.

From the supplied building blocks a random Molecule is selected from each sublist to form a ConstructedMolecule. This is done until size ConstructedMolecule objects have been constructed.

Parameters
  • building_blocks (list of Molecule) –

    A list holding nested building blocks, for example

    bbs1 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs2 = [
        stk.ConstructedMolecule(...),
        stk.BuildingBlock(...),
        ...
    ]
    bbs3 = [
        stk.BuildingBlock(...),
        stk.BuildingBlock(...),
        ...
    ]
    building_blocks = [bbs1, bbs2, bbs3]
    

    To construct a new ConstructedMolecule, a Molecule is picked from each of the sublists in building_blocks. The picked Molecule instances are then supplied to ConstructedMolecule:

    # mol is a new ConstructedMolecule. bb1 is selected
    # from bbs1, bb2 is selected from bbs2 and bb3 is
    # selected from bbs3.
    mol = stk.ConstructedMolecule(
        building_blocks=[bb1, bb2, bb3],
        topology_graph=topology_pick
    )
    

    The order in which Molecule instances are given to ConstructedMolecule is determined by the sublist of building_blocks each one was picked from. Note that the number of sublists in building_blocks is not fixed. It only has to be compatible with the topology_graphs.

  • topology_graphs (iterable of TopologyGraph) – An iterable holding topology graphs which should be randomly selected during initialization of ConstructedMolecule.

  • size (int) – The size of the population to be initialized.

  • random_seed (int, optional) – Seed for the random number generator to get replicable results.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

A population filled with random ConstructedMolecule instances.

Return type

Population
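
The selection step described in the parameters can be sketched in plain Python. This is an illustration under assumptions, not stk's code: the helper pick_random_combinations is hypothetical, strings stand in for Molecule and TopologyGraph instances, and the sketch allows the same combination to be picked more than once, which may differ from what init_random does.

```python
import random


def pick_random_combinations(building_blocks, topology_graphs, size,
                             random_seed=None):
    """Pick `size` random (building blocks, topology graph) pairs."""
    # A private generator keeps the seed from affecting global random state.
    generator = random.Random(random_seed)
    topology_graphs = list(topology_graphs)
    picks = []
    for _ in range(size):
        # One item from each sublist, kept in sublist order.
        bbs = tuple(generator.choice(sublist) for sublist in building_blocks)
        topology_pick = generator.choice(topology_graphs)
        picks.append((bbs, topology_pick))
    return picks


# Strings stand in for BuildingBlock and TopologyGraph instances.
amines = ['amine1', 'amine2', 'amine3']
aldehydes = ['aldehyde1', 'aldehyde2']
picks = pick_random_combinations(
    [amines, aldehydes], ['four_plus_six'], size=5, random_seed=4
)
# The same seed reproduces the same picks.
repeat = pick_random_combinations(
    [amines, aldehydes], ['four_plus_six'], size=5, random_seed=4
)
```

Each pick keeps the amine first and the aldehyde second, mirroring how the position of a sublist in building_blocks fixes the order in which its pick is passed on.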

Examples

Construct 5 random cage molecules from some precursors.

import stk

amines = [
    stk.BuildingBlock('NCCCN', ['amine']),
    stk.BuildingBlock('NCCCCCN', ['amine']),
    stk.BuildingBlock('NCCOCCN', ['amine']),
]
aldehydes = [
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=CCC(C=O)CC=O', ['aldehyde']),
    stk.BuildingBlock('O=C(C=O)COCC=O', ['aldehyde']),
]
# A total of 5 cages will be created.
cages = stk.Population.init_random(
    building_blocks=[amines, aldehydes],
    topology_graphs=[stk.cage.FourPlusSix()],
    size=5
)

Use the constructed cages and a new bunch of building blocks to create some random cage complexes.

encapsulants = [
    stk.BuildingBlock('[Br][Br]'),
    stk.BuildingBlock('[F][F]'),
]

# Random combinations of cage and encapsulant.
complexes = stk.Population.init_random(
    building_blocks=[cages, encapsulants],
    topology_graphs=[stk.host_guest_complex.Complex()],
    size=5
)

classmethod load(path, use_cache=False)

Initialize a Population from one dumped to a file.

Parameters
  • path (str) – The full path of the file holding the dumped population.

  • use_cache (bool, optional) – Toggles use of the molecular cache.

Returns

The population stored in the dump file.

Return type

Population

open_process_pool(self, num_processes=None)

Open a process pool.

Parameters

num_processes (int, optional) – The number of processes in the pool. If None, then creates a process for each core on the computer.

Returns

The population.

Return type

Population

Raises

RuntimeError – If a process pool is already open.

optimize(self, optimizer, num_processes=None)

Optimize the structures of molecules in the population.

The molecules are optimized serially if num_processes is 1 and in parallel otherwise. The serial version may be faster in cases where all molecules have already been optimized and the optimizer will skip them. In this case creating a parallel process pool only adds overhead.

Parameters
  • optimizer (Optimizer) – The optimizer used to carry out the optimizations.

  • num_processes (int, optional) – The number of parallel processes to create. Optimization will run serially if 1. If None, creates a process for each core on the computer. This parameter will be ignored if the population has an open process pool.

Returns

None

Return type

NoneType
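
The choice between the serial and the pooled path can be sketched as follows. This is a rough illustration, not stk's implementation: optimize_all and fake_optimize are hypothetical names, the sketch returns results instead of modifying molecules in place, and it uses the thread-backed multiprocessing.dummy.Pool as a stand-in for a real process pool so that it runs without pickling concerns.

```python
# Thread-backed pool with the same interface as multiprocessing.Pool.
from multiprocessing.dummy import Pool


def optimize_all(molecules, optimize_one, num_processes=None):
    """Apply optimize_one to each molecule, serially or through a pool."""
    if num_processes == 1:
        # Serial path: no pool is created, so there is no startup overhead.
        return [optimize_one(mol) for mol in molecules]
    # Passing None lets the pool pick a worker count from the core count.
    with Pool(num_processes) as pool:
        return pool.map(optimize_one, molecules)


def fake_optimize(name):
    # Hypothetical stand-in for an optimizer call.
    return name.upper()


serial = optimize_all(['a', 'b', 'c'], fake_optimize, num_processes=1)
pooled = optimize_all(['a', 'b', 'c'], fake_optimize, num_processes=2)
```

Both paths give the same results in the same order. Only the cost of setting up the workers differs, which is why the serial path can win when most calls return immediately.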

remove_duplicates(self, across_subpopulations=True, key=<built-in function id>)

Remove duplicates from the population.

Which molecule is preserved when duplicates are removed depends on the iteration order. The iteration through a population is depth-first, so a rule such as “the molecule in the topmost population is preserved” does not apply. Rather, the first molecule found is preserved.

However, this question is only relevant if duplicates in different subpopulations are being removed. In this case it is assumed that it is more important to have a single instance than to worry about which subpopulation it is in.

If the duplicates are being removed from within subpopulations, each subpopulation will end up with a single instance of all molecules held before. There is no “choice”.

Parameters
  • across_subpopulations (bool, optional) – When False duplicates are only removed from within a given subpopulation. If True, all duplicates are removed, regardless of which subpopulation they are in.

  • key (callable, optional) – Two molecules are considered the same if the values returned by key(molecule) are the same.

Returns

None

Return type

NoneType
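
The effect of across_subpopulations and of the "first molecule found" rule can be shown on plain nested lists. This is a sketch under assumptions, not stk's code: lists stand for populations and subpopulations, strings stand for molecules, items are visited in the order they appear, and the hypothetical dedupe function returns a new list instead of modifying the population in place.

```python
def dedupe(population, across_subpopulations=True, key=id, seen=None):
    """Return a copy of a nested list with duplicate members dropped."""
    if seen is None or not across_subpopulations:
        # A fresh set per list restricts the check to that one level.
        seen = set()
    kept = []
    for item in population:
        if isinstance(item, list):
            # Recurse immediately, which makes the traversal depth-first.
            kept.append(dedupe(item, across_subpopulations, key, seen))
            continue
        marker = key(item)
        if marker not in seen:
            # The first member found with this key value is preserved.
            seen.add(marker)
            kept.append(item)
    return kept


population = ['A', ['B', 'A', ['C', 'B']], 'C', 'A']
everywhere = dedupe(population, across_subpopulations=True, key=str)
per_level = dedupe(population, across_subpopulations=False, key=str)
```

With across_subpopulations=True the result is ['A', ['B', ['C']]]: the 'C' that survives sits in the deepest list because it was reached first, not the one in the topmost list. With across_subpopulations=False only the repeated 'A' at the top level is dropped.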

remove_members(self, key)

Remove all members where key(member) is True.

Parameters

key (callable) – A callable which takes 1 argument. Each member of the population is passed as the argument to key in turn. If the result is True then the member is removed from the population.

Returns

None

Return type

NoneType
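
A key here is simply a predicate. The sketch below illustrates the idea on nested lists and is not stk's implementation: the filter_members helper is hypothetical, integers stand in for molecules, and it assumes that members of subpopulations are also tested, which the description above suggests but does not spell out.

```python
def filter_members(population, key):
    """Return a copy of a nested list without members where key(member) is true."""
    kept = []
    for item in population:
        if isinstance(item, list):
            # Subpopulations are kept, but their members are filtered too.
            kept.append(filter_members(item, key))
        elif not key(item):
            kept.append(item)
    return kept


# Integers stand in for molecules, for example their atom counts.
population = [12, [70, 30], 55, [[90], 8]]
small_only = filter_members(population, key=lambda count: count > 50)
```

A subpopulation whose members are all removed is left in place as an empty list in this sketch.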

set_mol_ids(self, n, overwrite=False)

Give each member of the population an id starting from n.

This method adds an id attribute to each Molecule instance held by the population.

Parameters
  • n (int) – A number. Members of this Population are given a unique number as an id, starting from n and incremented by one between members.

  • overwrite (bool, optional) – If True, existing ids are replaced.

Returns

The value of the last id assigned, plus 1.

Return type

int
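
The numbering scheme can be sketched with plain objects. This is an illustration, not stk's code: assign_ids is a hypothetical helper, SimpleNamespace objects stand in for Molecule instances, and the sketch assumes that a molecule which keeps its existing id does not consume a number.

```python
from types import SimpleNamespace


def assign_ids(molecules, n, overwrite=False):
    """Give molecules consecutive ids starting at n and return the next free id."""
    for mol in molecules:
        if overwrite or not hasattr(mol, 'id'):
            mol.id = n
            n += 1
    return n


# The second stand-in molecule already carries an id.
mols = [SimpleNamespace(), SimpleNamespace(id=99), SimpleNamespace()]
next_id = assign_ids(mols, 10)
kept_ids = [mol.id for mol in mols]

# With overwrite=True every molecule is renumbered.
next_after_overwrite = assign_ids(mols, 0, overwrite=True)
new_ids = [mol.id for mol in mols]
```

Because the return value is one past the last id assigned, it can be passed as n when numbering a second batch so that ids do not collide.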

to_list(self, include_attrs=None, ignore_missing_attrs=False)

Convert the population to a list representation.

Parameters
  • include_attrs (list of str, optional) – The names of attributes to be added to the molecular representations. Each attribute is saved as a string using repr().

  • ignore_missing_attrs (bool, optional) – If False and an attribute in include_attrs is not held by a Molecule, an error will be raised.

Returns

A list representation of the Population.

Return type

list
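
How include_attrs and ignore_missing_attrs interact can be sketched for a single member. This is an assumption-laden illustration, not stk's implementation: attrs_to_dict is a hypothetical helper, a SimpleNamespace stands in for a Molecule, and raising AttributeError is a guess, since the description only says that an error is raised.

```python
from types import SimpleNamespace


def attrs_to_dict(member, include_attrs=None, ignore_missing_attrs=False):
    """Collect the requested attributes of a member as repr() strings."""
    saved = {}
    for attr in include_attrs or []:
        if hasattr(member, attr):
            # Each attribute is stored as a string produced by repr().
            saved[attr] = repr(getattr(member, attr))
        elif not ignore_missing_attrs:
            raise AttributeError(f'member has no attribute {attr!r}')
    return saved


mol = SimpleNamespace(fitness=1.5)
saved = attrs_to_dict(mol, ['fitness'])
lenient = attrs_to_dict(mol, ['fitness', 'id'], ignore_missing_attrs=True)

# With the default ignore_missing_attrs=False a missing attribute is an error.
try:
    attrs_to_dict(mol, ['fitness', 'id'])
except AttributeError as error:
    message = str(error)
```

Since values are stored through repr(), they come back as strings such as '1.5' and need to be converted again after loading.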

write(self, path)

Write the .mol files of members to a directory.

Parameters

path (str) – The full path of the directory into which the .mol files are written.

Returns

None

Return type

NoneType