Selection

  1. Best

  2. Worst

  3. Roulette

  4. AboveAverage

  5. Tournament

  6. StochasticUniversalSampling

  7. RemoveBatches

  8. RemoveMolecules

  9. FilterBatches

  10. FilterMolecules

  11. If

  12. TryCatch

  13. Sequence

  14. Random

  15. RaisingCalculator

Selection is carried out by Selector objects. Selectors are objects with a select() method, which is used to select batches of molecules from an EAPopulation. Examples of how Selector classes can be used are given in their documentation, for example Best, Roulette or AboveAverage.

Selectors can be combined to generate more complex selection processes. For example, suppose we want to implement elitism. Elitism is when the best batches are guaranteed to be selected first, before the selection algorithm is carried out. The Sequence selector exists precisely for this reason. It takes two selectors and yields batches from them, one after the other:

import stk

population = stk.EAPopulation(...)
elite_roulette = stk.Sequence(
    stk.Best(5),
    stk.Roulette(20),
)
# Select with Best first and then with Roulette.
for batch in elite_roulette.select(population):
    # Do stuff with batch.
    pass

What if you did not want Roulette to yield any batches selected by Best? The RemoveBatches selector can be used for this. It takes two selectors, one called a remover and one called a selector. It first selects batches of molecules from a population with the remover. It then passes the same population to the selector but prevents it from yielding any batches selected by the remover:

roulette_without_elites = stk.RemoveBatches(
    remover=stk.Best(5),
    selector=stk.Roulette(20),
)
# Select batches, excluding the top 5.
for batch in roulette_without_elites.select(population):
    # Do stuff with batch.
    pass

You can combine RemoveBatches and Sequence to get a selector which yields the top 5 batches first and then uses roulette to select from the remaining batches:

elite_roulette2 = stk.Sequence(
    stk.Best(5),
    roulette_without_elites,
)

The same thing can be written more explicitly:

elite_roulette2 = stk.Sequence(
    stk.Best(5),
    stk.RemoveBatches(
        remover=stk.Best(5),
        selector=stk.Roulette(20),
    ),
)
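The composition above can be pictured with plain Python generators. The sketch below is not stk's implementation: best and roulette are hypothetical stand-in functions, the fitness values are made up, and each name stands in for a batch of size 1. It only illustrates the idea of yielding elites first and then running roulette over everything else.

```python
import itertools
import random

# Made-up fitness values; each name stands in for a batch of size 1.
fitness = {"a": 9.0, "b": 7.0, "c": 4.0, "d": 2.0, "e": 1.0}


def best(fitness, num_batches):
    """Yield the fittest names first (hypothetical stand-in)."""
    ranked = sorted(fitness, key=fitness.get, reverse=True)
    yield from ranked[:num_batches]


def roulette(fitness, num_batches, rng, excluded=frozenset()):
    """Yield names with probability proportional to fitness."""
    names = [name for name in fitness if name not in excluded]
    weights = [fitness[name] for name in names]
    for _ in range(num_batches):
        yield rng.choices(names, weights=weights)[0]


rng = random.Random(0)

# Elitism: the top 2 first, then roulette over everything else.
elites = list(best(fitness, 2))
elite_roulette = list(itertools.chain(
    elites,
    roulette(fitness, 4, rng, excluded=frozenset(elites)),
))
print(elite_roulette)
```

The first two entries are always the elites; the remaining four are drawn at random from the non-elite names.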

You can also explore other combinations with FilterBatches, FilterMolecules and RemoveMolecules. Examples using these classes are given in their docstrings.

Making New Selectors

When a new Selector class is made, it must inherit from Selector and implement all of its virtual methods.
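As a rough illustration of that pattern, the sketch below uses a hypothetical abstract base class, SketchSelector, in place of stk's Selector, and a plain dict of made-up fitness values in place of a population. The subclass name FitnessAtLeast and its threshold parameter are also assumptions made for this example only.

```python
from abc import ABC, abstractmethod


class SketchSelector(ABC):
    """Hypothetical stand-in for an abstract selector base class."""

    @abstractmethod
    def select(self, population):
        """Yield batches (here, tuples of names) from population."""


class FitnessAtLeast(SketchSelector):
    """Yield every molecule whose fitness reaches a threshold."""

    def __init__(self, threshold):
        self._threshold = threshold

    def select(self, population):
        # The population is a plain dict here: name -> fitness value.
        for name, value in population.items():
            if value >= self._threshold:
                # Batches of size 1 are yielded as 1-tuples.
                yield (name,)


population = {"a": 9.0, "b": 3.0, "c": 5.0}
selected = list(FitnessAtLeast(5.0).select(population))
print(selected)
```

Because select() is declared abstract, Python refuses to instantiate SketchSelector directly, which is the behaviour one would expect from a base class with virtual methods.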

class AboveAverage(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)

Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector

Yields above average batches of molecules.

Examples

Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation:

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
above_avg = stk.AboveAverage()

# Select the molecules.
for selected, in above_avg.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
above_avg = stk.AboveAverage(batch_size=2)

# Select the molecules.
for selected in above_avg.select(pop):
    # selected is a tuple of length 2, holding the selected
    # molecules. You can do stuff with the selected molecules
    # Like apply crossover operations on them.
    offspring = list(crosser.cross(*selected))

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)

Initialize an AboveAverage instance.

Parameters
  • num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.

  • batch_size (int, optional) – The number of molecules yielded at once.

  • duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.

  • duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.

  • fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
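A fitness_modifier is just a callable that receives the population and returns a dict. The sketch below assumes a hypothetical FakePopulation class that exposes get_fitness_values(), and uses strings in place of molecules; the modifier name shift_to_positive is also an assumption. It shifts all fitness values so that the smallest becomes 1, which can be useful when a selection scheme expects positive values.

```python
class FakePopulation:
    """Hypothetical stand-in exposing get_fitness_values()."""

    def __init__(self, fitness_values):
        self._fitness_values = dict(fitness_values)

    def get_fitness_values(self):
        return dict(self._fitness_values)


def shift_to_positive(population):
    """Return fitness values shifted so the smallest one equals 1."""
    values = population.get_fitness_values()
    offset = 1 - min(values.values())
    return {mol: value + offset for mol, value in values.items()}


pop = FakePopulation({"mol1": -3.0, "mol2": 0.0, "mol3": 2.0})
modified = shift_to_positive(pop)
print(modified)
```

Given the signature documented above, such a callable would presumably be passed as fitness_modifier=shift_to_positive when the selector is created.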

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.
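The interplay of included_batches and excluded_batches can be sketched as a simple predicate on identity keys. The helper name may_yield and the string keys are assumptions for illustration, and the sketch assumes that a batch must pass both checks when both sets are given.

```python
def may_yield(key, included_batches=None, excluded_batches=None):
    """Return True if a batch with this identity key may be yielded."""
    if included_batches is not None and key not in included_batches:
        return False
    if excluded_batches is not None and key in excluded_batches:
        return False
    return True


keys = ["k1", "k2", "k3", "k4"]

# No restrictions: every batch may be yielded.
unrestricted = [key for key in keys if may_yield(key)]

# Only k1, k2 and k3 are allowed, and k2 is additionally forbidden.
restricted = [
    key
    for key in keys
    if may_yield(
        key,
        included_batches={"k1", "k2", "k3"},
        excluded_batches={"k2"},
    )
]
print(unrestricted, restricted)
```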

class Batch(mols, fitness_values)

Bases: object

Represents a batch of molecules.

Batches can be compared; the comparison is based on their fitness values. Batches can also be iterated through, which iterates through all the molecules in the batch.

Examples

Sorting batches causes them to be sorted by fitness value.

batches = (Batch(...), Batch(...), Batch(...))
sorted_batches = sorted(batches)

Comparison is also based on fitness value:

batch1 = Batch(...)
batch2 = Batch(...)
if batch1 > batch2:
    print('batch1 has a larger fitness value than batch2.')

Batches can be iterated through to get the molecules in the batch:

batch = Batch(...)
for mol in batch:
    # Do stuff with mol.
    pass

Methods

get_fitness(self)

Get the fitness value of the batch.

get_identity_key(self)

Get the identity key of the batch.

get_size(self)

Get the number of molecules in the batch.

__init__(self, mols, fitness_values)

Initialize a Batch.

Parameters
  • mols (tuple of Molecule) – The molecules which are part of the batch.

  • fitness_values (dict) – Maps each molecule in mols to its fitness value.

get_fitness(self)

Get the fitness value of the batch.

Returns

The fitness value.

Return type

float

get_identity_key(self)

Get the identity key of the batch.

If two batches hold the same molecules, the same number of times, they will have the same identity key.

Returns

A hashable object which can be used to compare if two batches are the same.

Return type

object

get_size(self)

Get the number of molecules in the batch.

Returns

The number of molecules in the batch.

Return type

int
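The behaviour described for Batch can be mimicked with a small stand-in class. The sketch below is not stk's implementation: SketchBatch is a hypothetical name, molecules are plain strings, the batch fitness is taken to be the sum of its molecules' fitness values, and the identity key is assumed to be a frozenset of (molecule, count) pairs so that order does not matter but multiplicity does.

```python
from collections import Counter
from functools import total_ordering


@total_ordering
class SketchBatch:
    """Hypothetical stand-in for a batch; molecules are plain strings."""

    def __init__(self, mols, fitness_values):
        self._mols = tuple(mols)
        self._fitness = sum(fitness_values[mol] for mol in self._mols)
        # Same molecules with the same counts give the same key.
        self._identity_key = frozenset(Counter(self._mols).items())

    def get_fitness(self):
        return self._fitness

    def get_identity_key(self):
        return self._identity_key

    def get_size(self):
        return len(self._mols)

    def __iter__(self):
        return iter(self._mols)

    def __eq__(self, other):
        return self._fitness == other._fitness

    def __lt__(self, other):
        return self._fitness < other._fitness


fitness_values = {"a": 1.0, "b": 2.0, "c": 5.0}
ab = SketchBatch(("a", "b"), fitness_values)
ba = SketchBatch(("b", "a"), fitness_values)
cc = SketchBatch(("c", "c"), fitness_values)

print(ab.get_identity_key() == ba.get_identity_key())
print([batch.get_fitness() for batch in sorted([cc, ab])])
```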

class Best(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)

Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector

Selects batches of molecules, highest fitness value first.

Examples

Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
best = stk.Best()

# Select the molecules.
for selected, in best.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
best = stk.Best(batch_size=2)

# Select the molecules.
for selected in best.select(pop):
    # selected is a tuple of length 2, holding the selected
    # molecules. You can do stuff with the selected molecules
    # Like apply crossover operations on them.
    offspring = list(crosser.cross(*selected))

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)

Initialize a Best instance.

Parameters
  • num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.

  • batch_size (int, optional) – The number of molecules yielded at once.

  • duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.

  • duplicate_batches (bool, optional) – If True the same batch can be yielded more than once. Duplicate batches can occur if the same molecule is found multiple times in a population.

  • fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
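To make batch_size, num_batches and duplicate_mols more concrete, here is a minimal sketch of best-first batch selection over made-up fitness values. The function best_batches is hypothetical and does not reproduce stk's code; it assumes that candidate batches are all combinations of batch_size molecules and that, with duplicate_mols=False, a batch is skipped if it shares a molecule with a batch that was already yielded.

```python
import itertools


def best_batches(fitness, batch_size, num_batches=None, duplicate_mols=True):
    """Yield batches of names, highest summed fitness first."""
    batches = sorted(
        itertools.combinations(fitness, batch_size),
        key=lambda batch: sum(fitness[mol] for mol in batch),
        reverse=True,
    )
    seen = set()
    yielded = 0
    for batch in batches:
        if num_batches is not None and yielded >= num_batches:
            return
        # Skip batches that reuse a molecule when duplicates are off.
        if not duplicate_mols and seen.intersection(batch):
            continue
        seen.update(batch)
        yielded += 1
        yield batch


fitness = {"a": 9.0, "b": 7.0, "c": 4.0, "d": 1.0}
with_dupes = list(best_batches(fitness, 2, num_batches=3))
no_dupes = list(best_batches(fitness, 2, duplicate_mols=False))
print(with_dupes)
print(no_dupes)
```

With duplicate molecules allowed, "a" appears in the first two batches; with them disallowed, only two disjoint batches can be formed before the candidates run out.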

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class FilterBatches(filter, selector)

Bases: stk.calculators.ea.selectors.Selector

Allows a Selector to select only some batches.

Examples

You only want the Best 10 batches to participate in Roulette:

import stk

population = stk.Population(...)
selector = stk.FilterBatches(
    filter=stk.Best(10),
    selector=stk.Roulette(7),
)
for batch in selector.select(population):
    # Do stuff with batch. It is one of the 10 best batches and
    # was selected using roulette selection.
    pass

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, filter, selector)

Initialize a FilterBatches instance.

Parameters
  • filter (Selector) – Selects batches which can be yielded by selector.

  • selector (Selector) – Selects batches, but only if they were also selected by filter.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class FilterMolecules(filter, selector)

Bases: stk.calculators.ea.selectors.Selector

Allows a Selector to select only some molecules.

Examples

You want to use Roulette on the molecules which belong to the Best 5 batches of size 3:

import stk

population = stk.Population(...)
selector = stk.FilterMolecules(
    filter=stk.Best(num_batches=5, batch_size=3),
    selector=stk.Roulette(num_batches=20, batch_size=3),
)
for batch in selector.select(population):
    # Do stuff with batch. All the molecules in the batch
    # belong to the top 5 batches of size 3. The batch
    # was selected using roulette selection.
    pass

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, filter, selector)

Initialize a FilterMolecules instance.

Parameters
  • filter (Selector) – Selects molecules which can be yielded by selector.

  • selector (Selector) – Selects batches of molecules. The batches can only contain molecules yielded by filter.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class RemoveBatches(remover, selector)

Bases: stk.calculators.ea.selectors.Selector

Prevents a Selector from selecting some batches.

Examples

You want to use Roulette selection on all but the 5 Worst batches:

import stk

population = stk.Population(...)
selector = stk.RemoveBatches(
    remover=stk.Worst(5),
    selector=stk.Roulette(20),
)
for batch in selector.select(population):
    # Do stuff with batch. It was selected with roulette
    # selection and is not one of the worst 5 batches.
    pass

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, remover, selector)

Initialize a RemoveBatches instance.

Parameters
  • remover (Selector) – Selects batches of molecules, which cannot be yielded by selector.

  • selector (Selector) – Selects batches of molecules, except those selected by remover.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class RemoveMolecules(remover, selector)

Bases: stk.calculators.ea.selectors.Selector

Prevents a Selector from selecting some molecules.

Examples

You want to prevent any of the molecules in the Best 5 batches from being selected by Roulette.

import stk

population = stk.Population(...)
selector = stk.RemoveMolecules(
    remover=stk.Best(num_batches=5, batch_size=3),
    selector=stk.Roulette(num_batches=20, batch_size=3),
)

for batch in selector.select(population):
    # Do stuff with batch. The batch is guaranteed not to
    # contain any molecules which are found in the best 5
    # batches of size 3.
    pass

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, remover, selector)

Initialize a RemoveMolecules instance.

Parameters
  • remover (Selector) – Selects batches of molecules; any molecule it selects cannot be selected by selector.

  • selector (Selector) – Selects batches of molecules, not containing any molecules selected by remover.
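The difference between removing molecules and removing batches can be seen in a small sketch. It uses made-up fitness values and plain tuples of names as batches, and compares batches by tuple equality rather than by identity key; none of this is stk code.

```python
import itertools

# Made-up fitness values for five molecules.
fitness = {"a": 9.0, "b": 7.0, "c": 4.0, "d": 1.0, "e": 3.0}


def batch_fitness(batch):
    """Sum the fitness values of the molecules in a batch."""
    return sum(fitness[mol] for mol in batch)


all_batches = list(itertools.combinations(fitness, 2))

# The "remover": the single best batch of size 2.
removed_batch = max(all_batches, key=batch_fitness)
removed_mols = set(removed_batch)

# Molecule-level removal: drop every batch sharing a molecule with it.
without_mols = [b for b in all_batches if removed_mols.isdisjoint(b)]

# Batch-level removal, for contrast: drop only that exact batch.
without_batch = [b for b in all_batches if b != removed_batch]

print(without_mols)
print(len(without_batch))
```

Removing the molecules of the best batch leaves only batches built from the other three molecules, whereas removing the batch itself leaves nine of the ten candidates.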

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class Roulette(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)

Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector

Uses roulette selection to select batches of molecules.

In roulette selection the probability a batch is selected is given by its fitness. If the total fitness is the sum of all fitness values, the chance a batch is selected is given by:

p = batch fitness / total fitness,

where p is the probability of selection and the batch fitness is the sum of all fitness values of molecules in the batch [1].

References

[1]

http://tinyurl.com/csc3djm
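As a worked instance of the formula p = batch fitness / total fitness, the sketch below computes selection probabilities for three made-up batch fitness values and then draws from them with the standard library. The batch names and numbers are assumptions for illustration only.

```python
import random

# Made-up batch fitness values.
batch_fitness = {"batch1": 6.0, "batch2": 3.0, "batch3": 1.0}

total_fitness = sum(batch_fitness.values())
probabilities = {
    batch: value / total_fitness
    for batch, value in batch_fitness.items()
}
print(probabilities)

# Draw 1000 batches with those probabilities, using a fixed seed.
rng = random.Random(42)
batches = list(probabilities)
draws = rng.choices(
    batches,
    weights=[probabilities[batch] for batch in batches],
    k=1000,
)
print({batch: draws.count(batch) for batch in batches})
```

Here batch1 holds 6 of the 10 units of total fitness, so it is selected with probability 0.6 and dominates the draws.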

Examples

Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation:

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
roulette = stk.Roulette()

# Select the molecules.
for selected, in roulette.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Yielding multiple molecules at once. For example, if molecules need to be selected for crossover:

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
roulette = stk.Roulette(batch_size=2)

# Select the molecules.
for selected in roulette.select(pop):
    # selected is a tuple of length 2, holding the selected
    # molecules. You can do stuff with the selected molecules
    # Like apply crossover operations on them.
    offspring = list(crosser.cross(*selected))

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)

Initialize a Roulette instance.

Parameters
  • num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.

  • batch_size (int, optional) – The number of molecules yielded at once.

  • duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.

  • duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.

  • fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.

  • random_seed (int, optional) – The random seed to use.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class Selector

Bases: stk.calculators.base_calculators.Calculator

An abstract base class for selectors.

Selectors select batches of molecules from a population. Each batch is selected based on its fitness. The fitness of a batch is the sum of all fitness values of the molecules in the batch. Batches may be of size 1.

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, /, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class StochasticUniversalSampling(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)

Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector

Yields batches of molecules through stochastic universal sampling.

Stochastic universal sampling lays out batches along a line, with each batch taking up length proportional to its fitness. It then creates a set of evenly spaced pointers to different points on the line, each of which is occupied by a batch. Batches which are pointed to are yielded.

This approach means weaker members of the population are given a greater chance to be chosen than in Roulette selection [2].

References

[2]

https://en.wikipedia.org/wiki/Stochastic_universal_sampling
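The line-and-pointers description can be sketched in a few lines. This is a generic version of stochastic universal sampling over made-up, strictly positive fitness values, not stk's implementation; the function name and the use of plain names as batches are assumptions.

```python
import random


def stochastic_universal_sampling(fitness, num_batches, rng):
    """Return num_batches names picked by evenly spaced pointers."""
    total = sum(fitness.values())
    spacing = total / num_batches
    # One random offset fixes the position of every pointer.
    start = rng.uniform(0, spacing)
    pointers = [start + i * spacing for i in range(num_batches)]

    names = list(fitness)
    index = 0
    upper = fitness[names[0]]  # Right-hand edge of the current segment.
    selected = []
    for pointer in pointers:
        # Advance until the segment containing the pointer is reached.
        while pointer >= upper and index < len(names) - 1:
            index += 1
            upper += fitness[names[index]]
        selected.append(names[index])
    return selected


# Segments on the line: a covers [0, 6), b covers [6, 9), c covers [9, 10).
fitness = {"a": 6.0, "b": 3.0, "c": 1.0}
selected = stochastic_universal_sampling(fitness, 5, random.Random(0))
print(selected)
```

With five pointers spaced 2.0 apart, the first three always land on "a" and the fourth on "b"; only the last pointer can fall on either "b" or "c", depending on the random offset.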

Examples

Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
stochastic_sampling = stk.StochasticUniversalSampling(5)

# Select the molecules.
for selected, in stochastic_sampling.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)

Initialize a StochasticUniversalSampling instance.

Parameters
  • num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.

  • batch_size (int, optional) – The number of molecules yielded at once.

  • duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.

  • duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.

  • fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.

  • random_seed (int, optional) – The random seed to use.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class Tournament(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)

Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector

Yields batches of molecules through tournament selection.

In tournament selection, a random number of batches is chosen from the population to undergo a competition. In each competition, the batch with the highest fitness value is yielded. This is repeated until num_batches are yielded.
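A minimal sketch of that loop is given below. It is not stk's implementation: the function name tournament is hypothetical, the fitness values are made up, plain names stand in for batches, and the sketch assumes each competition involves at least two distinct competitors.

```python
import random


def tournament(fitness, num_batches, rng):
    """Yield the winner of num_batches random competitions."""
    names = list(fitness)
    for _ in range(num_batches):
        # Assumption: a competition has between 2 and all competitors.
        size = rng.randint(2, len(names))
        competitors = rng.sample(names, size)
        # The competitor with the highest fitness wins.
        yield max(competitors, key=fitness.get)


fitness = {"a": 9.0, "b": 7.0, "c": 4.0, "d": 1.0}
winners = list(tournament(fitness, 5, random.Random(1)))
print(winners)
```

Because every competition in this sketch has at least two distinct competitors, the least fit entry, "d", can never win.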

Examples

Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
tournament = stk.Tournament(
    num_batches=5,
    batch_size=1
)

# Select the molecules.
for selected, in tournament.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)

Initialize a Tournament instance.

Parameters
  • num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.

  • batch_size (int, optional) – The number of molecules yielded at once.

  • duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.

  • duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.

  • fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.

  • random_seed (int, optional) – The random seed to use.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.

class Worst(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)

Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector

Selects batches of molecules, lowest fitness value first.

Examples

Select the worst 5 batches of size 3:

import stk

population = stk.Population(...)
worst = stk.Worst(num_batches=5, batch_size=3)
for batch in worst.select(population):
    # Do stuff with batch.
    pass

Methods

select(self, population[, included_batches, …])

Select batches of molecules from population.

__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)

Initialize a Worst instance.

Parameters
  • num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.

  • batch_size (int, optional) – The number of molecules yielded at once.

  • duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.

  • duplicate_batches (bool, optional) – If True the same batch can be yielded more than once. Duplicate batches can occur if the same molecule is found multiple times in a population.

  • fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.

select(self, population, included_batches=None, excluded_batches=None)

Select batches of molecules from population.

Parameters
  • population (EAPopulation) – A collection of molecules from which batches are selected.

  • included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.

  • excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.

Yields

Batch of Molecule – A batch of selected molecules.