Selection
Selection is carried out by Selector objects.
Selectors are objects with a select()
method, which is used to select batches of molecules from an
EAPopulation. Examples of how Selector classes can
be used are given in their documentation, for example Best,
Roulette or AboveAverage.
Selectors can be combined to generate more complex selection
processes. For example, let’s say we want to implement elitism.
Elitism is when the best batches are guaranteed to be selected first,
before the selection algorithm is carried out. The
Sequence selector exists precisely for this reason. It takes
two selectors and yields batches from them, one after the other:
    import stk
    population = stk.EAPopulation(...)
    elite_roulette = stk.Sequence(
        stk.Best(5),
        stk.Roulette(20),
    )
    # Select with Best first and then with Roulette.
    for batch in elite_roulette.select(population):
        # Do stuff with batch.
        pass
What if you did not want Roulette to yield any batches
selected by Best? The RemoveBatches selector can be
used for this. It takes two selectors, one called a remover and one
called a selector. It first selects batches of molecules from a
population with the remover, without yielding them. It then passes the
same population to the selector but prevents it from yielding any
batches selected by the remover:
    roulette_without_elites = stk.RemoveBatches(
        remover=stk.Best(5),
        selector=stk.Roulette(20),
    )
    # Select batches, excluding the top 5.
    for batch in roulette_without_elites.select(population):
        # Do stuff with batch.
        pass
You can combine RemoveBatches and Sequence
to get a selector which yields the top 5 batches first and then,
using roulette, selects any of the other batches:
    elite_roulette2 = stk.Sequence(
        stk.Best(5),
        roulette_without_elites,
    )
The same thing can be written more explicitly:
    elite_roulette2 = stk.Sequence(
        stk.Best(5),
        stk.RemoveBatches(
            remover=stk.Best(5),
            selector=stk.Roulette(20),
        ),
    )
You can also explore other combinations with FilterBatches,
FilterMolecules and RemoveMolecules. Examples
using these classes are given in their docstrings.
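To make the composition described above concrete, here is a minimal, self-contained sketch that does not use stk at all. ToyBest, ToySequence and ToyRemoveBatches are hypothetical stand-ins written for this illustration, the population is assumed to be a plain dict mapping a molecule name to its fitness, and a deterministic selector is used in place of Roulette so that the result is reproducible. It is not how the library implements these classes.

```python
import itertools


class ToyBest:
    """Hypothetical stand-in for a 'highest fitness first' selector."""

    def __init__(self, num_batches):
        self.num_batches = num_batches

    def select(self, population, excluded=None):
        # population maps a molecule name to its fitness value.
        excluded = set() if excluded is None else excluded
        ranked = sorted(population, key=population.get, reverse=True)
        allowed = (name for name in ranked if name not in excluded)
        yield from itertools.islice(allowed, self.num_batches)


class ToySequence:
    """Yields from each selector in turn, one after the other."""

    def __init__(self, *selectors):
        self.selectors = selectors

    def select(self, population, excluded=None):
        for selector in self.selectors:
            yield from selector.select(population, excluded)


class ToyRemoveBatches:
    """Runs the remover, then hides its picks from the selector."""

    def __init__(self, remover, selector):
        self.remover = remover
        self.selector = selector

    def select(self, population, excluded=None):
        # The remover's picks are collected, not yielded.
        removed = set(self.remover.select(population, excluded))
        if excluded is not None:
            removed |= excluded
        yield from self.selector.select(population, removed)


population = {'a': 5.0, 'b': 4.0, 'c': 3.0, 'd': 2.0, 'e': 1.0}
elite_then_rest = ToySequence(
    ToyBest(2),
    ToyRemoveBatches(remover=ToyBest(2), selector=ToyBest(2)),
)
selected = list(elite_then_rest.select(population))
print(selected)  # ['a', 'b', 'c', 'd']
```

The top two molecules come out first, and the second stage never repeats them because the remover's picks are passed on as exclusions.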
Making New Selectors
When a new Selector class is made it must inherit
Selector and implement any virtual methods.
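The inheritance requirement can be illustrated with a small sketch. Because the real base class needs stk to be installed, ToySelector below is a hypothetical abstract base class that only mimics the idea of a virtual select() method, and EveryOther is an invented example subclass. With stk, the subclass would inherit from Selector instead.

```python
from abc import ABC, abstractmethod


class ToySelector(ABC):
    """Hypothetical stand-in for an abstract selector base class."""

    @abstractmethod
    def select(self, population):
        """Yield batches of molecules from population."""


class EveryOther(ToySelector):
    """Yields every second molecule as a batch of size 1."""

    def select(self, population):
        for index, mol in enumerate(population):
            if index % 2 == 0:
                yield (mol,)


# The base class cannot be instantiated until select() is implemented.
try:
    ToySelector()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True

batches = list(EveryOther().select(['mol0', 'mol1', 'mol2', 'mol3']))
print(batches)  # [('mol0',), ('mol2',)]
```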
-
class AboveAverage(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Yields above average batches of molecules.
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    above_avg = stk.AboveAverage()
    # Select the molecules.
    for selected, in above_avg.select(pop):
        # Do stuff with each selected molecule, like apply a
        # mutation to it to generate a mutant.
        mutant = mutator.mutate(selected)
Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    above_avg = stk.AboveAverage(batch_size=2)
    # Select the molecules.
    for selected in above_avg.select(pop):
        # selected is a tuple of length 2, holding the selected
        # molecules. You can do stuff with the selected molecules
        # Like apply crossover operations on them.
        offspring = list(crosser.cross(*selected))
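As a rough illustration of what "above average" can mean, the sketch below keeps the batches whose fitness exceeds the mean batch fitness. It is one possible reading only: the strict comparison, the highest-first ordering and the plain dict of batch fitness values are all assumptions made for this example, and the function above_average is hypothetical rather than part of stk.

```python
from statistics import mean


def above_average(batch_fitness):
    """Yield batches whose fitness exceeds the mean batch fitness."""
    average = mean(batch_fitness.values())
    # Assumption: batches are visited from highest to lowest fitness.
    ranked = sorted(batch_fitness, key=batch_fitness.get, reverse=True)
    for batch in ranked:
        if batch_fitness[batch] > average:
            yield batch


batch_fitness = {'batch_a': 1.0, 'batch_b': 2.0, 'batch_c': 6.0}
selected = list(above_average(batch_fitness))
print(selected)  # ['batch_c'], because the mean fitness is 3.0
```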
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Initialize an AboveAverage instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
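The fitness_modifier parameter described above is a callable that receives the population and returns a dict of fitness values. A minimal sketch of such a callable follows. ToyPopulation is a hypothetical stand-in that only provides get_fitness_values(), and shift_to_positive is an invented example modifier, so neither name comes from stk.

```python
class ToyPopulation:
    """Hypothetical population exposing get_fitness_values()."""

    def __init__(self, fitness_values):
        self._fitness_values = dict(fitness_values)

    def get_fitness_values(self):
        return dict(self._fitness_values)


def shift_to_positive(population):
    """Return fitness values shifted so that the smallest one is 1."""
    values = population.get_fitness_values()
    offset = 1 - min(values.values())
    return {mol: value + offset for mol, value in values.items()}


population = ToyPopulation({'mol_a': -3.0, 'mol_b': 0.0, 'mol_c': 2.0})
modified = shift_to_positive(population)
print(modified)  # {'mol_a': 1.0, 'mol_b': 4.0, 'mol_c': 6.0}
```

Going by the signature documented above, a callable like this would be passed as fitness_modifier=shift_to_positive when the selector is created. The original fitness values held by the population are left untouched.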
class Batch(mols, fitness_values)
Bases: object
Represents a batch of molecules.
Batches can be compared. The comparison is based on their fitness values. Batches can also be iterated through, which iterates through all the molecules in the batch.
Examples
Sorting batches causes them to be sorted by fitness value.
    batches = (Batch(...), Batch(...), Batch(...))
    sorted_batches = sorted(batches)
Comparison is also based on fitness value.
    batch1 = Batch(...)
    batch2 = Batch(...)
    if batch1 > batch2:
        print('batch1 has a larger fitness value than batch2.')
Batches can be iterated through to get the molecules in the batch.
    batch = Batch(...)
    for mol in batch:
        # Do stuff with mol.
        pass
Methods
get_fitness(self) Get the fitness value of the batch.
get_identity_key(self) Get the identity key of the batch.
get_size(self) Get the number of molecules in the batch.
-
__init__(self, mols, fitness_values)
Initialize a Batch.
- Parameters
mols (tuple of Molecule) – The molecules which are part of the batch.
fitness_values (dict) – Maps each molecule in mols to its fitness value.
-
get_fitness(self)
Get the fitness value of the batch.
- Returns
The fitness value.
- Return type
float
-
get_identity_key(self)
Get the identity key of the batch.
If two batches hold the same molecules, the same number of times, they will have the same identity key.
- Returns
A hashable object which can be used to compare if two batches are the same.
- Return type
object
-
get_size(self)
Get the number of molecules in the batch.
- Returns
The number of molecules in the batch.
- Return type
int
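The behaviour documented for Batch (comparison by fitness, iteration over molecules, and an identity key that depends on which molecules are held and how many times) can be sketched with a small stand-in class. ToyBatch is hypothetical and is not the library's implementation. In particular, building the identity key from a Counter is an assumption chosen to satisfy the "same molecules, the same number of times" rule.

```python
from collections import Counter
from functools import total_ordering


@total_ordering
class ToyBatch:
    """Hypothetical stand-in for a batch of molecules."""

    def __init__(self, mols, fitness_values):
        self._mols = tuple(mols)
        # Batch fitness is the sum of the fitness of its molecules.
        self._fitness = sum(fitness_values[mol] for mol in self._mols)
        # Order-independent key which still counts repeats.
        self._identity_key = frozenset(Counter(self._mols).items())

    def get_fitness(self):
        return self._fitness

    def get_identity_key(self):
        return self._identity_key

    def get_size(self):
        return len(self._mols)

    def __iter__(self):
        return iter(self._mols)

    def __eq__(self, other):
        return self._fitness == other._fitness

    def __lt__(self, other):
        return self._fitness < other._fitness


fitness_values = {'a': 1.0, 'b': 2.0, 'c': 4.0}
batch1 = ToyBatch(('a', 'b'), fitness_values)
batch2 = ToyBatch(('b', 'a'), fitness_values)
batch3 = ToyBatch(('c',), fitness_values)

same_key = batch1.get_identity_key() == batch2.get_identity_key()
ordered = [batch.get_fitness() for batch in sorted([batch3, batch1])]
print(same_key, ordered)  # True [3.0, 4.0]
```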
-
-
class Best(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Selects batches of molecules, highest fitness value first.
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    best = stk.Best()
    # Select the molecules.
    for selected, in best.select(pop):
        # Do stuff with each selected molecule, like apply a
        # mutation to it to generate a mutant.
        mutant = mutator.mutate(selected)
Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    best = stk.Best(batch_size=2)
    # Select the molecules.
    for selected in best.select(pop):
        # selected is a tuple of length 2, holding the selected
        # molecules. You can do stuff with the selected molecules
        # Like apply crossover operations on them.
        offspring = list(crosser.cross(*selected))
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Initialize a Best instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once. Duplicate batches can occur if the same molecule is found multiple times in a population.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class FilterBatches(filter, selector)
Bases: stk.calculators.ea.selectors.Selector
Allows a Selector to select only some batches.
Examples
You only want the Best 10 batches to participate in Roulette.
    import stk
    population = stk.Population(...)
    selector = stk.FilterBatches(
        filter=stk.Best(10),
        selector=stk.Roulette(7),
    )
    for batch in selector.select(population):
        # Do stuff with batch. It is one of the 10 best batches and
        # was selected using roulette selection.
        pass
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, filter, selector)
Initialize a FilterBatches instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class FilterMolecules(filter, selector)
Bases: stk.calculators.ea.selectors.Selector
Allows a Selector to select only some molecules.
Examples
You want to use Roulette on the molecules which belong to the Best 5 batches of size 3.
    import stk
    population = stk.Population(...)
    selector = stk.FilterMolecules(
        filter=stk.Best(num_batches=5, batch_size=3),
        selector=stk.Roulette(num_batches=20, batch_size=3),
    )
    for batch in selector.select(population):
        # Do stuff with batch. All the molecules in the batch
        # belong to the top 5 batches of size 3. The batch
        # was selected using roulette selection.
        pass
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, filter, selector)
Initialize a FilterMolecules instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class RemoveBatches(remover, selector)
Bases: stk.calculators.ea.selectors.Selector
Prevents a Selector from selecting some batches.
Examples
You want to use Roulette selection on all but the 5 Worst batches.
    import stk
    population = stk.Population(...)
    selector = stk.RemoveBatches(
        remover=stk.Worst(5),
        selector=stk.Roulette(20),
    )
    for batch in selector.select(population):
        # Do stuff with batch. It was selected with roulette
        # selection and is not one of the worst 5 batches.
        pass
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, remover, selector)
Initialize a RemoveBatches instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class RemoveMolecules(remover, selector)
Bases: stk.calculators.ea.selectors.Selector
Prevents a Selector from selecting some molecules.
Examples
You want to prevent any of the molecules in the Best 5 batches from being selected by Roulette.
    import stk
    population = stk.Population(...)
    selector = stk.RemoveMolecules(
        remover=stk.Best(num_batches=5, batch_size=3),
        selector=stk.Roulette(num_batches=20, batch_size=3),
    )
    for batch in selector.select(population):
        # Do stuff with batch. The batch is guaranteed not to
        # contain any molecules which are found in the best 5
        # batches of size 3.
        pass
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, remover, selector)
Initialize a RemoveMolecules instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Roulette(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Uses roulette selection to select batches of molecules.
In roulette selection the probability that a batch is selected is proportional to its fitness. If the total fitness is the sum of all fitness values, the chance a batch is selected is given by:
p = batch fitness / total fitness,
where p is the probability of selection and the batch fitness is the sum of all fitness values of molecules in the batch [1].
References
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    roulette = stk.Roulette()
    # Select the molecules.
    for selected, in roulette.select(pop):
        # Do stuff with each selected molecule, like apply a
        # mutation to it to generate a mutant.
        mutant = mutator.mutate(selected)
Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    roulette = stk.Roulette(batch_size=2)
    # Select the molecules.
    for selected in roulette.select(pop):
        # selected is a tuple of length 2, holding the selected
        # molecules. You can do stuff with the selected molecules
        # Like apply crossover operations on them.
        offspring = list(crosser.cross(*selected))
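The probability formula given for roulette selection can be worked through with the standard library alone. In the sketch below, roulette_probabilities and roulette_select are hypothetical helper names, batches are represented by plain strings, and sampling is done with replacement. It illustrates the formula and is not the code stk runs.

```python
import random


def roulette_probabilities(batch_fitness):
    """Return p = batch fitness / total fitness for every batch."""
    total = sum(batch_fitness.values())
    return {
        batch: fitness / total
        for batch, fitness in batch_fitness.items()
    }


def roulette_select(batch_fitness, num_batches, seed=None):
    """Pick num_batches batches, weighted by fitness, with repeats."""
    generator = random.Random(seed)
    batches = list(batch_fitness)
    weights = [batch_fitness[batch] for batch in batches]
    return generator.choices(batches, weights=weights, k=num_batches)


batch_fitness = {'batch_a': 1.0, 'batch_b': 3.0, 'batch_c': 4.0}
probabilities = roulette_probabilities(batch_fitness)
print(probabilities)
# {'batch_a': 0.125, 'batch_b': 0.375, 'batch_c': 0.5}

picks = roulette_select(batch_fitness, num_batches=5, seed=4)
```

Note that weighting by fitness in this way only makes sense when all fitness values are non-negative.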
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Initialize a Roulette instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
random_seed (int, optional) – The random seed to use.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Selector
Bases: stk.calculators.base_calculators.Calculator
An abstract base class for selectors.
Selectors select batches of molecules from a population. Each batch is selected based on its fitness. The fitness of a batch is the sum of all fitness values of the molecules in the batch. Batches may be of size 1.
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, /, *args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
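The included_batches and excluded_batches parameters of select() both hold identity keys, and their documented meaning can be expressed as a small predicate. The function name is_allowed is hypothetical, identity keys are represented here by frozensets of molecule names, and giving exclusion priority when a key appears in both sets is an assumption of this sketch.

```python
def is_allowed(identity_key, included_batches=None, excluded_batches=None):
    """Return True if a batch with identity_key may be yielded."""
    if included_batches is not None and identity_key not in included_batches:
        return False
    if excluded_batches is not None and identity_key in excluded_batches:
        return False
    return True


key_ab = frozenset({'a', 'b'})
key_cd = frozenset({'c', 'd'})

# With both parameters left as None, every batch may be yielded.
unrestricted = is_allowed(key_ab)
# Only key_ab is included, so key_cd is rejected.
not_included = is_allowed(key_cd, included_batches={key_ab})
# key_ab is excluded, so it is rejected.
excluded = is_allowed(key_ab, excluded_batches={key_ab})
print(unrestricted, not_included, excluded)  # True False False
```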
class StochasticUniversalSampling(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Yields batches of molecules through stochastic universal sampling.
Stochastic universal sampling lays out batches along a line, with each batch taking up length proportional to its fitness. It then creates a set of evenly spaced pointers to different points on the line, each of which is occupied by a batch. Batches which are pointed to are yielded.
This approach means weaker members of the population are given a greater chance to be chosen than in Roulette selection [2].
References
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    stochastic_sampling = stk.StochasticUniversalSampling(5)
    # Select the molecules.
    for selected, in stochastic_sampling.select(pop):
        # Do stuff with each selected molecule, like apply a
        # mutation to it to generate a mutant.
        mutant = mutator.mutate(selected)
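The line-and-pointers description of stochastic universal sampling can be sketched in a few lines. The function stochastic_universal_sampling below is a hypothetical helper that works on a plain dict of batch fitness values. The random start offset within the first pointer spacing is the textbook form of the method and is an assumption about details the text above does not spell out.

```python
import random


def stochastic_universal_sampling(batch_fitness, num_batches, seed=None):
    """Select batches with evenly spaced pointers along a fitness line."""
    generator = random.Random(seed)
    total = sum(batch_fitness.values())
    spacing = total / num_batches
    # A single random offset positions every pointer.
    start = generator.random() * spacing
    pointers = [start + i * spacing for i in range(num_batches)]

    batches = list(batch_fitness)
    index = 0
    # upper is the right-hand end of the current batch's segment.
    upper = batch_fitness[batches[0]]
    selected = []
    for pointer in pointers:
        # Advance to the batch whose segment contains the pointer.
        while pointer >= upper and index < len(batches) - 1:
            index += 1
            upper += batch_fitness[batches[index]]
        selected.append(batches[index])
    return selected


batch_fitness = {'batch_a': 1.0, 'batch_b': 1.0, 'batch_c': 2.0}
selected = stochastic_universal_sampling(batch_fitness, 4, seed=7)
print(selected)
```

With a total fitness of 4 and 4 pointers, the spacing is 1, so exactly one pointer lands in each unit of the line. batch_a and batch_b are each picked once and batch_c twice, whatever the random offset is.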
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Initialize a StochasticUniversalSampling instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
random_seed (int, optional) – The random seed to use.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Tournament(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Yields batches of molecules through tournament selection.
In tournament selection, a random number of batches is chosen from the population to undergo a competition. In each competition, the batch with the highest fitness value is yielded. This is repeated until num_batches are yielded.
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.
    import stk
    # Make a population holding some molecules.
    pop = stk.Population(...)
    # Make the selector.
    tournament = stk.Tournament(
        num_batches=5,
        batch_size=1
    )
    # Select the molecules.
    for selected, in tournament.select(pop):
        # Do stuff with each selected molecule, like apply a
        # mutation to it to generate a mutant.
        mutant = mutator.mutate(selected)
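The competition loop described for tournament selection can be sketched as follows. The function tournament_select is a hypothetical helper operating on a plain dict of batch fitness values. Drawing at least two competitors per round, and drawing them without replacement, are assumptions of this sketch and not documented behaviour.

```python
import random


def tournament_select(batch_fitness, num_batches, seed=None):
    """Yield the fittest batch of each randomly drawn competition."""
    generator = random.Random(seed)
    batches = list(batch_fitness)
    for _ in range(num_batches):
        # Assumption: a competition has between 2 and all batches.
        size = generator.randint(2, len(batches))
        competitors = generator.sample(batches, size)
        yield max(competitors, key=batch_fitness.get)


batch_fitness = {'batch_a': 1.0, 'batch_b': 2.0, 'batch_c': 3.0}
winners = list(tournament_select(batch_fitness, num_batches=10, seed=1))
print(winners)
```

Because every competition in this sketch has at least two distinct competitors, the batch with the lowest fitness can never win, which is one way tournament selection differs from roulette selection.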
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Initialize a Tournament instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
random_seed (int, optional) – The random seed to use.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Worst(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Selects batches of molecules, lowest fitness value first.
Examples
Select the worst 5 batches of size 3.
    import stk
    population = stk.Population(...)
    worst = stk.Worst(num_batches=5, batch_size=3)
    for batch in worst.select(population):
        # Do stuff with batch.
        pass
Methods
select(self, population[, included_batches, …]) Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Initialize a Worst instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once. Duplicate batches can occur if the same molecule is found multiple times in a population.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-