Selection
Selection is carried out by Selector objects. Selectors are objects with a select() method, which is used to select batches of molecules from an EAPopulation. Examples of how Selector classes can be used are given in their documentation, for example Best, Roulette or AboveAverage.
Selectors can be combined to generate more complex selection processes. For example, let's say we want to implement elitism. Elitism is when the best batches are guaranteed to be selected first, before the selection algorithm is carried out. The Sequence selector exists precisely for this reason. It takes two selectors and yields batches from them, one after the other:
import stk

population = stk.EAPopulation(...)
elite_roulette = stk.Sequence(
    stk.Best(5),
    stk.Roulette(20),
)
# Select with Best first and then with Roulette.
for batch in elite_roulette.select(population):
    # Do stuff with batch.
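To make the chaining behaviour concrete, here is a minimal plain-Python sketch that does not use stk. The helpers best, roulette and sequence are hypothetical stand-ins written for illustration, and the fitness values are made up. The actual stk classes may be implemented differently.

```python
import itertools
import random


def best(fitness, num_batches):
    """Yield the fittest names first (stand-in for a Best-like selector)."""
    ranked = sorted(fitness, key=fitness.get, reverse=True)
    yield from ranked[:num_batches]


def roulette(fitness, num_batches, seed=None):
    """Yield names with probability proportional to their fitness."""
    rng = random.Random(seed)
    names = list(fitness)
    weights = [fitness[name] for name in names]
    for _ in range(num_batches):
        yield rng.choices(names, weights=weights, k=1)[0]


def sequence(*selections):
    """Yield everything from each selection, one after the other."""
    return itertools.chain(*selections)


# Hypothetical fitness values, keyed by molecule name.
fitness = {'a': 1.0, 'b': 4.0, 'c': 2.0, 'd': 3.0}
selected = list(sequence(best(fitness, 2), roulette(fitness, 3, seed=0)))
# The first two entries are always the elites, 'b' and 'd'. The
# remaining three come from roulette and may repeat the elites.
```

Because the roulette stage knows nothing about the elite stage, it is free to pick the elites again.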
What if you did not want Roulette to yield any batches selected by Best? The RemoveBatches selector can be used for this. It takes two selectors, one called a remover and one called a selector. It first selects batches of molecules from a population with the remover. It then passes the same population to the selector but prevents it from yielding any batches selected by the remover:
roulette_without_elites = stk.RemoveBatches(
    remover=stk.Best(5),
    selector=stk.Roulette(20),
)
# Select batches, excluding the top 5.
for batch in roulette_without_elites.select(population):
    # Do stuff with batch.
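The exclusion step can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: batches are single molecule names, and best, roulette and remove_batches are hypothetical helper functions, not the stk API.

```python
import random


def best(fitness, num_batches):
    """Yield the fittest names first."""
    ranked = sorted(fitness, key=fitness.get, reverse=True)
    yield from ranked[:num_batches]


def roulette(fitness, num_batches, excluded=frozenset(), seed=None):
    """Fitness-proportional selection which never yields excluded names."""
    rng = random.Random(seed)
    names = [name for name in fitness if name not in excluded]
    weights = [fitness[name] for name in names]
    for _ in range(num_batches):
        yield rng.choices(names, weights=weights, k=1)[0]


def remove_batches(fitness, remover, selector):
    """Collect what the remover picks, then run the selector without it."""
    removed = frozenset(remover(fitness))
    yield from selector(fitness, excluded=removed)


# Hypothetical fitness values, keyed by molecule name.
fitness = {'a': 1.0, 'b': 4.0, 'c': 2.0, 'd': 3.0}
selected = list(remove_batches(
    fitness,
    remover=lambda f: best(f, 2),
    selector=lambda f, excluded: roulette(f, 20, excluded, seed=1),
))
# 'b' and 'd' are the two best, so only 'a' and 'c' can appear.
```

Note that the names picked by the remover are never yielded to the caller in this sketch. They are only used to build the exclusion set.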
You can combine RemoveBatches and Sequence to get a selector which yields the top 5 batches first and then, using roulette, selects any of the other batches:
elite_roulette2 = stk.Sequence(
    stk.Best(5),
    roulette_without_elites,
)
The same thing can be written more explicitly:
elite_roulette2 = stk.Sequence(
    stk.Best(5),
    stk.RemoveBatches(
        remover=stk.Best(5),
        selector=stk.Roulette(20),
    ),
)
You can also explore other combinations with FilterBatches, FilterMolecules and RemoveMolecules. Examples using these classes are given in their docstrings.
Making New Selectors
When a new Selector class is made it must inherit Selector and implement any virtual methods.
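The pattern of inheriting an abstract base class and implementing its virtual methods looks roughly like this. The sketch uses a stand-in Selector class built with abc so that it runs without stk. EveryOther is a hypothetical selector, and the real stk.Selector may require more than a select() method.

```python
import abc


class Selector(abc.ABC):
    """Stand-in for an abstract selector base class (not stk.Selector)."""

    @abc.abstractmethod
    def select(self, population):
        """Yield batches of molecules from population."""


class EveryOther(Selector):
    """Hypothetical selector which yields every second molecule."""

    def select(self, population):
        for index, mol in enumerate(population):
            if index % 2 == 0:
                # Batches of size 1 are yielded as 1-tuples.
                yield (mol,)


population = ['mol0', 'mol1', 'mol2', 'mol3', 'mol4']
batches = list(EveryOther().select(population))
```

Trying to instantiate the stand-in base class directly raises a TypeError, which is how Python enforces that the virtual method gets implemented.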
-
class AboveAverage(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Yields above average batches of molecules.
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
above_avg = stk.AboveAverage()

# Select the molecules.
for selected, in above_avg.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
above_avg = stk.AboveAverage(batch_size=2)

# Select the molecules.
for selected in above_avg.select(pop):
    # selected is a tuple of length 2, holding the selected
    # molecules. You can do stuff with the selected molecules,
    # like apply crossover operations on them.
    offspring = list(crosser.cross(*selected))
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Initialize an AboveAverage instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
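As an illustration of what a fitness_modifier callable might look like, the sketch below shifts all fitness values so that the smallest becomes 1. FakePopulation is a hypothetical stand-in which only assumes that the population exposes get_fitness_values() returning a dict. It is not the stk EAPopulation class.

```python
class FakePopulation:
    """Stand-in exposing get_fitness_values(); not stk.EAPopulation."""

    def __init__(self, fitness_values):
        self._fitness_values = dict(fitness_values)

    def get_fitness_values(self):
        return dict(self._fitness_values)


def shift_to_positive(population):
    """Return fitness values shifted so that the smallest becomes 1."""
    values = population.get_fitness_values()
    offset = 1 - min(values.values())
    return {mol: value + offset for mol, value in values.items()}


# Hypothetical fitness values, some of them not positive.
pop = FakePopulation({'a': -2.0, 'b': 0.0, 'c': 3.0})
modified = shift_to_positive(pop)
```

A callable with this shape could then be passed as fitness_modifier=shift_to_positive, assuming the selector calls it with the population as its only argument, as the parameter description states.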
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Batch(mols, fitness_values)
Bases: object
Represents a batch of molecules.
Batches can be compared; the comparison is based on their fitness values. Batches can also be iterated through, which iterates through all the molecules in the batch.
Examples
Sorting batches causes them to be sorted by fitness value.

batches = (Batch(...), Batch(...), Batch(...))
sorted_batches = sorted(batches)

Comparison is also based on fitness value.

batch1 = Batch(...)
batch2 = Batch(...)
if batch1 > batch2:
    print('batch1 has a larger fitness value than batch2.')

Batches can be iterated through to get the molecules in the batch.

batch = Batch(...)
for mol in batch:
    # Do stuff with mol.
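The comparison and iteration behaviour described above can be reproduced with a small stand-in class. SimpleBatch is hypothetical and is not the stk Batch class. It assumes that the batch fitness is the sum of the fitness values of its molecules.

```python
import functools


@functools.total_ordering
class SimpleBatch:
    """Stand-in for a batch of molecules; not stk's Batch class."""

    def __init__(self, mols, fitness_values):
        self._mols = tuple(mols)
        # Assumption: batch fitness is the sum over its molecules.
        self._fitness = sum(fitness_values[mol] for mol in self._mols)

    def get_fitness(self):
        return self._fitness

    def __iter__(self):
        return iter(self._mols)

    def __eq__(self, other):
        if not isinstance(other, SimpleBatch):
            return NotImplemented
        return self._fitness == other._fitness

    def __lt__(self, other):
        if not isinstance(other, SimpleBatch):
            return NotImplemented
        return self._fitness < other._fitness


# Hypothetical fitness values, keyed by molecule name.
fitness_values = {'a': 1.0, 'b': 2.0, 'c': 5.0}
weak = SimpleBatch(('a', 'b'), fitness_values)
strong = SimpleBatch(('c',), fitness_values)
ranked = sorted([strong, weak])
```

functools.total_ordering fills in the remaining comparison operators from __eq__ and __lt__, so strong > weak works without writing __gt__ by hand.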
Methods
get_fitness(self) – Get the fitness value of the batch.
get_identity_key(self) – Get the identity key of the batch.
get_size(self) – Get the number of molecules in the batch.
-
__init__(self, mols, fitness_values)
Initialize a Batch.
- Parameters
mols (tuple of Molecule) – The molecules which are part of the batch.
fitness_values (dict) – Maps each molecule in mols to its fitness value.
-
get_fitness(self)
Get the fitness value of the batch.
- Returns
The fitness value.
- Return type
float
-
get_identity_key(self)
Get the identity key of the batch.
If two batches hold the same molecules, the same number of times, they will have the same identity key.
- Returns
A hashable object which can be used to compare if two batches are the same.
- Return type
object
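One way to build a key with the property described for get_identity_key() is to count how often each molecule occurs and freeze the result. This is only a sketch of the idea. The identity_key function is hypothetical and the key stk actually uses may be different.

```python
from collections import Counter


def identity_key(mols):
    """Hashable key which ignores order but respects multiplicity."""
    return frozenset(Counter(mols).items())


# Same molecules, same number of times, different order.
key1 = identity_key(('A', 'B', 'B'))
key2 = identity_key(('B', 'A', 'B'))
# Same molecules, but 'A' and 'B' occur a different number of times.
key3 = identity_key(('A', 'A', 'B'))
```

Here key1 and key2 compare equal, while key3 differs from both, so the keys can be stored in a set such as included_batches or excluded_batches.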
-
get_size(self)
Get the number of molecules in the batch.
- Returns
The number of molecules in the batch.
- Return type
int
-
-
class Best(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Selects batches of molecules, highest fitness value first.
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
best = stk.Best()

# Select the molecules.
for selected, in best.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
best = stk.Best(batch_size=2)

# Select the molecules.
for selected in best.select(pop):
    # selected is a tuple of length 2, holding the selected
    # molecules. You can do stuff with the selected molecules,
    # like apply crossover operations on them.
    offspring = list(crosser.cross(*selected))
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Initialize a Best instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once. Duplicate batches can occur if the same molecule is found multiple times in a population.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class FilterBatches(filter, selector)
Bases: stk.calculators.ea.selectors.Selector
Allows a Selector to select only some batches.
Examples
You only want the Best 10 batches to participate in Roulette.

import stk

population = stk.Population(...)
selector = stk.FilterBatches(
    filter=stk.Best(10),
    selector=stk.Roulette(7),
)
for batch in selector.select(population):
    # Do stuff with batch. It is one of the 10 best batches and
    # was selected using roulette selection.
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, filter, selector)
Initialize a FilterBatches instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class FilterMolecules(filter, selector)
Bases: stk.calculators.ea.selectors.Selector
Allows a Selector to select only some molecules.
Examples
You want to use Roulette on the molecules which belong to the Best 5 batches of size 3.

import stk

population = stk.Population(...)
selector = stk.FilterMolecules(
    filter=stk.Best(num_batches=5, batch_size=3),
    selector=stk.Roulette(num_batches=20, batch_size=3),
)
for batch in selector.select(population):
    # Do stuff with batch. All the molecules in the batch
    # belong to the top 5 batches of size 3. The batch
    # was selected using roulette selection.
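The difference between filtering batches and filtering molecules can be seen in a small plain-Python sketch. It assumes that batches are all combinations of a given size and that the batch fitness is the sum of the fitness values of its molecules. The fitness values and variable names are made up for illustration and nothing here calls stk.

```python
import itertools

# Hypothetical fitness values, keyed by molecule name.
fitness = {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 4.0}
batches = list(itertools.combinations(fitness, 2))


def batch_fitness(batch):
    """Sum of the fitness values of the molecules in the batch."""
    return sum(fitness[mol] for mol in batch)


# Batch-level filter: keep only the 2 fittest batches themselves.
top_batches = sorted(batches, key=batch_fitness, reverse=True)[:2]

# Molecule-level filter: keep any batch made purely from molecules
# which appear in the 2 fittest batches.
allowed_mols = {mol for batch in top_batches for mol in batch}
mol_filtered = [b for b in batches if set(b) <= allowed_mols]
```

With these values the batch-level filter keeps ('c', 'd') and ('b', 'd'), while the molecule-level filter additionally admits ('b', 'c'), because both of its molecules occur somewhere in the top batches.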
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, filter, selector)
Initialize a FilterMolecules instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class RemoveBatches(remover, selector)
Bases: stk.calculators.ea.selectors.Selector
Prevents a Selector from selecting some batches.
Examples
You want to use Roulette selection on all but the 5 Worst batches.

import stk

population = stk.Population(...)
selector = stk.RemoveBatches(
    remover=stk.Worst(5),
    selector=stk.Roulette(20),
)
for batch in selector.select(population):
    # Do stuff with batch. It was selected with roulette
    # selection and is not one of the worst 5 batches.
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, remover, selector)
Initialize a RemoveBatches instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class RemoveMolecules(remover, selector)
Bases: stk.calculators.ea.selectors.Selector
Prevents a Selector from selecting some molecules.
Examples
You want to prevent any of the molecules in the Best 5 batches from being selected by Roulette.

import stk

population = stk.Population(...)
selector = stk.RemoveMolecules(
    remover=stk.Best(num_batches=5, batch_size=3),
    selector=stk.Roulette(num_batches=20, batch_size=3),
)
for batch in selector.select(population):
    # Do stuff with batch. The batch is guaranteed not to
    # contain any molecules which are found in the best 5
    # batches of size 3.
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, remover, selector)
Initialize a RemoveMolecules instance.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Roulette(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Uses roulette selection to select batches of molecules.
In roulette selection the probability that a batch is selected is proportional to its fitness. If the total fitness is the sum of all fitness values, the chance a batch is selected is given by:
p = batch fitness / total fitness,
where p is the probability of selection and the batch fitness is the sum of all fitness values of molecules in the batch [1].
References
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
roulette = stk.Roulette()

# Select the molecules.
for selected, in roulette.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)

Yielding multiple molecules at once. For example, if molecules need to be selected for crossover.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
roulette = stk.Roulette(batch_size=2)

# Select the molecules.
for selected in roulette.select(pop):
    # selected is a tuple of length 2, holding the selected
    # molecules. You can do stuff with the selected molecules,
    # like apply crossover operations on them.
    offspring = list(crosser.cross(*selected))
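The selection probability p = batch fitness / total fitness given for roulette selection can be worked through by hand on a tiny example. The sketch below is plain Python with made-up fitness values. It assumes batches are all combinations of 2 molecules, which may not match how stk forms batches internally.

```python
import itertools
import random

# Hypothetical fitness values, keyed by molecule name.
fitness = {'a': 1.0, 'b': 3.0, 'c': 6.0}

# All batches of size 2, each with fitness equal to the sum of its
# molecules' fitness values: 4.0, 7.0 and 9.0 respectively.
batches = list(itertools.combinations(fitness, 2))
batch_fitness = {
    batch: sum(fitness[mol] for mol in batch) for batch in batches
}

# p = batch fitness / total fitness, with total fitness 20.0 here.
total_fitness = sum(batch_fitness.values())
probabilities = {
    batch: value / total_fitness
    for batch, value in batch_fitness.items()
}

# Draw 5 batches with those probabilities, duplicates allowed.
rng = random.Random(4)
weights = [batch_fitness[batch] for batch in batches]
selected = rng.choices(batches, weights=weights, k=5)
```

The resulting probabilities are 0.2 for ('a', 'b'), 0.35 for ('a', 'c') and 0.45 for ('b', 'c'), which sum to 1.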
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Initialize a Roulette instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
random_seed (int, optional) – The random seed to use.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Selector
Bases: stk.calculators.base_calculators.Calculator
An abstract base class for selectors.
Selectors select batches of molecules from a population. Each batch is selected based on its fitness. The fitness of a batch is the sum of all fitness values of the molecules in the batch. Batches may be of size 1.
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, /, *args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class StochasticUniversalSampling(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Yields batches of molecules through stochastic universal sampling.
Stochastic universal sampling lays out batches along a line, with each batch taking up length proportional to its fitness. It then creates a set of evenly spaced pointers to different points on the line, each of which is occupied by a batch. Batches which are pointed to are yielded.
This approach means weaker members of the population are given a greater chance to be chosen than in Roulette selection [2].
References
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
stochastic_sampling = stk.StochasticUniversalSampling(5)

# Select the molecules.
for selected, in stochastic_sampling.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)
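The line-and-pointers description of stochastic universal sampling can be sketched in plain Python as below. This is a textbook-style illustration on single molecule names with made-up fitness values. The function name and its arguments are hypothetical, and the stk implementation may differ in its details.

```python
import random


def stochastic_universal_sampling(fitness, num_batches, seed=None):
    """Select names using evenly spaced pointers along a fitness line."""
    rng = random.Random(seed)
    names = list(fitness)
    total = sum(fitness.values())

    # Evenly spaced pointers, offset by a single random start.
    spacing = total / num_batches
    start = rng.uniform(0, spacing)
    pointers = [start + i * spacing for i in range(num_batches)]

    selected = []
    cumulative = 0.0  # Length of the line before names[index].
    index = 0
    for pointer in pointers:
        # Advance until the segment containing the pointer is found.
        while (
            index < len(names) - 1
            and cumulative + fitness[names[index]] < pointer
        ):
            cumulative += fitness[names[index]]
            index += 1
        selected.append(names[index])
    return selected


# Hypothetical fitness values: segments of length 1, 1 and 2.
fitness = {'a': 1.0, 'b': 1.0, 'c': 2.0}
selected = stochastic_universal_sampling(fitness, 4, seed=0)
```

With four pointers spaced one unit apart on a line of length 4, 'a' and 'b' are each hit once and 'c' twice, whatever the random start. A single random draw positions all pointers, which is what distinguishes this scheme from repeated roulette draws.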
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Initialize a StochasticUniversalSampling instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
random_seed (int, optional) – The random seed to use.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Tournament(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Yields batches of molecules through tournament selection.
In tournament selection, a random number of batches is chosen from the population to undergo a competition. In each competition, the batch with the highest fitness value is yielded. This is repeated until num_batches are yielded.
Examples
Yielding molecules one at a time. For example, if molecules need to be selected for mutation or the next generation.

import stk

# Make a population holding some molecules.
pop = stk.Population(...)

# Make the selector.
tournament = stk.Tournament(
    num_batches=5,
    batch_size=1
)

# Select the molecules.
for selected, in tournament.select(pop):
    # Do stuff with each selected molecule, like apply a
    # mutation to it to generate a mutant.
    mutant = mutator.mutate(selected)
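The competition loop of tournament selection can be sketched in plain Python as follows. The tournament function here is a hypothetical stand-in working on single molecule names with made-up fitness values. The choice of at least 2 competitors per round is an assumption of this sketch, not something stated for stk.

```python
import random


def tournament(fitness, num_batches, seed=None):
    """Yield the winner of num_batches randomly drawn competitions."""
    rng = random.Random(seed)
    names = list(fitness)
    for _ in range(num_batches):
        # Assumption: each competition has between 2 and all members.
        size = rng.randint(2, len(names))
        competitors = rng.sample(names, size)
        # The competitor with the highest fitness wins the round.
        yield max(competitors, key=fitness.get)


# Hypothetical fitness values, keyed by molecule name.
fitness = {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 4.0}
winners = list(tournament(fitness, 10, seed=0))
```

Since every competition in this sketch has at least two distinct competitors, the least fit member 'a' can never win, while the fittest member wins whenever it is drawn.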
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None, random_seed=None)
Initialize a Tournament instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
random_seed (int, optional) – The random seed to use.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-
-
class Worst(num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Bases: stk.calculators.ea.selectors._BatchingSelector, stk.calculators.ea.selectors.Selector
Selects batches of molecules, lowest fitness value first.
Examples
Select the worst 5 batches of size 3.

import stk

population = stk.Population(...)
worst = stk.Worst(num_batches=5, batch_size=3)
for batch in worst.select(population):
    # Do stuff with batch.
Methods
select(self, population[, included_batches, …]) – Select batches of molecules from population.
-
__init__(self, num_batches=None, batch_size=1, duplicate_mols=True, duplicate_batches=True, fitness_modifier=None)
Initialize a Worst instance.
- Parameters
num_batches (int, optional) – The number of batches to yield. If None then yielding will continue forever or until the generator is exhausted, whichever comes first.
batch_size (int, optional) – The number of molecules yielded at once.
duplicate_mols (bool, optional) – If True the same molecule can be yielded in more than one batch.
duplicate_batches (bool, optional) – If True the same batch can be yielded more than once. Duplicate batches can occur if the same molecule is found multiple times in a population.
fitness_modifier (callable, optional) – Takes the population on which select() is called and returns a dict mapping molecules in the population to the fitness values the Selector should use. If None then EAPopulation.get_fitness_values() is used.
-
select(self, population, included_batches=None, excluded_batches=None)
Select batches of molecules from population.
- Parameters
population (EAPopulation) – A collection of molecules from which batches are selected.
included_batches (set, optional) – The identity keys of batches which are allowed to be yielded. If None, all batches can be yielded. If not None, only batches in included_batches will be yielded.
excluded_batches (set, optional) – The identity keys of batches which are not allowed to be yielded. If None, no batch is forbidden from being yielded.
- Yields
-