Genetic Algorithms deal with strings representing the parameter set.
Initially a problem is modeled such that features of the domain are defined that can be assigned values representing a solution to the problem.
These features are joined together as contiguous elements of a data structure (a string).
The values assigned to these elements represent one instance of solving the problem.
Multiple attempts to solve the problem are created and the resulting data structures are grouped as a population.
The strings are analogous to the DNA found in living organisms.
In the computer, the strings are usually reduced to binary strings.
E.g., The following binary string represents three parameters that could
be separate individual adjustments on a piece of machinery:
The population size should be large enough to guarantee a diverse gene pool.
The initial generation is usually determined randomly.
A method to determine how well the values of a given structure solve the
problem is defined:
Certain parameters must be assigned values when using GA's:
generation gap: number of members of population reproduced during
a cycle of applying GA's
mutation rate: the probability of a given element of a structure
being changed during a cycle
These variables do not change the basic operations but their values influence
the overall performance of the running system.
A simple GA operates by creating an initial set, or population of
solution attempts, and repeatedly evaluating and applying genetic operators
that alter the population.
Each cycle of evaluation and alteration is termed a generation of the
population.
This process continues until either the evaluation finds a predetermined level
of success or the search is terminated.
Finding the largest decimal
GA's combine "survival-of-the-fittest" reproduction strategies with methods of
information exchange based on genetic crossover and random alteration derived
from biological mutation.
These operations
Repeated cycles of applying these operators (generations) result in a
population of structures with improved performance (higher fitness
values).
Population members with higher fitness values get proportionally larger
representation in the next generation.
The next generation thus represents a population that generally fits the
environment better.
(Many variations)
(Rate (probability of crossover) is a Control Parameter)
Evaluate Population 2
From a software perspective, GA's can be viewed as follows:
Genetic Algorithms are best at optimization.
They can handle non-linear, multi-modal tasks that exhibit noise. ( Figure 1.1 nice , Figure 1.2 now what!)
more later ...
(what you need)
Problem Definition Input:
Learning Input:
Fitness Scaling
Premature convergence - a problem often discovered in small population
GA's.
In the first few generations, these members take over a large part of the
population before the crossover operator
is able to construct a more diverse set of good members.
Alternatively, late in the run, the population average fitness may become close
to the population best fitness - average members and best members get nearly
the same number of copies. (We need a healthy competition between near
equals.)
Fitness scaling avoids premature convergence by scaling down the few
extraordinary members while scaling up the lowly members of the population.
In linear scaling, we calculate the scaled fitness f' from the rawness
fitness (objective function value)
Where the coefficients a and b are chosen to
These two conditions ensure that average population members receive one
offspring copy on average and the best receive the specified multiple number of
copies.
Adjacent to the major operators of reproduction, crossover, and mutation used
in GA's, various parameters are required for the running of a genetic
algorithm. Examples of these variables are the Control Parameters mentioned
earlier.
Values for these parameters are generally set at the beginning of the run and
remain unchanged throughout.
Parameter optimization considers variations of these parameters for
optimal performance.
Recent research has considered dynamic adjustments of control parameters
(i.e., changing parameters while a GA is running to improve performance)
Most work has focused on what the initial settings should be and how
performance varies with respect to these settings in order to establish the
best static settings.
We will discuss these first.
The initial work on parameter optimization was performed by DeJong (1975).
To qualify the effectiveness of different genetic algorithms, he devised two
measures:
Off-line Performance measured the maximum fitness of the population. (convergence)
DeJong constructed a test environment of 5 problems in function minimization.
He tested 4 Control Parameters:
The results indicated optimal parameter settings of:
He made the following conclusion concerning population size,
DeJong performed many comparisons:
These types of studies continue in Genetic Algorithm research.
Another technique thought to help convergence is the use of gray codes.
Notice that adjacent integers map to bit strings which differ by a single
bit.
while if one uses gray-coding with the adjacency property, only small
perturbations are caused by many mutations.
Genetic Algorithms have been shown to be effective where past methods have had
difficulty:
GA's have not been applied extensively to combinatorial problems.
The major obstacle is in finding appropriate representations.
The Traveling Salesperson Problem is a good example of the
need for proper representation.
The Traveling Salesperson Problem:
Given a complete graph with N nodes, find the shortest Hamiltonian path through
the graph (i.e., visit each node only once)
Choose a representation:
Consider a path representation in which a tour is represented by a list of
cities: (a b c d e f).
Problems:
Get (a b c c b) and (a d e d e), neither are legal
Alternative crossover operators have been studied: [Oliver et al]
The Order Crossover
Example:
The offspring sequence, j k c (return to the first position)
The absolute positions of some elements are retained, and the relative
positions of some elements of the second parent are also kept.
Example:
The Cycle Crossover
The cycle crossover achieves: creating an offspring different from the parents
where every position is occupied by a corresponding element from one of the
parents.
We satisfy the following conditions:
b) The offspring must be a permutation
Given the following parents:
In a standard crossover operator, a random crossover point or cut section is
chosen; in the cycle crossover a random parent is chosen for each cycle.
Choosing cycle 1 from parent 1 and the rest from parent 2 in the above, we
get
The absolute positions of, on average, half the elements of both the first and
second parents are preserved.
Results:
Order crossover is superior in problems where compact schemata are important
(as in TSP).
Suspect PMX to be superior when compactness less important.
Cycle crossover potentially superior where compactness not relevant.
50 cities , 100 cities , 200 cities
The Classifier System is the most common genetics-based machine learning
(GBML) system.
Notes from [Goldberg, 1989, chap.6]
A classifier system is a machine learning system that learns syntactically
simple string rules (called classifiers) to guide its performance in an
arbitrary environment.
It consists of three main components:
review ?
if <condition> then <action>
computationally complete
In brief, two learning mechanisms are in action.
If activated, a classifier may then be chosen to send a message to the message
list for the next cycle.
These messages may then invoke other classifiers or they may cause an action
to be taken through the system's action triggers (called effectors).
Thus classifiers combine environmental cues and internal rules (external and
internal data) to guide its "state-of-mind" in the next cycle.
<condition> ::= {0, 1, #}l where # is a wildcard
- The action is a message that can be sent to the message list
Whether the candidate classifier posts its message is determined by the outcome
of an activation auction, which in turn depends on the evaluation of the
classifier's value.
Simple Example:
1) 0 1 # #: 0 0 0 0
2) 0 0 # 0: 1 1 0 0
3) 1 1 # #: 1 0 0 0
4) # # 0 0: 0 0 0 1
. .
First step: environmental message of 0111
matches classifier 1, posts message 0000
matches classifiers 2 & 4, posts messages 1100 & 0001
message 1100 matches classifier 3, posts message 1000
matches classifier 4, posts message 0001
process terminates
When message list is too large, how does one determine which classifiers to
activate? (conflict resolution)
Runtime2:
Step 1: Place all environmental messages on the current message list
Step 2: Compare all messages to all conditions of classifiers in
the classifier store, record all messages and erase the message list.
Step 3: For each match compute a bid based on the strength of the classifier and place only the messages of highest bidding classifiers
on the list.
Step 4: If the new message specifies some action to be taken, do so, but with resolution of effector conflicts if necessary.
Step 5: Return to step 1
Specifically, the bid produced by a classifier is proportional to the product
of its strength (past usefulness) and its specificity (its relevance (the
amount of info it has to contribute to the current situation))
Bid(C, t) = R(C) Strength(C, t)
The bid is calculated in the apportionment of credit subsystem
Suppose an action of the CS brings the environment to the goal state. The
system needs to adjust the strengths of the classifiers according to their
usefulness in reaching this state.
(Notice this is likely only if the consumer is, in turn, profitable -
producing a chain)
The profitability of other classifiers depends on if they are in a sequence
which leads to a payoff.
The BBA assures that early-acting, stage setting classifiers receive credit if
they on average make possible payoff-receiving actions.
Formally:
* A high-bidding classifier whose message is placed on the message list has its
strength reduced by an amount proportional to its bid (paying for the
priviledge of posting a new message)
Strength(C, t + 1) = Strength(C, t) - c Bid(C,t)
The classifiers {C'} that sent a message matched by this "winner" have their
strengths increased by an amount proportional to the bid of C (the senders are
rewarded for setting up the good situation usable by C.)
Strength(C', t + 1) = Strength(C', t) + c Bid(C,t)/n
Use the 4 classifier example used earlier and follow their "payments".
Assume: initial strengths of 200
Cbid = bid coefficient of 0.1
In this text, the bid is the bid coefficient times the strength
Bid(C, t) = Cbid * Strength(C, t)
The above is a simplified version of the algorithm.
See [Goldberg, 1989, chapter 6] for a complete description. Genetic Algorithms in Search, Optimization, and Machine
Learning
DeJong, G. (1981). Generalizations based on explanations. Proceedings of the Seventh
International Joint Conference on Artificial Intelligence. Vancouver, British
Columbia, Canada.
DeJong, G. and Mooney, R. (1986). Explanation-based learning: An alternative view.
Machine Learning 1, 2, 145-176.
DeJong, K. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems.
PhD thesis, University of Michigan.
Grefenstette, J., Gopal, R., Rosmaita, B. and Gucht, D. (1985). Genetic algorithms for the
traveling salesperson problem. In International Conference of Genetic Algorithms
and Their Applications, pp 160-168.
Goldberg, D. (1989). Genetic Algorithms in Search, Optimization, and Machine
Learning. Reading MA:Addison-Wesley.
Goldberg, D.E., Lingle, R. (1985) Alleles, Loci, and the Traveling Salesman problem.
Proc. International Conference on Genetic Algorithms and their Applications.
Holland, J. (1980). Adaptive algorithms for discovering and using general patterns in
growing knowledge bases. Policy Analysis and Information Systems, 4.
Oliver, I., Smith, D. and Holland, J. (1987). A study of permutation crossover operators
on the traveling salesperson problem. In Second International Conference of
Genetic Algorithms and Their Applications, pp 224-230, MIT, Cambridge, MA.
AI depot link to tutorials on Genetic Algorithms
Laird's notes ( pg. 118)
Example representation and analogy:
Because GA's work with the representations of the parameter set and not
directly with the parameters, binary representation offers the advantage of
simplicity without sacrificing search efficiency.
SETTING 1: 80% of full scale
SETTING 2: 10 feet per second
SETTING 3: 100 degrees centigrade
STRING: 1010000 0010100 1100100
\_____/ \______/\______/
80% 10 ft/sec 100 deg. C
Concatenating these three pieces into one long string would suffice to
represent that point in the search space.Initialization
Control Parameters:
population size: N = number of encoded structures
(Generation gap permits overlapping populations:
G = 1 nonoverlapping populations
0 < G < 1 overlapping populations
In overlapping populations N*G members are selected for further action)Techniques:
Example Problem:
Genetic Operators:
(Crossover and mutation force the algorithm to evaluate points in
the solution space that might not have been tried previously.)
Back to Example:
Reproduction:
Selection is based on fitness values;
Crossover:
The crossover operator chooses which members of the new generation to pair together and where to exchange portions of the strings
between them. (analogy: mixing genes)
(a value of 1 imples all pairs will be subjected to crossover)Mutation:
Mutation operates randomly to change an element of a string that has been
reproduced.
(Frequency of mutation is also a Control Parameter)
Summary
Summary
Applications:
(sometimes called "black-box optimization") Figure 1.6
Components of Genetic Algorithms
The input to a GA is the description of the black box functionality
(Control parameters). This includes specifying:
A GA generates and manages its own set of cases, hence there are no separate
learning case inputs.
Output Specifications:
The output of a GA is a set of inputs to the black box evaluation that
are optimal.
Genetic Operators
(Likely to contain the global maximum.)
Arises from the fact that at the beginning there may be some extraordinary
members in a population of mediocre ones.
Parameter Optimization
During periods of low overall fitness a different population size or crossover
probability or mutation rate might let the algorithm operate more efficiently
than at times of high overall fitness.
On-line Performance measured the average fitness of all members of the population. (ongoing)
Technique:
n = population size
by varying each individually and then in combinations on a given function.
pc = crossover probability
pm = mutation probability
G = generation gap
population size: 50 -100
crossover rate: 0.60
mutation rate: 0.001 source?
"Increasing the population size was shown to reduce the stochastic effects
[of random sampling on a finite population] and improve long-term performance
at the expense of slower initial response."
(types of crossover, etc)Gray Coding:
Compare this to 2's complement where numbers like
The claim is
-8 (1000) and 0 (0000) differ by just one bit
if one uses some sort of binary encoding, then the underlying mutation space
has a number of jump discontinuities,
Traveling Salesperson Problem
high-dimensionality
multimodality
discontinuity
noise
fix: fix the initial city
E.g.,
Cross ( a b c d e) and (a d e c b) between 3rd & 4th
Parent 1 h k c e f d b l a i g j
Parent 2 a b c d e f g h i j k l
Offspring d e f g h i b l a j k c
d e f g h i, is the second parent starting at the second cut point with the
elements of the crossover section of the first parent removed.The PMX
(Goldberg and Lingle's Partially Mapped Crossover PMX)
(we really mean swapping)
Parent 1 h k c e f d b l a i g j
Parent 2 a b c d e f g h i j k l
Offspring i g c d e f b l a j k h
( We have swapped b<-->g , l<-->h , a<-->i )
The absolute positions of some elements of both the first and the second
parents are preserved.
a) Every position of the offspring must retain a value found in the
corresponding position of the parent
Parent 1 h k c e f d b l a i g j
Parent 2 a b c d e f g h i j k l
The cycles are labeled by a number or are called unary (U)
Parent 1 h k c e f d b l a i g j
Parent 2 a b c d e f g h i j k l
Cycle label 1 2 U 3 3 3 4 1 1 1 2 1
Offspring h b c d e f g l a i k j
Parent 1 2 2 2 2 2 2 1 1 1 2 1
Genetics-Based Machine Learning
Alternatively: A classifier system is a
general purpose rule-based learning system.
Benefits:
"apportionment of credit via competition and rule
discovery using genetic algorithms form a reasonable
basis for constructing a machine learning system atop the computationally convenient and complete framework
of classifiers."
Learning Task:
Runtime:
Representation:
<message> ::= {0, 1}l
<classifier> ::= <condition>: <action>
- Each condition specifies a subset of the set of all messages
condition is also of length k
Four classifiers:
Index Classifier .
(the number of non #'s in the condition)
where R(C) is the specificity, equal to the number of non#'s in the
condition divided by the length of the condition
and
Strength(C,t) is the strength of C at cycle t.
Most prevalent method is the bucket brigade algorithm.
BBA
where C is a winning classifier and c<1 is a constant
where n is the number of members of {C'}.
The Bucket brigade algorithm is an example of a very simple learning from
experience algorithm.
Bid(C, t) = 0.1 * 200 = 20
and sends its message during the next time step.
Bibliography
Appleby, C. (1990). Dynamic Population Sizing in Genetic Algorithms. Masters Thesis,
California State University, Chico. Chico, CA.
More Sources