Machine Learning

Genetic Algorithms

  • see also Laird's notes (around pg. 118)
    Genetic Operators, Traveling SalesPerson Problem, Classifier System
  • Overview

    Background

    Genetic Algorithms (GA's) are search algorithms based on the theory of evolution.

    GA's are a cross between two techniques: hill-climbing and random trial-and-error.

    Representation:

    Genetic Algorithms deal with strings representing the parameter set.

    Initially a problem is modeled such that features of the domain are defined that can be assigned values representing a solution to the problem.

    These features are joined together as contiguous elements of a data structure (a string).

    The values assigned to these elements represent one instance of solving the problem.

    Multiple attempts to solve the problem are created and the resulting data structures are grouped as a population.

    Example representation and analogy:

    The strings are analogous to the DNA found in living organisms.

    In the computer, the strings are usually reduced to binary strings.

    Because GA's work with the representations of the parameter set and not directly with the parameters, binary representation offers the advantage of simplicity without sacrificing search efficiency.

    E.g., The following binary string represents three parameters that could be separate individual adjustments on a piece of machinery:

    	SETTING 1: 80% of full scale
    	SETTING 2: 10 feet per second
    	SETTING 3: 100 degrees centigrade
    
    	STRING:	1010000	0010100	1100100
                    \_____/ \______/\______/
                      80%   10 ft/sec 100 deg. C
    
    Concatenating these three pieces into one long string would suffice to represent that point in the search space.

    The population size should be large enough to guarantee a diverse gene pool.

    Initialization

    The initial generation is usually determined randomly.

    A method to determine how well the values of a given structure solve the problem is defined:



    Control Parameters:

    Certain parameters must be assigned values when using GA's:

    These variables do not change the basic operations but their values influence the overall performance of the running system.

    Techniques:

    A simple GA operates by creating an initial set, or population of solution attempts, and repeatedly evaluating and applying genetic operators that alter the population.

    Each cycle of evaluation and alteration is termed a generation of the population.

    This process continues until either the evaluation finds a predetermined level of success or the search is terminated.

    Example Problem:

    Finding the largest decimal

    Genetic Operators:

    GA's combine "survival-of-the-fittest" reproduction strategies with methods of information exchange based on genetic crossover and random alteration derived from biological mutation.

    These operations

    Repeated cycles of applying these operators (generations) result in a population of structures with improved performance (higher fitness values).

    Back to Example:

    Reproduction:

    Crossover:

    Mutation:

    Evaluate Population 2

    Summary

    From a software perspective, GA's can be viewed as follows:

    1. A complex data structure (the population) composed of multiple records (a member).

    2. Each member contains a group of coded elements (the string) that represents an attempt to solve the problem defined.

    3. The population data structure is acted upon by functional operators (reproduction, crossover, and mutation) that alter the values coded in the string

    4. The goal is to produce members with strings that solve the problem better.
    Summary

    Applications:

    Genetic Algorithms are best at optimization.
                   (sometimes called "black-box optimization") Figure 1.6

    They can handle non-linear, multi-modal tasks that exhibit noise. ( Figure 1.1 nice , Figure 1.2 now what!)

    more later ...

    Components of Genetic Algorithms

    (what you need)

    Problem Definition Input:

    Learning Input:

    Output Specifications:

    Genetic Operators

    1. Crossover
    2. Mutation
    3. Fitness Scaling

    Fitness Scaling

    Premature convergence - a problem often discovered in small population GA's.

    Alternatively, late in the run, the population average fitness may become close to the population best fitness - average members and best members get nearly the same number of copies. (We need a healthy competition between near equals.)

    Fitness scaling avoids premature convergence by scaling down the few extraordinary members while scaling up the lowly members of the population.

    In linear scaling, we calculate the scaled fitness f' from the rawness fitness (objective function value)

    f' = af + b

    Where the coefficients a and b are chosen to

    These two conditions ensure that average population members receive one offspring copy on average and the best receive the specified multiple number of copies.



    Parameter Optimization

    Adjacent to the major operators of reproduction, crossover, and mutation used in GA's, various parameters are required for the running of a genetic algorithm. Examples of these variables are the Control Parameters mentioned earlier.

    Values for these parameters are generally set at the beginning of the run and remain unchanged throughout.

    Parameter optimization considers variations of these parameters for optimal performance.

    Recent research has considered dynamic adjustments of control parameters (i.e., changing parameters while a GA is running to improve performance)

    Most work has focused on what the initial settings should be and how performance varies with respect to these settings in order to establish the best static settings.

    We will discuss these first.

    The initial work on parameter optimization was performed by DeJong (1975).

    To qualify the effectiveness of different genetic algorithms, he devised two measures:



    Technique:

    DeJong constructed a test environment of 5 problems in function minimization.

    DeJong's Five Function Test Bed : Test Functions F1 and F2 , Test Functions F3 and F4 , Test Functions F5

    He tested 4 Control Parameters:

    by varying each individually and then in combinations on a given function.

    The results indicated optimal parameter settings of:

    He made the following conclusion concerning population size,

    DeJong performed many comparisons:

    Fig. 4.12 , Fig. 4.13 , Fig. 4.14 , Fig. 4.15 , Fig. 4.16 , Fig. 4.17

    These types of studies continue in Genetic Algorithm research.

    Gray Coding:

    Another technique thought to help convergence is the use of gray codes.

    Table 4.1

    Notice that adjacent integers map to bit strings which differ by a single bit.

    The claim is

    Traveling Salesperson Problem

    Genetic Algorithms have been shown to be effective where past methods have had difficulty:

    GA's have not been applied extensively to combinatorial problems.

    The major obstacle is in finding appropriate representations.

    The Traveling Salesperson Problem is a good example of the need for proper representation.

    The Traveling Salesperson Problem:

    Given a complete graph with N nodes, find the shortest Hamiltonian path through the graph (i.e., visit each node only once)

    Choose a representation:

    Consider a path representation in which a tour is represented by a list of cities: (a b c d e f).

    Problems:

    Alternative crossover operators have been studied: [Oliver et al]

    The Order Crossover

    1. Two cut points are chosen at random

    2. the section of the first parent between the first and second is copied whole to the offspring.

    3. The remaining places are filled using elements not occurring in the crossover section ... to do this, use the order the elements are found in the second parent after the second cut point

    Example:

    	Parent 1		h k c e f d 	b l a 	i g j
    	Parent 2		a b c d e f 	g h i   j k l
    
    	Offspring		d e f g h i	b l a	j k c
    

    The offspring sequence, j k c (return to the first position)
    d e f g h i, is the second parent starting at the second cut point with the elements of the crossover section of the first parent removed.

    The absolute positions of some elements are retained, and the relative positions of some elements of the second parent are also kept.

    The PMX

    (Goldberg and Lingle's Partially Mapped Crossover PMX)
    1. choose two cut points at random

    2. the cut out section defines a series of swapping operations to be performed on the second parent
      (we really mean swapping)

    Example:

    	Parent 1		h k c e f d 	b l a 	i g j
    	Parent 2		a b c d e f 	g h i	j k l
    
    	Offspring		i g c d e f	b l a	j k h
    
    	( We have swapped  b<-->g ,  l<-->h ,  a<-->i )

    The absolute positions of some elements of both the first and the second parents are preserved.

    The Cycle Crossover

    The cycle crossover achieves: creating an offspring different from the parents where every position is occupied by a corresponding element from one of the parents.

    We satisfy the following conditions:

    Given the following parents:

    	Parent 1		h k c e f d 	b l a 	i g j
    	Parent 2		a b c d e f 	g h i	j k l
    

    1. Consider position one, condition a) says the offspring must contain either h or a. Suppose we choose h.

    2. Using condition b), a cannot be chosen from parent 2 since that position (position 1) is now occupied by h. So a must be chosen from parent 1.

    3. Since a in parent 1 is above i in parent 2, we must choose i from parent 1, etc. Thus having chosen h from parent 2 we are forced to choose a, i, j, and l from parent 1.

    4. The positions of these elements form a cycle, choosing any of the positions from parent 1 or 2 forces the choice of the rest if the conditions hold.
    The cycles are labeled by a number or are called unary (U)

    	Parent 1		h k c e f d b l a i g j
    	Parent 2		a b c d e f g h i j k l
    
    	Cycle label	        1 2 U 3 3 3 4 1 1 1 2 1
    

    In a standard crossover operator, a random crossover point or cut section is chosen; in the cycle crossover a random parent is chosen for each cycle.

    Choosing cycle 1 from parent 1 and the rest from parent 2 in the above, we get

    	Offspring 	h b c d e f g l a i k j
    	Parent		1 2 2 2 2 2 2 1 1 1 2 1
    

    The absolute positions of, on average, half the elements of both the first and second parents are preserved.

    Results:

    Order crossover is superior in problems where compact schemata are important (as in TSP).

    Suspect PMX to be superior when compactness less important.

    Cycle crossover potentially superior where compactness not relevant.

    50 cities , 100 cities , 200 cities

    Genetics-Based Machine Learning

    The Classifier System is the most common genetics-based machine learning (GBML) system.

                                Notes from [Goldberg, 1989, chap.6]

    A classifier system is a machine learning system that learns syntactically simple string rules (called classifiers) to guide its performance in an arbitrary environment.

    It consists of three main components:

  • The rule and message system is a special kind of production system.

    review ?

  • Classifier systems restrict rules to a fixed-length representation

  • Classifier systems use parallel rule activation

  • The relative value of different rules is one of the key pieces of information that must be learned (not pre-set)

    In brief, two learning mechanisms are in action.

    Learning Task:

    1. modify the strength of classifiers as the CS accumulates experience. This credit-apportionment is tackled with a bucket brigade algorithm.

    2. generate plasible new classifiers. This task is done by a GA: using classifiers as genotypes and strength as fitness

    Runtime:

  • A CS receives environmental information through sensors called detectors.

  • This incoming message is decoded into some standard message -format (may be one or more messages).

  • These messages are posted to a finite-length message list where the messages can then activate the classifiers from the classifier store.

    If activated, a classifier may then be chosen to send a message to the message list for the next cycle.

    These messages may then invoke other classifiers or they may cause an action to be taken through the system's action triggers (called effectors).

    Figure 6.1

    Thus classifiers combine environmental cues and internal rules (external and internal data) to guide its "state-of-mind" in the next cycle.

    Representation:

  • Messages are bit strings of fixed length k

  • Each classifier consists of a condition and action part

    Whether the candidate classifier posts its message is determined by the outcome of an activation auction, which in turn depends on the evaluation of the classifier's value.

    Simple Example:

    		Four classifiers:
    
    	Index			Classifier              .

    1) 0 1 # #: 0 0 0 0

    2) 0 0 # 0: 1 1 0 0

    3) 1 1 # #: 1 0 0 0

    4) # # 0 0: 0 0 0 1

    . .

    First step: environmental message of 0111

    When message list is too large, how does one determine which classifiers to activate? (conflict resolution)

    Runtime2:

    Step 1: Place all environmental messages on the current message list

    Step 2: Compare all messages to all conditions of classifiers in the classifier store, record all messages and erase the message list.

    Step 3: For each match compute a bid based on the strength of the classifier and place only the messages of highest bidding classifiers on the list.

    Step 4: If the new message specifies some action to be taken, do so, but with resolution of effector conflicts if necessary.

    Step 5: Return to step 1

  • The bid depends on the classifier's strength and its specificity

  • There is usually a stochastic element in choosing the selection of the highest bidders - this ensures that each classifier can show its superiority

    The bid is calculated in the apportionment of credit subsystem

    Suppose an action of the CS brings the environment to the goal state. The system needs to adjust the strengths of the classifiers according to their usefulness in reaching this state.

  • Consider each classifier as a middleman in a competitive service economy

    The profitability of other classifiers depends on if they are in a sequence which leads to a payoff.

    The BBA assures that early-acting, stage setting classifiers receive credit if they on average make possible payoff-receiving actions.

    Formally:

    * A high-bidding classifier whose message is placed on the message list has its strength reduced by an amount proportional to its bid (paying for the priviledge of posting a new message)

    Strength(C, t + 1) = Strength(C, t) - c Bid(C,t)
                            where C is a winning classifier and c<1 is a constant

    The classifiers {C'} that sent a message matched by this "winner" have their strengths increased by an amount proportional to the bid of C (the senders are rewarded for setting up the good situation usable by C.)

    Strength(C', t + 1) = Strength(C', t) + c Bid(C,t)/n
                            where n is the number of members of {C'}.

    Use the 4 classifier example used earlier and follow their "payments".

    Table 6.3

    Assume: initial strengths of 200

    Cbid = bid coefficient of 0.1

    In this text, the bid is the bid coefficient times the strength

    Bid(C, t) = Cbid * Strength(C, t)

    The Bucket brigade algorithm is an example of a very simple learning from experience algorithm.

    The above is a simplified version of the algorithm.

    See [Goldberg, 1989, chapter 6] for a complete description. Genetic Algorithms in Search, Optimization, and Machine Learning

    Bibliography

    Appleby, C. (1990). Dynamic Population Sizing in Genetic Algorithms. Masters Thesis, California State University, Chico. Chico, CA.

    DeJong, G. (1981). Generalizations based on explanations. Proceedings of the Seventh International Joint Conference on Artificial Intelligence. Vancouver, British Columbia, Canada.

    DeJong, G. and Mooney, R. (1986). Explanation-based learning: An alternative view. Machine Learning 1, 2, 145-176.

    DeJong, K. (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD thesis, University of Michigan.

    Grefenstette, J., Gopal, R., Rosmaita, B. and Gucht, D. (1985). Genetic algorithms for the traveling salesperson problem. In International Conference of Genetic Algorithms and Their Applications, pp 160-168.

    Goldberg, D. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Reading MA:Addison-Wesley.

    Goldberg, D.E., Lingle, R. (1985) Alleles, Loci, and the Traveling Salesman problem. Proc. International Conference on Genetic Algorithms and their Applications.

    Holland, J. (1980). Adaptive algorithms for discovering and using general patterns in growing knowledge bases. Policy Analysis and Information Systems, 4.

    Oliver, I., Smith, D. and Holland, J. (1987). A study of permutation crossover operators on the traveling salesperson problem. In Second International Conference of Genetic Algorithms and Their Applications, pp 224-230, MIT, Cambridge, MA.


    More Sources

    AI depot link to tutorials on Genetic Algorithms

    Laird's notes ( pg. 118)