Chapter 12

Games

Why

  • Games challenge intelligent people

  • The rules are well defined

  • Performance can be easily measured

  • Games can be models of real life, e.g. Politics

    Why we are interested in games

  • Game-Playing has shown that problems of intelligence with some complexity are doable by machine

  • Games are the only field of AI in which programs perform on par with humans

  • Games possess a number of interesting subprograms

    Some Games

    Not Puzzles

    Most of what follows will be on Chess. The Ideas and Techniques apply to all Games (and even outside of games).

    Example of an old approach

    UNIX Mastermind

    1. List all possible codes
    2. Randomly choose the first guess
    3. Eliminate codes which could not be right (Based on opponent's reply).
    4. Choose the first of the remaining codes for the next guess
    5. If not done then go to 3.

    Maximum of 8 colors, 5 positions

    Why does this work?

    What's wrong with it?

    Game Trees

    Games are conveniently represented as game trees

    Given the initial state (e.g. Starting board position), the game rules describe how to generate alternative lines of play to a conclusion.

    Example of (Part of) a game tree --

    Tic-Tac-Toe (Aka naughts and crosses)

    Ply of a game tree

    has P levels (Including root)

    Depth is D then P=D+1

    Branching factor

    Look ahead

    10 moves, 10 ply

    Size of a game tree is determined by

    Average number choices from each position (Branching Factor)

    Length of Game (Depth)

    => # alternative lines of Play = B.F. Depth. (Read page 307)

    Some Examples
    B.F. Ave. Length Lines Of Play
    Tic-Tac-Toe 4 6-7 104
    Checkers 6 60 1040
    Chess 30 80 10120
    Go 200    
    Backgammon 800    

    So it is impossible to search through interesting games to the end.

    Note: even Tic-Tac-Toe Becomes impossibly large with minor changes (4x4, cubic, probabilistic)

    Generate and Test

    MiniMax - apply plausible-move generator

    - apply sef --> min is opponent's best (our worst) so probably their choice)

    ... However many ply ... (time allowed)

    depth-first / depth limited

    "goodness" rests on the translation of board "goodness" into a single number - a number says nothing about how it was determined ... learning?

    Minimaxing

    The term comes from the Min-Max theorem of Von Neumann (~1925)

    Intuitively

    Minimax as an algorithm can be iterated up a game tree from terminal nodes.

    Note - Minimax assumes rational play by the opponent. in complete game trees, irrational play by the opponent leads to better positions. With incomplete information this is not always true.

    Alpha-Beta Procedure .. Prunes Game Tree

    MiniMax

    Alpha-Beta

    Examples

    ab Pruning

    Based on the following observations -

    a = Max, best known so far to us

    b = min, best known so far to them

    Notes:

    No one seems to know who invented ab. The name is due to McCarthy.

    Analysis of a b

    Worst case: branch ordered so Alpha-beta does nothing

    Best case: Suppose tree ordered with each best move being the leftmost alternative at each node

    Need not consider all of the opponent's replies to alternatives - but DO deal with all options of moving player

    Why? ... if an opponent has some responses that make a move bad no matter how the moving player continues, then the move must be bad.

    Learning

    MINIMAX ... and thus ALPHA-BETA ...

    rest on translation of board "goodness" into a single number

    a number says nothing about how it was determined ... or how to improve ...

    Shannon (1950)

    All Game programs can trace their basic ideas to Shannon

    First to

    The central problem is to choose the best of several alternatives.

    Shannon observed that human players: (Why do we care)

    1. Can evaluate a position as good for one player or the other

    2. Look ahead a few moves to see how the value of the position changes along various lines of play.

    Once this strategy is adopted, Other decisions are:

    Type A:

    1. Follow all continuations to a fixed depth
    2. Polynomial evaluation of the terminals (called static evaluation. Terms are features)
    3. Minimax
    4. Choose the first level alter. with the highest value.
    Example of a Static Eval function (Shannon's):

    S = 200(K-K') + 9(Q-Q') + 5(R-R') + 3(B-B'+N-N') + (P-P') - .5(D-D'+S-S'+I-I') + .1(M-M')

    K,Q,R,B,N,P = MaterialM = Legal Moves
    D = Double Pawns' indicates opponent
    S = Backward Pawns + => good for machine
    I = Isolated Pawns- => bad for machine

    Type B:

    Same as type A except:

    1. Terminate Look-ahead only if beyond a fixed minimum depth and position is quiescent

    2. Follow only most promising moves (Known as forward pruning)

    Quiescent - feedover conditions (AKA - "Dead" Positions)

    Board positions in the middle of exchanges, captures, checks, and so on cannot be accurately evaluated statically. These positions should be followed up. Feedover conditions define non-quiescence. (i.e. Search continues if feedover conditions are met)

    Type C:

    Same as type B except moves are generated which are relevant to a goal (e.g. get out of check)

    Shannon's algorithm:

    Type A with 2 move Look-ahead (program's choices) with possible replies) using the evaluation function above.

    Turing (1953)

    The first program, hand simulated against a human opponent. Done independently of Shannon.

    Type B program, 2 move Look-ahead, examine certain special positions beyond that. Evaluation was material plus some Tie-breakers.

    Los Alamos (1957)

    First working program

    Played Mini-chess:

    Type A algorithm, 4-ply Look-ahead, Material and Mobility in evaluation function

    Bernstein (1958)

    First full chess program

    Type B algorithm, 4-ply Look-ahead considering 7 most plausible moves from each position, scored using material, Mobility, area control, and king defense.

    Forward pruning is accomplished by decision routines which answer questions like:

    Newell,Shaw, and Simon (1959) GPS

    First to be written in a high-level language (1 ply)

    First to use pruning

    Type C algorithm, based on psychological studies of human chess play.

    The program contains a number of goals with a predefined order of importance. e.g. King safety, Material balance, center control, etc.

    Each goal is associated with

    The program also contains many chess situations, each associated with a set of appropriate goals.

    An acceptance level is set before a games. The first move found to have a value above this is chosen.

    Process --

    1. The present board position is evaluated to determine which situation applies
    2. The goals for the situation are put on the goal list.
    3. Choose the top goal
    4. Generate a move
    5. Generate continuations until dead positions are reached along all lines of play (Dead <= Does not evoke any of the goals from 2).
    6. If move's value > threshold then return move, else if E more moves then go to 4, else go to 3.
    These early programs played mediocre chess.

    Apparently none ever won a game

    ( <= no wins were published)

    Times to move varied from 12 minutes (for Los Alamos) to 10 hours (for NSS). So they could not compete in tournaments.

    Horizon Effect

    Some undesirable, but unavoidable, events is postponed (possibly at some cost) until the limit of Look-ahead is reached.

    e.g. sacrifice a bishop to avoid losing a queen, but the queen is later lost.

    Continuing to dead positions helps.

    Deeper search helps

    Real problem is that evaluation is inadequate.

    * In general, good evaluation beats deep search. (Proven in recent othello tournaments in which all play is brute force.)

    Greenblatt (1966) (Aka MacHack G)

    First to compete respectably with humans.

    Tapered forward pruning using plausibility ordering, , Hash coding, secondary search.

    Forward pruning was to best 15, 15, 9, 9, 7 moves at plies 1, 2, 3, 4, 5+. The width might be exceeded under certain conditions (e.g. checks and captures).

    Look-ahead to 4-ply plus feedover.

    Also uses a 2 ply plus feedover secondary search from the bottom of the best path. This is used to increase confidence in the best path.

    Plausible move generator

    Plausibility evaluation is based on moves not positions.

    e.g. -- decides "This move is bad because it blocks my Bishop" Rather than "This position is bad because my bishop is blocked"

    Hash coding -- remembers positions to reduce search.

    Also uses book openings (~5000 Moves).

    Wilkins (1980)

    Plan-based play, mid-game only.

    Chess knowledge coded in the form of production rules (condition -> action).

    Conditions = chess game patterns

    Action = post concepts for use in construction plans (e.g. -- queen checked)

    The concepts found by the rules are analyzed to select a move sequence (plan) with most likely replies. The program follows its plan until the end or the opponent deviates.

    Search is done to verify that the selected move is good (10's -- 100's of nodes)

    Based on the observation that human players construct plans of 4-5 moves to accomplish same tactical goal.

    Greenblantt's program began more intense work on chess and games in general. Many used exactly the same techniques, faster algorithms, faster hardware.

    Dominant chess program throughout the 70's was northwestern's chess X.X (Now 4.7). Won most national and international computer tournaments.

    Chess 4.5 introduced iterative deepening

    search to 2-ply and return a move.

    then to 3-ply and return a move ...

    until a time limit is reach.

    Purpose is dynamic ordering of moves (To increase cutoffs)

    Chess 4.5 also had some Goal-seeking

    The initial position is analyzed and certain squares are identified where pieces should be. Positions are evaluated partly according to how well the goals have been achieved.

    Belle was developed at Bell Labs and has won numerous tournaments, including against humans. First program to be rated a chess master (~2350)

    Belle uses strictly brute-force methods, 8-ply + feedover Look-ahead, plausibility ordering, MiniMax

    Uses specially designed, parallel, "chess-machine" hardware. Capable of examining 30 million positions per move.

    Berliner's Backgammon Program (BKG 9.8)

    First computer program to defeat a human world champion (1979).

    Branching factor for backgammon (800) precludes reliance on search, requires more intelligent play -- use positional judgment. A problem with Chess is that even much of human judgment is in playing out alternatives.

    The program generates all legal moves from a given position. A static evaluation function is applied. Dest move is followed up to verify its value. (i.e. -- 1 ply Look-ahead with 1 ply secondary search).

    The static evaluation function is a very complex polynomial in which the coefficients vary non-linearly depending on the board position. These SNAC functions (Smoothness, Non-Linear, Application Coefficient) Represent Micro-strategies for accumulating or avoiding features.

    Samuel's Checkers Program

    Developed from 1947-1967 (or Later)

    Primarily interested in learning, particularly learning to evaluate board positions.

    Checkers is simpler than chess -- fewer pieces, less freedom of movement, many more forced moves -- so it is easier to concentrate on learning.

    This description is of later versions of the program.

    The program uses Look-ahead with plausibility ordering, forward pruning, MiniMax with .

    Moves with plausibility below a threshold are pruned, the threshold increases with depth.

    Initially a polynomial evaluation function was used, like those in chess, with features (like piece advantage, mobility, center control). As terms and their coefficients representing relative weights. The game was divided into 6 phases and a different function was used for each phase.

    The polynomial evaluation function has limits as a means of evaluating alternatives.

    Signature Tables

    A Function can be computed or memorized

    A signature table is a memorized static eval (like ROM) function. (Samuel used the term "Signature" to mean a set of values for the inputs = a row of the table.)

    Example of a signature table:

    Signature Signature Output
    -1 -1 -1
    -1 0 -2
    -1 +1 +1
    0 -1 -1
    0 0 +1
    0 +1 +1
    +1 -1 -1
    +1 0 +2
    +1 +1 +1

    Two 3-valued (-1, 0, 1)inputs possible

    5-valued (-2, -1, 0, 1, 2) outputs

                2nd Pram if 2nd pram /= 0

    Output = 2 x 1st Pram otherwise

    If There were 30 Features in the function, each 16-bit integers (as Samuel used) the table would be very big -- 64,00030.

    Samuel noticed that human experts make qualitative (symbolic) judgments about features and board positions rather than the quantitative judgments normally required by polynomial eval functions. This reduces the range of values to 3-15

    Samuel also knew that there would not be interaction among all the features. Features with potential interaction could be grouped into a single table. (mk k/m vs. kn)

    So the number of signature table entries can be held to a reasonable size using symbolic descriptions and appropriate groupings.

    Samuel's final signature table arrangement:

    The Layering was used to take the interaction between groups into account.

    Note that each signature represents a concept. The inputs interact to form a higher level statement about the board position. The concept is the set of interactions.

    The entire mechanism is a way of describing how low level features combine to symbolically measure the desirability of a situation. Like finding how well patient data matches a disease profile.

    Samuel explored two kinds of learning:

    1. Rote

    2. Generalization

    Rote learning was uninteresting -- Remember board positions to reduce search and increase virtual depth.

    Generalization was designed to improve the evaluation of alternatives based on experience. Two methods were used:

    1. Book games

    2. -- Play (Not related to pruning)

    Goal of Learning is to make the Value of the Current position equal to the value of the position at the end of the most desirable path.

    -- play was used to improve the polynomial evaluations. One version of the program, , continually changes its polynomial; another version, , does not change its polynomial. & play each other. If won enough, then used 's polynomial in the next game. If won too often, then largest coefficient of 's polynomial was set to 0.

    stored an initial board score and compared it to the backed-up score, difference = . If = 0 then OK, otherwise the polynomial is wrong. if > 0 then positive terms would get more weight and negative ones less. ect.

    There were 38 terms, 16 in the polynomial at one time. Based statistical analyses (e.g. -- how often terms contribute to the score) terms in the polynomial contribute to the score) terms in the polynomial are replaced from the 22 reserves.

    This lead to good, but unconventional, mid-game and end-game play, poor openings.

    it is a form of hill-climbing -- Like natural selection.

    Book games were used for both polynomial and signature table evaluation.

    There are many (1000's) published games of masters. The program plays through these gams, alternating sides. At each point the program uses its normal process to pick a move. The book move is compared to the program's choice, with changes being made to the evaluation function as necessary. Then the book move is made and the program switches sides.

    For polynomial evaluation the coefficient modification is based on the dame kind of statistical evidence as in -- play. Changes are made when enough evidence accumulates.

    For signature tables the output value for each signature is adjusted so it represents (roughly) the probability of that signature being the preferred one, according to the book. Again, changes are made as evidence accumulates.

    The measure of success here is % of time the program makes the book move. For polynomials this was only 26%, for signature tables it was 48%.

    The checker program could learn to play a fairly good game. Polynomial evaluation proved to be inadequate while. Signature tables worked much better at acquiring expert abilities. They enabled a symbolic measure of goodness to be made, including the interactions between features, and the building of complex concepts from simpler ones.

    Signature tables are not used in games by anyone else. Search with polynomial evaluation works well enough for most games.

    Things like signature tables are used in expert systems (Like MDX) though in different form.

    The basic game techniques are:

    Look-ahead 1950 Shannon
    Minimax 1950 Shannon
    Pruning 1959 NSS
    Plausibility ordering 1966 Greenblatt
    Forward pruning 1950 Shannon
    Feedover conditions 1950 Shannon
    Secondary search 1966 Greenblatt
    Static evaluation 1950 Shannon

    Human play?

    Closest to human play now is Wilkins' program but it is very slow.

    The problem of recognizing a tactical position and using it to guide search in a short enough time to be competitive may be insoluble without some perceptual processing of the actual chess board. (Note that this does not have to be "vision" but can be any highly parallel mechanism for "accessing the entire board at once.)

    Mastermind revisited

    Tree search is probably not good -- essentially that is the UNIX version (with the entire tree collapsed into a single list).

    There might be a set of criteria for the first (and subsequent) guess to maximize the information which might be received. Guesses could be generated to meet this criteria and chosen randomly.

    The sequence of guesses and answers can contribute to the knowledge used in constructing guesses.