Machine Learning

III. Explanation-Based Learning

-Background

Figure 1 from [Kodratoff, 1988]

The main goal of inductive learning is the acquisition of new knowledge that did not exist in the knowledge implicitly before.

The goal of deductive learning is the improvement of existing knowledge into a form easier or more efficient to use.

e.g.GREEN(LEAF) <=> YOUNG(LEAF)

A deductive system may learn to recognize scenes with green leaves, instead of young ones, on the grounds that 'green' is easier to compute than 'young'.

The purpose of EBL (Explanation-Based Learning) is to use explanations to improve its performance.

Learning complex concepts by induction from + and - examples typically requires a substantial number of training instances.

People are able to learn quite a bit from single examples.

We don't need to see dozens of + and - examples to

draw conclusions.

What makes single-example learning possible?

-- knowledge --

Most recent work in machine learning has moved from the empirical, data-intensive approach using induction to the more analytical, knowledge intensive approach... relying on knowledge of the task domain and the concept under study.

An EBL system attempts to learn from a single example x by explaining why x is an example of the target concept.

That is, it analyzes the training example by first constructing an explanation of how the sample satisfies the definition of the concept under study.

The explanation is then generalized.

Specifically, the features of the example identified by this explanation are then used as the basis for formulating the general concept definition.

(note: this explanation provides for justification of the generalization)

The system's performance is improved through the availability of this knowledge.

SBL versus EBL

In the 1985 "International Workshop in Machine Learning"

the distinction between Similarity Based Learning (SBL) (last section) and Explanation Based Learning (EBL) (this section) was defined.

In SBL, one learns by detecting first similarities in a set of positive examples, secondly dissimilarities between positive and negative examples.

In EBL, the system points to the important piece of knowledge that has to be preserved.

A simple example of SBL and EBL: [Kodratoff, 1988]

Suppose we had a complete description of a hydrogen balloon with its dimensions, its color, the fact that it is being rained on, and the political context in which it was used.

An SBL system would ascertain that a red balloon rises in air, that a balloon does too, that a green balloon does too, etc... to conclude that the color has nothing to do with whether the balloon rises in air.

An EBL system, given a single example that a red balloon flies off, will seek to prove that it must rise. It will ascertain in the end that if the weight of the volume of air displaced is bigger than the weight of the balloon, then it must rise. The arguments will be about the weight of the balloon's envelope, the density of hydrogen, and the temperature and the degree of humidity in the air. It will conclude with certainty that the color and politics have nothing to do with the matter, and that, on the other hand, the data contained in the arguments are the significant descriptors for this problem.

The basic difference between the two classes of methods is that similarity-based methods must rely on some form of inductive bias (remember? ...what is this bias?) to guide generalization, whereas explanation-based methods provide a more reliable means of generalization, and are able to extract more information from individual training examples.

They also require that the learner possess knowledge of the domains and of the concept under study.

EBG is one approach (algorithm) for performing EBL.

EBL Paradigm

STRIPS (1972) is found at the roots of this research, but its revival, extension, and refinement must be credited independently to a number of researchers including Silver (1983), Mitchell (1983) and DeJong (1981).

Each of these researchers concurrently developed rather large explanation-based learning computer systems.

These projects all emphasized knowledge-based learning from a single example.

The development of explanation-based learning has now been a collaborative, evolutionary effort, marked by a gradual transition from exploratory research to more general and well-defined methods.

DeJong does not like the use of EBG as a general term for the research and prefers the broader term EBL to describe the approach. He points out how EBL can do more than just generalization.

For example, one could perform explanation-based refinement

as well (i.e., the explanation-based approach can be used to specialize over-generalized concepts.)

The explanation-based approach's roots can be traced back to the MACROPS learning method in STRIPS [Fikes et al., 1972]

Macro-Operators

Suppose you are faced with the problem of getting to the downtown post office. Your solution may involve getting in your car, starting it, and driving along a certain route.

Substantial planning may go into choosing the appropriate route; but you need not plan about how to start your car.

You are free to treat START-CAR as an atomic action, even though it really consists of several actions. Sequences of actions that can be treated as a whole are called macro-operators.

Macro-operators

The STRIPS macro-operator formation technique is perhaps the most influential precursor of present EBL. [Minton et al., 1989]

overview of STRIPS ...

Use of Macro-operators:

Suppose ON(C, B) and ON(A, Table) are both true

Picture:

start state goal state

C A

. B A B C .

STRIPS can achieve the goal of ON(A, B) by devising a plan with the four steps:

STRIPS now builds a MACROP

with preconditions: ON(C, B) and ON(A, Table)

and postconditions: ON(A, B) and ON(C, Table)

The body of the MACROP consists of the four steps.

In future planning, STRIPS can use this operator just like any other.

But ... Generalize:

By generalizing MACROPs before storing them, it can make efficient use of the knowledge it gained.

Simplist idea is to replace constants with variables:

with preconditions: ON(x1, x2) and ON(x3, Table)

and postconditions: ON(x3, x2) and ON(x1, Table) .

Generalization not soo easy.

Suppose our domain had an operator STACK-ON-B(x)

Picture:

start state goal state

C A

B A B C .

STRIPS might come up with the plan:

UNSTACK(C, B)

PUTDOWN(C)

STACK-ON-B(A)

And generalize to:

precondition: ON(x3, x2)

postcondition: ON(x1, x2)

UNSTACK(x3, x2)

PUTDOWN(x3)

STACK-ON-B(x1)

Now suppose the following problem is given:

start state: goal state:

E D A D

A C B E C B .

The generalized MACROP fits if we let x1= A, x2= C, and x3= E

We construct:

Problem: Postcondition was overgeneralized

STRIPS overcomes this by "reproving" that the general plan works (compare each operator and preconditions)

STRIPS analyzes why each step in the plan is necessary so that, for example, two identical constants can be replaced by distinct variables if doing so does not disturb the structural integrity of the plan

Thus, in contrast to inductive techniques, the planner learns from single examples.

Explanatory Schema Acquisition

In [DeJong et al. ,1986], EBL is used to learn schemata for tasks such as problem-solving and natural language understanding.

GENESIS is a schema-based natural language understanding system that learns new schema in the normal course of processing narratives.

Consider the kidnapping story (Story 1) in figure 2-1.

"John (the kidnapper) points a gun at Mary (the victim) to force her into his car so he can drive her to his hotel where he detains her. This is an important action for John; if he did not have the gun Mary would probably not have gotten into the car and his attempts at kidnapping would have failed. However, this action is not essential to all kidnappings. There are many ways John might have forced Mary into his car. He might have threatened her with a knofe or some other weapon, he might have overpowered her, or he might have drugged her. Furthermore, he might not have used his car at all; he might have simply tricked her into entering his hotel room to capture her, etc. Clearly a general kidnaping schema ought not to require a gun. The gun-pointing event should be generalized in the final schema [DeJong et al., 1986]."

"The pointing of the gun must be understood as a component of the known schema THREATENEN. Furthermore, it must be understood that John is THREATENing Mary so that she will get into his car, and he wants her in his car so he can drive her to his hotel room to lock her up. Thus pointing a gun is John's sub-sub-sub-subgoal of detaining Mary. In our system, the schema for seizing and detaining someone is called CAPTURE. In contrasting the explanation for this story the system realizes that a CAPTURE of Mary must be performed to allow a plausible BARGAIN between John and Fred in which John uses the known schema CAPTURE in a novel way to satisfy a precondition for the known schema BARGAIN [DeJong et al., 1986]."

(slides [DeJong et al., 1986, pg 169-172])

Constraint-Based Generalization

Minton (1984) describes a game-playing program that learns by analyzing why its opponent was able to force it into a trap.

Games:

Consider the chess diagram on fig. 2-2, which illustrates a simple chess combination called a 'skewer'.

The black bishop has the white king in check. After

the king moves out of check, as it must, the bishop

can take the queen.

Minton's program can learn about tactical combinations of this type, where one opponent forces the other into an outright loss, or some other undesirable state.

After falling into a trap, the program analyzes why the trap succeeded, and learns a rule that enables it to avoid the trap, or spring it on an opponent.

The analysis operates by:

EBG

The EBG method of [Mitchell et al., 1986] is perhaps the most widely known work in EBL

The main thrust of their effort was to clarify and unify much of the earlier research and approaches to EBL.

As such, EBG is mainly a framework that is not closely tied to a particular performance system.

It clarified many of the common aspects of the various systems:

The EBG Approach

The key insight behind EBG (its power) is that it is possible to form a justified generalization of a single positive training example provided the learning system is endowed with some explanatory capabilities.

In particular, the system must be able to explain to itself why the training example is an example of the concept under study.

The EBG module takes as input: (see figure 2-3, 2-4)

(descriptions)

Problem Definition Input

Learning Input In other learning techniques, we viewed the goal concept as an output, not the input. The assumption here is that the goal concept is not operational.

*EBG seeks to operationalize the goal concept by expressing it in terms the problem-solving program can understand.

The output of the EBG module is:

Note: if the operational criterion is not specified, the input goal concept definitions could always be a correct output concept definition and there would be nothing to learn.

  • The job of an EBG module is to bundle together the rules in an explanation tree so that the problem solver does not have to search through all the possible alternatives to the rules when the same explanation tree can be used to reconstruct a similar subtree in the future.

    The Problem-Solving Method

    EBG proceeds in two phases:

    A generalization of an example is a concept definition which describes a set containing that example.

    An explanation of how an instance is an example of a concept is a proof that the example satisfies the concept definition.

    An explanation structure is the proof tree, modified by replacing each instantiated rule by the associated general rule.

    Example: from [Mitchell et al, 1986]

    Goal: Learn the concept SAFE-TO-STACK(x,y)

    I.e., the set of object pairs (x,y) such that x is safe to stack on y

    Task: Which of the features of the object are relevant to characterinzing the goal concept, and which are not.

    Input: fig. 2-4

    Output:

    Note that the explanation structure has been constructed so that each of its branches terminates in an expression that satisfies operational criterion...

    Thus it singles out those features of relevance.

    Now that the relevant features are isolated, the EBG method generalizes on those feature values selected, by determining sufficient conditions on those feature values that allow each step in the explanation structure to carry through.

    It then forms the generalized tree at the bottom of the figure... by:

    Technique: (after explanation tree generation)

    Regressing (back propogating) the goal concept

    step by step back through the generalized tree.

  • In general, regressing a given formula F through a rule R is a mechanism for determining the necessary and sufficient (weakest) conditions under which that rule R can be used to infer F.**

    The goal concept is SAFE-TO-STACK(x,y), so this is unified with the root of the generalized explanation tree (figure 2)

    This process is recursively applied to the next level (figure3) ... and so on

    note - terminals in tree from earlier levels are brought forward by applying substitutions in the levels below (e.g., less(w1, w2)) Figure 4
    The final answer is the conjunction of the leaves.

    Some Pluses:

    Some Problems: (page 69 of Mitchell paper)

    EBG depends heavily on the domain theory. Given such a theory, why are examples needed?

  • to focus the learning on relevant operationalizations

    In what sense did the system learn?

    Utility Problem

    PRODIGY [Minton et al., 1989] uses EBL (and derivational analogy) to formulate search control rules that help the problem solver improve by avoiding past failures in future use.

    major contribution: identification of the utility problem

    While learned rules may reduce problem-solving time by directing the search more carefully, they may also increase problem-solving time by forcing the problem solver to consider them.

    If we want to minimize the the number of node expansions in the search space, then the more control rules we learn the better. But if we want to minimize the total CPU time required to solve a problem, we must consider this trade-off.

    PRODIGY maintains a utility measure for each control rule.

    This measure takes into account the average savings provided by the rule, the frequency of its application, and the cost of matching it.

    Empirical evidence has demonstrated the effectiveness of keeping only those control rules with high utility.

    -Analogy

    An AI program that is unabale to grasp analogy will be difficult to talk to and consequently difficult to teach.

    Consider:

    To understand, one needs to:

    1. pick out one key property of a roller coaster -- that it travels up and down rapidly
    2. realize that physical travel is itself an analogy for numerical fluctuations -- as in stock prices
    The difficulty comes in determining what things are similar and what things are not.

    Two methods of anological problem solving in AI are transfomational and derivational analogy.

    Transfomational analogy

    Like proofs in geometry. One looks for a previous theorem that is very similar and 'copies' its proof, making substitutions when necessary.

    The idea is to transform a solution to a previous problem into a solution for the current problem.

    figures 17.19, 17.20

    Derivational analogy

    Transformational analogy does not look at how the old problem was solved: it only looks at the final solution.

    Often the intricacies of solving the old problem are relevant to

    solving a new problem.

    The detailed history of a problem-solving episode is called its

    derivation.

    Analogical reasoning that takes these histories into account is called derivational analogy.