CSCI 222 Hierarchical Classifier System Documentation

 

Paul Bell

Data Structure Chooser Expert System

 

Introduction

The Data-structure Determiner System (DDS) was implemented using Visual Wave as a Hierarchical Classifier. The program is an aid for helping the software programmer decide what kind of data structure would best suit their needs. The system is by no means all-inclusive and can best be summarized as a horizontal slice of possibilities.  In implementing this version of the system (previous model was a rule based system using CLIPS) a decision to use a classifier or a design tool had to be made.  After much contemplation the classifier was used. The argument for classifying is based on the similarities to diagnosis of a disease, which has often been done using classification.  In the current situation we have some known information about our needs (i.e. structure purpose, data size, data quantity, etc.) and these can be thought of as symptoms.  Establishing certain facts about the needs and then refining the decision process achieve the diagnosis of which structure should be used.  Although this course of action was taken, a debate still exists as to whether a design tool could also have been effective for this project.

 

This report will provide a discussion of various topics intended to illustrate to the reader the current state of the system. The topics include background, description, knowledge representation and organization, control methodology, sample runs, limitations, scalability, strengths and weakness of methodology, overall goodness, and what else can be done. 

 

Background

The DDS system was implemented previously using CLIPS a rule based expert shell. In implementing that version a lot of consideration for handling the control was necessary. Prior to CLIPS version no known pre-existing system within this domain was known, and as of this writing there is still no awareness of one. For this version of DDS the Visual Wave’s classifier tool was used and simplified the handling of control issues.  The resources for using the Visual Wave classifier were minimal and provided by Anne Keuneke, Computer Science Professor at CSU Chico. It should be noted that learning to use the Visual Wave tool was a more pain staking process compared to the CLIPS shell. However, the actual programming of knowledge was easier with Visual Wave.

 

The knowledge base used by the system was the same as the CLIPs version and extracted by the programmer.  The acquisition of knowledge used by the programmer was generated from two main sources. The main source providing the backbone and major substance of the knowledge comes from the CSCI 15a, 15b, 151 “Programming Algorithms and Data Structures” classes provided at California State University Chico.  In cases were more specific materials were needed references were taken from Robert L. Kruse and Alexander J. Ryba’s text Data Structures and Program Design in C++.  Considering the horizontal nature of the project this was enough background material to accomplish the scope required for this assignment. Finally a more encompassing system would require a more expert understanding of data structures and algorithms.

 

Description

The DDS Classifier operates very similar to CLIPs version in the sense that it continually queries the user for information until it reaches its decision.  The CLIPS version of knowledge was based on a hierarchical scheme with each new node requiring new information to move to the lower level. Because of this layout of knowledge the programming of knowledge in the classifier was a relatively simple task. That is to say the very nature of a classifier is of a hierarchical structure with control already contained within the tool.  The use of structure matchers within each Specialist of the classifier removes the need of the programmer to write rules about control.

 

In the DDS Classifier each node (Specialist) in the hierarchy represents a particular type of data structure. All sub-specialists below a given specialist are a more specific type of structure that is a member of the super-specialist type.  Thus as we traverse down the structure we are pruning out knowledge which is not relevant to the current case under examination.  The current hierarchy mapped in this program contains fairly sparse knowledge and could be viewed as merely an extended structure matcher. On the hand it also provides an initial framework for development of a more robust implementation of the domain knowledge. More on these issues in the limitations and scalability sections.

 

Knowledge Representation and Organization

 

 

Above is the hierarchical structure for the DDS Classifier.  The first specialist on the right has been cropped a bit it should read “Data Structure.spec” and a couple nodes on the left the “.spec” extension cannot be viewed.  As discussed earlier it could be claimed the structure is merely an extended structure matcher. For example at the Queue.spec specialist the knowledge for that node could be further developed so a subsequent branching of the Dynamic Queue.spec would be unnecessary. If so the structure at this point would look more like the following:

 

                      -----Dynamic Array Queue.spec

                      |

Queue.spec -----Link List Queue.spec

                      |

                      -----Statuc Queue.spec

 

Adaptations of this nature could occur at a few different locations, including but not limited to the Hash.spec, Stack.spec, and Tree.spec.

 

Control Methodology

The control methodology used for the DDS Classifier is Establish and Refine via structure matching. The establish and refine method works as follows. At each node a specific knowledge is represented. The node based on its knowledge retrieves relevant data from the user and establishes the state of that node. Then by analyzing the data via the structure matcher it refines by passing control to one of its sub-nodes. The process is repeated until a final decision is met; a node is reached that has no sub-nodes.

 

Sample Run 1

The following is a set of screen shots from a successful run of the program:

 

 

 

 

 

This run has determined the user needs a Binary Search Tree (BST). The final screen the “Results List” describes the reason for its decision. In this case the user is looking for a structure to do search or retrieval, maintain a sorted order, the data entry will not be order, and the data size to be enter is large.

 

Sample Run 2

This example demonstrates a failure of the system.

 

 

 

This sample demonstrates what happens whenever the user chosen the “unknown option” on any of the specialists. The second screen shown here will appear.

 

Limitations

The main limitation of the system is illustrated in Sample Run 2. Whenever an “unknown” response is answered the system is not programmed to handle the response. The nice thing about the classifier tool is the fix would be easy to incorporate, simple adjustments in each particular specialists structured matchers would be needed. The more difficult thing is determining what the knowledge is to be for an “unknown” in any given response. Since the system is trying to find the most efficient data structure, an “unknown” response should cause the system to err on the side of caution. In other words a more complex structure, possibly with unnecessary overhead, should be selected in order to encompass all the possibilities an “unknown” introduces.

 

Scalability

Unlike the rule-based version of the DDS the scalability of the classifier version should be easy to accomplish. As noted earlier this representation of the knowledge is a horizontal slice of the domain. There are numerous other data structures available to the programmer that this system does not address. Even though this is the case the basic concepts behind those structures not represented here do exist in this system (i.e. lists, trees, hash tables, graphs). Therefore, all one needs to do is add more knowledge to existing specialists in the current classifier. This new knowledge will be able to refine to new sub-specialists. The new sub-specialists in turn will either define or contribute to defining data structures not yet represented in this system’s current implementation. Finally the introduction of this new knowledge should not in anyway affect the control of the existing knowledge currently represented.

 

Strengths and Weakness of Methodology

The strengths of the methodology a hierarchical classifier has the control built in. Unlike rule-based systems the knowledge is represented in “chunks”.  More specifically you establish some criteria and then refine further based on that criteria. The control of the system is integrated with the structure. This leads to another strength of the methodology (knowledge representation and control), scalability.  Scalability specific to this system is discussed above, but generally, adding new knowledge to a classifier tends not to affect the representation of already existing knowledge in the structure.  The catch is though to use a classifier you must be classifying because the control used for classifying is designed specifically for solving problems of a classifying nature. Whereas in a rule-based shell you could theoretically do any kind of problem solving method (classifying, designing, etc.) but the control is more independent and the programmer must plan it into the representation of knowledge.

 

This leads to the weakness of the methodology as it relates to DDS.  Choosing a data structure really is not as cut and dry as this system tries to make it. Often the choice of a data structure is based on the type of processing to be done on the data it stores. At this point we are starting to get into areas of design. As was stated in the introduction a decision between a classifier and a designer was a big challenge. There may be cases where the data structure is inappropriate for the actual process the user of a data structure needs to perform.  Thus we may need a knowledge structure that can check if the criterion of one plan meets with the needs of another plan. With this in mind what seems to be needed is a system capable of both some design and some classifying.

 

What else could be done…

Given more time, resources, and knowledge the current system would be expanded to include numerous other data structures not included in this knowledge representation. However, this would leave us a system with minimal to no practical use on it own.   As already stated a useful system would take into consideration the process a user of any given data structure is trying to perform. What we really want is a system that not only classifies what structure to use but also classifies what process to run and can design the optimal way of doing this.  Enter the blackboard system. The blackboard system is another abstraction of Expert System shells. While the classifier can be seen as an abstraction of a rule-based system, the blackboard can be seen to be a further abstraction of design and classifier systems. A system managing and scheduling the operation of many knowledge sources in order to achieve goals requiring separate and unique expertise at different stages of a problem solving process.  Ultimately if computers will be able to program themselves this sort of knowledge representation will come to pass.

 

 

Program file locataion:

The HC file for this program can be located here.