CSCI 222 Hierarchical Classifier System Documentation
Paul Bell
Data
Structure Chooser Expert System
Introduction
The Data-structure Determiner System (DDS) was implemented using Visual Wave as a Hierarchical Classifier. The program is an aid for helping the software programmer decide what kind of data structure would best suit their needs. The system is by no means all-inclusive and can best be summarized as a horizontal slice of possibilities. In implementing this version of the system (previous model was a rule based system using CLIPS) a decision to use a classifier or a design tool had to be made. After much contemplation the classifier was used. The argument for classifying is based on the similarities to diagnosis of a disease, which has often been done using classification. In the current situation we have some known information about our needs (i.e. structure purpose, data size, data quantity, etc.) and these can be thought of as symptoms. Establishing certain facts about the needs and then refining the decision process achieve the diagnosis of which structure should be used. Although this course of action was taken, a debate still exists as to whether a design tool could also have been effective for this project.
This report will provide a discussion of various topics intended to illustrate to the reader the current state of the system. The topics include background, description, knowledge representation and organization, control methodology, sample runs, limitations, scalability, strengths and weakness of methodology, overall goodness, and what else can be done.
Background
The
DDS system was implemented previously using CLIPS a rule based expert shell. In
implementing that version a lot of consideration for handling the control was
necessary. Prior to CLIPS version no known pre-existing system within this
domain was known, and as of this writing there is still no awareness of one.
For this version of DDS the Visual Wave’s classifier tool was used and
simplified the handling of control issues.
The resources for using the Visual Wave classifier were minimal and
provided by Anne Keuneke, Computer Science Professor at CSU Chico. It should be
noted that learning to use the Visual Wave tool was a more pain staking process
compared to the CLIPS shell. However, the actual programming of knowledge was
easier with Visual Wave.
The
knowledge base used by the system was the same as the CLIPs version and
extracted by the programmer. The
acquisition of knowledge used by the programmer was generated from two main
sources. The main source providing the backbone and major substance of the
knowledge comes from the CSCI 15a, 15b, 151 “Programming Algorithms and Data
Structures” classes provided at California State University Chico. In cases were more specific materials were
needed references were taken from Robert L. Kruse and Alexander J. Ryba’s text Data
Structures and Program Design in C++.
Considering the horizontal nature of the project this was enough
background material to accomplish the scope required for this assignment.
Finally a more encompassing system would require a more expert understanding of
data structures and algorithms.
Description
The
DDS Classifier operates very similar to CLIPs version in the sense that it
continually queries the user for information until it reaches its
decision. The CLIPS version of
knowledge was based on a hierarchical scheme with each new node requiring new
information to move to the lower level. Because of this layout of knowledge the
programming of knowledge in the classifier was a relatively simple task. That
is to say the very nature of a classifier is of a hierarchical structure with
control already contained within the tool.
The use of structure matchers within each Specialist of the classifier
removes the need of the programmer to write rules about control.
In
the DDS Classifier each node (Specialist) in the hierarchy represents a
particular type of data structure. All sub-specialists below a given specialist
are a more specific type of structure that is a member of the super-specialist
type. Thus as we traverse down the
structure we are pruning out knowledge which is not relevant to the current
case under examination. The current
hierarchy mapped in this program contains fairly sparse knowledge and could be
viewed as merely an extended structure matcher. On the hand it also provides an
initial framework for development of a more robust implementation of the domain
knowledge. More on these issues in the limitations and scalability sections.
Knowledge Representation and
Organization

Above
is the hierarchical structure for the DDS Classifier. The first specialist on the right has been cropped a bit it
should read “Data Structure.spec” and a couple nodes on the left the “.spec”
extension cannot be viewed. As
discussed earlier it could be claimed the structure is merely an extended
structure matcher. For example at the Queue.spec specialist the knowledge for
that node could be further developed so a subsequent branching of the Dynamic
Queue.spec would be unnecessary. If so the structure at this point would look
more like the following:
-----Dynamic Array
Queue.spec
|
Queue.spec
-----Link List Queue.spec
|
-----Statuc Queue.spec
Adaptations
of this nature could occur at a few different locations, including but not
limited to the Hash.spec, Stack.spec, and Tree.spec.
Control Methodology
The control methodology used for the DDS Classifier is Establish and Refine via structure matching. The establish and refine method works as follows. At each node a specific knowledge is represented. The node based on its knowledge retrieves relevant data from the user and establishes the state of that node. Then by analyzing the data via the structure matcher it refines by passing control to one of its sub-nodes. The process is repeated until a final decision is met; a node is reached that has no sub-nodes.
Sample Run 1
The following is a set of screen shots from a successful run of the program:




This
run has determined the user needs a Binary Search Tree (BST). The final screen
the “Results List” describes the reason for its decision. In this case the user
is looking for a structure to do search or retrieval, maintain a sorted order,
the data entry will not be order, and the data size to be enter is large.
Sample Run 2
This example demonstrates a failure of the system.


This
sample demonstrates what happens whenever the user chosen the “unknown option”
on any of the specialists. The second screen shown here will appear.
Limitations
The main limitation of the system is illustrated in Sample Run 2. Whenever an “unknown” response is answered the system is not programmed to handle the response. The nice thing about the classifier tool is the fix would be easy to incorporate, simple adjustments in each particular specialists structured matchers would be needed. The more difficult thing is determining what the knowledge is to be for an “unknown” in any given response. Since the system is trying to find the most efficient data structure, an “unknown” response should cause the system to err on the side of caution. In other words a more complex structure, possibly with unnecessary overhead, should be selected in order to encompass all the possibilities an “unknown” introduces.
Scalability
Unlike the rule-based version of the DDS the scalability of the classifier version should be easy to accomplish. As noted earlier this representation of the knowledge is a horizontal slice of the domain. There are numerous other data structures available to the programmer that this system does not address. Even though this is the case the basic concepts behind those structures not represented here do exist in this system (i.e. lists, trees, hash tables, graphs). Therefore, all one needs to do is add more knowledge to existing specialists in the current classifier. This new knowledge will be able to refine to new sub-specialists. The new sub-specialists in turn will either define or contribute to defining data structures not yet represented in this system’s current implementation. Finally the introduction of this new knowledge should not in anyway affect the control of the existing knowledge currently represented.
Strengths and Weakness of
Methodology
The strengths of the methodology a hierarchical classifier has the control built in. Unlike rule-based systems the knowledge is represented in “chunks”. More specifically you establish some criteria and then refine further based on that criteria. The control of the system is integrated with the structure. This leads to another strength of the methodology (knowledge representation and control), scalability. Scalability specific to this system is discussed above, but generally, adding new knowledge to a classifier tends not to affect the representation of already existing knowledge in the structure. The catch is though to use a classifier you must be classifying because the control used for classifying is designed specifically for solving problems of a classifying nature. Whereas in a rule-based shell you could theoretically do any kind of problem solving method (classifying, designing, etc.) but the control is more independent and the programmer must plan it into the representation of knowledge.
This leads to the weakness of the methodology as it relates to DDS. Choosing a data structure really is not as cut and dry as this system tries to make it. Often the choice of a data structure is based on the type of processing to be done on the data it stores. At this point we are starting to get into areas of design. As was stated in the introduction a decision between a classifier and a designer was a big challenge. There may be cases where the data structure is inappropriate for the actual process the user of a data structure needs to perform. Thus we may need a knowledge structure that can check if the criterion of one plan meets with the needs of another plan. With this in mind what seems to be needed is a system capable of both some design and some classifying.
What else could be done…
Given
more time, resources, and knowledge the current system would be expanded to
include numerous other data structures not included in this knowledge
representation. However, this would leave us a system with minimal to no
practical use on it own. As already
stated a useful system would take into consideration the process a user of any
given data structure is trying to perform. What we really want is a system that
not only classifies what structure to use but also classifies what process to
run and can design the optimal way of doing this. Enter the blackboard system. The blackboard system is another
abstraction of Expert System shells. While the classifier can be seen as an
abstraction of a rule-based system, the blackboard can be seen to be a further
abstraction of design and classifier systems. A system managing and scheduling
the operation of many knowledge sources in order to achieve goals requiring
separate and unique expertise at different stages of a problem solving
process. Ultimately if computers will
be able to program themselves this sort of knowledge representation will come
to pass.
Program file locataion:
The HC file for this program can be located here.