|
OntologyStream Inc Briefing & Elementary Tutorial on categoricalAbstraction (cA) August 2, 2002 |
|
Tutorial (Pages 6-21): The tutorial is on what some regard as a deconstruction of the mathematics and technology of text and data indexing. The result is deemed a type of Synthetic Perception about the invariance in data and the development of human annotation of what humans regard as meaningful. Synthetic Perception is seen as a by-pass of current data-mining and text-indexing methods
In the first step of a knowledge technology deployment one objectively evaluates proposed technology and methodology and develops an understanding of what is available in the marketplace. One must know what is available in computer systems and from innovations in various supporting communities.
A bottom line is that the human experience of knowledge be shared. At core this is the nature of human communication. A disinterested third party cannot have a vested economic interest, particularly one that is controlling of all other interests. The community at large loses. Society loses.
Only in an economically unencumbered environment can one expect that concept base indexing and the propagation of knowledge be accomplished. The compensation for knowledge sharing has to be accrued to the individuals who are sharing this knowledge.
The tutorial will provide a full disclosure of a fundamental discovery of categorical abstraction and the inventions of stratified theory in the context of knowledge technologies.
Due to the 2001 War on Terrorism, the government is prepared to
scale up IT support for homeland security.
The problem is in the capture and management of meaning and the
inability to extract meaning using purely computational means. In a knowledge war, our conventional IT
'weapons' are weak. In theory, they
could be much stronger.
One must look to natural science and shift from word-based
indexing to concept-based indexing.
Natural scientists surmise that a concept can be identified and
processed directly, and with revolutionary effects. But it is not a simple matter to bring this principle into full
application.
The situation is analogous to World War II. Scientists surmised that munitions are much
stronger if they were based on a completely different principle -- a nuclear
reaction rather than a chemical reaction.
The Manhattan Project acted on that theory. Later peaceful uses of nuclear power where developed.
We propose a new Manhattan Project for Knowledge Technology
(KT) that will give the Nation a decisive tool in the knowledge war. As the War is won, we expect to see human
productivity increase while the National leads the World in addressing the root
causes of this unfortunate war.
What we propose is a project that allows leading scientists to
concentrate on the problem and allows engineers to bring the solution to
fruition. It requires stripping away
some of the institutional impediments that have been retarding this
development.
Initial Work Outline: Any “conceptual space” associated a knowledge system can be mapped following these steps.
1) The map is produced by collecting together between 100 and 200 samples of the description of technologies, theories and frameworks (using web searches and in house materials).
2) Once this sample is available as a text collection, we write a simple parser to produce a categorical abstraction input file (datawh.txt) that is discussed in the tutorial.
3) The OSI Knowledge Operating System (OSI KOS) is used to record human perceptions about the structure in this input file and render topic maps (XML) for inspection and comment.
The mapping process, of course, has various levels of resolution. OntologyStream Inc develops a broad and definitive model of the IP (Intellectual Property) and conceptual space that surrounds” the basic sub concepts of all knowledge system technology. This work is preparatory for
1) The full mapping of all conceptual themes related to the knowledge system’s subconcepts in both the US PTO database and in the Internet public literature.
2) The production of dedicated patents covering the work of those developing the deployment plan and observing the progress in the deployment.
3) Informed decisions over technology acquisition during integration.
The BCNGroup Charter defines a science-based Intellectual Property management structure that serves the community of innovators supporting the knowledge technologies.
This management structure is designed to serve business partners as well as innovation processes, and is the basis for coordinated development of knowledge technology and knowledge science.
System description
Knowledge
Operating System (KOS) can be used to track global trends and anticipate events.
The system brings together a variety of innovative components to provide an
"intelligence-vetting engine" that globally tracks emerging events
related to any topic or event type. The
necessary condition for seeing the event is that a sensor must produce data
structure indicative of the event.
Categorical Abstraction bypasses the complexity issue in massive data
sources by replacing massive data with a very much smaller set of categories. The categories can be used to retrieve all
of the data that will produce individual categories, so we have a true
"compression" of data into information... and the compression may be
as much as a million to one. Novelty detection falls out of this process since
a single occurrence of a data element will appear as a new category if such a
data element is in fact novel. Event chemistry opens this cA substructure into
a biologically feasible information routing and retrieval architecture based on
the Prueitt tri-level architecture { memory, awareness, anticipation }. Machine inference and situational simulation
as well as sense making is enabled
Collection – KOS will automatically monitor information flow in the Internet on a 24/7 basis, looking for new items that are related to the subject and events of interest. KOS software agents look for new and novel relationships not previously anticipated and send data structures into centralized information vetting areas staffed by humans.
For example, all Internet published scientific conferences in a specific area of interest might be mined to produce trend analysis of intellectual property. Concept mining in KOS is language-independent.
Annotation - Human annotation of trends can be done in one’s native language and incorporated into a single, multi-language knowledge base. (By avoiding translation of any inputs, meaning is preserved and the significant costs and delays of translation are avoided.) Topic maps in any language are constructed using the OASIS standard.
Storage - Our collected information is stored in formats that allow an object to be directly related (“structurally coupled”) to any other object. The relating structures are independent second order control mechanisms that support query, routing, and anticipatory responses.
A tri-level architecture is used that separates: (1) known events, (2) categorical abstractions (cA), and (3) aggregations of categorical abstractions (event chemistry or eC) under the constraints of a knowledge base.
Extraction - A number of mature technologies extract information from data sources. This information is diffused into a substructure layer (cA) of information that serves the agile reconstruction of information in context (eC). The agile technology allows us to look at an object of investigation from many different perspectives.
The extraction process is represented as categorical invariance in the patterns of word expressions and processed into a central, sparse knowledge base. Information compression, routing and inference processes are achieved using a technique (I-RIB) that renders visually as graph structures expressing structural relationship and in auditory form.
Interaction with analysts - A number of analytical tools automatically develop ontologies and taxonomies. KOS logical and structural relationships are rendered visually or optionally with sounds. The overall structure of a given area of knowledge is rendered visually. Various scenarios are generated and humans are assisted in exploring variations in strategies and in storing work product related to these investigations. The system itself can be used to analyze analyst performance.
Simulation and sense making – In KOS, the cognitive load is allocated to the human. The conventional approach, of formal deduction or computational expert systems, is replaced with an extension of quasi-axiomatic theory coupled with human experience within the KOS visual and auditory environment and within the community of KOS analysts.
What-if scenarios and conjectures about causes take a new form where humans in small communities develop meaning in dialog about KOS-defined event structures.
Display – KOS provides breakthrough capabilities for identifying and monitoring early indicators of significant change. Visualization of the categories of data and relationship among these data enhance the ability of analysts to anticipate emerging events.
The remainder of this briefing will provide a precise discussion of our methodology, including the OSI categoricalAbstraction (cA) engine and event Chemistry (eC) engines. We start with a download of software at
www.ontologystream.com/cA/tutorials/download/Elementary.zip
which is 298K in a WinZip file and 1 Meg when unzipped.

Figure 1: The unzipped file for this tutorial
The tutorial Data folder starts only with a small file called datawh.txt containing the following text:
IPaddress eventType
109 E1
110 E1
101 E2
101 E3
109 E4
105 E3
105 E2
106 E3
107 E3
102 E5
103 E6
104 E5
108 E1
109 E3
We will not use the Splitter.exe in this tutorial.

Figure 2: The SLIP conjecture with atoms = IPaddress
Open SLIPWH.exe and type in “a = 0” and then “b = 1” in the command line to produce the link conjecture seen in Figure 2. There are spaces around the “=” sign in these commands. Type “Help” to see the list of knowledge operating system (KOS) verbs available from here. Call 703-981-2676 to ask a question.
Now issue the KOS commands “pull” and “export”. Again, the command “Help” will give more information about the KOS commands.
Close the SLIPWH.exe and look into the Data folder. You will find:
You should open each one and see if you understand how each was created and what each of these data structures may be used for.
Links.txt contains
101 E2 E3
102 E5
103 E6
104 E5
105 E2 E3
106 E3
107 E3
108 E1
109 E1 E3 E4
110 E1
The first line in
Links.txt indicates that the address 101 has been associated with E2 and E3 by the SLIP conjecture.
It is important to note that two things have happened up to this point. First some process created the original datawh.txt. The second thing that has happened is that the SLIP conjecture is used as a conjectural convolution over this datawh.txt file. There are many ways to create the datawh.txt files and many ways to create conjectural convolutions. But we are taking a simple example. The process that produces the datawh.txt file is completely up to the user of the OSI browsers. However, at this point the OSI software only offers the simplest of the convolutions based on simple link analysis.
Paired.txt contains:
101**105
101**105
101**106
101**107
101**109
102**104
105**106
105**107
105**109
106**107
106**109
107**109
108**109
108**110
109**110
These are pairs of elements from the column that was selected to produce atoms. The nine unique elements that are either first or second position in these pairs is the set:
{ 101, 102, 104, 105, 106, 106, 108, 109, 110 }
103 is not in this list because E6 occurs only once in the eventType column and so there is never a fulfillment of the SLIP conjecture that involves 103. The generalization to the class of convolution operators might be anticipated by how the SLIP conjecture produces atoms based on finding all instances where the conjecture holds true.
One way to generalize this is to use Frank Schank type frames with slots and fillers. Atoms are then things that are found to occupy slots.
Now open the browser SLIPCore.exe and issue the KOS commands “import” and “extract” to produce the screen in Figure 3a. Click once (not twice) on the A1 node to produce the scattering of the atoms.

a b
Figure 3. The Core Browser (a) before clicking on the A1 node and (b) after
The nine atoms in the set of “all” atoms is scattered onto the surface of a two dimensional sphere. (This is generalized to allow “dimensions of information processing” to n dimensional spheres.)
The clustering on the sphere is similar to a Kohonen feature extraction algorithm (artificial neural network) but is simpler and may be more powerful. The clustering algorithm is a form of scatter-gather techniques that have been used in the text understanding research field for perhaps two decades. However, the way that the scatter-gather is done is unique to Prueitt’s work and is simpler and more powerful.
The technique itself has one additional innovation based on the conversion of the ASCII into a line in number base 64. This conversion allows one to very quickly answer a set membership question. This question is about whether or not a pair randomly selected from the set of all possible pairs of atoms is in the set of pairs in Pairs.txt. There are 9 atoms in this case, so there are a possible 81 pairs that can be produced.
The Pairs.txt file has 15 members. In this case the set membership problem is easy. A random pair has a probability of 15/81 of being in the Pair.txt file.
However we have data sets that contain 10,000 members. In this data the set of possible pairs is greater than 10,000,000. In this data set the atoms are ASCII stings that have between 10 and 40 characters. This means that the membership question requires that a string match be made between a text string of between 20 and 80 characters and a string of ASCII that is 10,000*80 = 800,000 characters long.
The critical computational requirement is this membership question, and database solutions are far to slow. We need to resolve 10 million membership questions to complete the average scatter gather task.
This membership question can be addressed in many ways, but perhaps the most optimal way is to pad out each of the 10,000 members to have a length equal to the maximum length over the entire set, say 80 characters and map this to memory.
One needs less than one Meg of memory to accomplish this task. Then the set membership question can be answered in less than 20 fetches (the contents of one of the 80 character data “cell”), compare (the contents of the cell to the pair that one is asking the membership question of.) steps! Why? (The answer is not proprietary; it is just best to figure this out for oneself. Hint: 2^20 > 800,000)
The scatter-gather requires that this single membership question be asked and answered perhaps 10,000,000 times. This required about four hours in FoxPro in our early studies. However, when we used the In-Memory map we find that this take less than 30 seconds.
It has been a few months since Prueitt saw this resolution of the membership question, and he has developed a few other approaches to this, but each of these other approaches are slower than the algorithm suggested above.

Figure 4: Creating subcategories
Moving on to the clustered data.
By clustering the data in the tutorial set we find that there are three clusters. Type in “random” in the command line of the Core Browser, and then type in “cluster” to see the clusters form. The three clusters can be moved into subgroups by using a command
“a, b -> categoryName”
as done is Figure 4.
See the text in the Command Line in Figure 4, and perhaps type Help into the command line.
The clustering process in the scatter gather is fast and finds a unique decomposition of the elements from the bag of atoms into a set of categorical primitives, such as seen in the next figure where the size of the set of atoms is 2395.

Figure 5: From http://www.ontologystream.com/cA/papers/EventDetection.htm
However, in our case we have only three categorical primitives (called stratified primes).

Figure 6: We have 6 simple compounds each developed by a single link and 9 atoms
These three primes are composed of five simple compounds (each
defined by one link) and the nine atoms.
On can see this by double clicking on the A1 node to produce the screen
in Figure 6 (the KOS browsers have four re-sizable windows and you can readjust
your sizes.
Then close the browser and the software remember the previous
settings.)



Figure 7: The five simple compounds
Now we can alter the datawh.txt file so that one gains a better intuition about what is possible with cA and eventChemistry.
Close all of the KOS browsers. And put the folder (Figure 1) on the desktop. Make a copy and name it what ever you wish. Having both on the desktop and no programs running is helpful. Now go into the Data fold of the copy and delete all of the files and the folders in the data folder except the datawh.txt file. Open this file up and add a new line:
105 E4
Be sure to put one tab between “105” and “E4”. Now repeat the steps to produce a set of atoms. You will get the new categorical Abstraction seen in Figure 8:

Figure 8: The categorization with the new data element added to datawh.txt
The new compounds are given in Figure 9.

Figure 9: The updated compounds
Now make a third copy of the folder and delete all of the data except of the datawh.txt file again. Take all but the first line and copy in several times into the same file (not introducing blank lines.) Now redevelop the cA and eventChemistry. You will find that the Core Browser will have the exact same data.

Figure 10: The cA and eventChemistry after adding duplicate data
This is a property of the fact that all categories are already present before the additional data is added. The additional data is not introducing any new categories. One can compare Figure 10 to Figure 6 and see that everything is the same.
Root_KOS loads and saves tabular files to/from memory, and responds to
scripted commands.
Loading and saving tabular files is important because SLIP Browsers
work with data that is in a system file or in a special In-Memory Referential
Information Base (I-RIB) data structure.
The SLIP Warehouse and SLIP Technology Browsers transform data that
resides on text files. (See Figure 1).
Both depend on a Root_KOS kernel.

Figure 11: KOS Browsers work together
The Warehouse Browser takes any tab delineated audit log and
quickly produces a data mart and a combinatoric expression of link analysis
within the values of an event log. Both
of these files are produced using In-Memory Referential Information Base
techniques. Once the files are produced
then the Technology Browser can use them.
The Technology Browser uses In-Memory RIB techniques to develop
event detection. The Event Browser
(currently underdevelopment) is launched from the Technology Browser and some
data resources are exchanged. SLIP
atoms are identified in the Technology Browser and those atoms involved in an
event are conveyed to the Event Browser.
The I-RIB uses system files, memory maps and mathematical objects such
as lines and arrays. OSI’s I-RIB
technology is derivative of mathematical formalism called structural
holonomy.

Figure 12: Some early event chemistry drawings
Structural holonomy is the innovation of Paul Prueitt and is based on
his interpretation of the work of renowned neuroscientist Karl Pribram. This formalism is of the nature of
mathematics and yet has grounding in experimental literatures from biology and
neuroscience. The basic work has been
privately peer reviewed from over a decade.

Figure 13: One
of the early event compounds
The underlying concepts of structural holonomy is being made public
domain in order to facilitate the development of I-RIB systems in support of
knowledge technologies that depend on very fast data aggregation methods.

Figure 14: The Root_KOS User Interface and data structures
OSI’s I-RIB technology is a full data base management system that is
designed to minimally cover the requirements of many classes of data
aggregation, artificial neural network, evolutionary programming and data
warehousing systems. The smallest
version of the I-RIB technology is part of the 172K SLIP Warehouse and the 348K
SLIP Technology Browser. Both of these
systems are operational.
The Root_KOS is marketing as a development environment, complete with a
minimal I-RIB system. Yearly site
license is $19,500 for the development environment. Root_KOS is applicable to elementary research in Natural Language
Processing, full text mining and warehouses, and certain types of eBusiness
functions. The development of
deployable applications are streamlined.
Any KOS Browser can recognize scripted commands by (1) command line
input (2) certain mouse events on nodes of arcs in the graph window, or (3)
mouse events on the node tabs at the top of the graph window. The exit and help
functions are the most basic of all scripted commands.
Scripting provides for user gestures as response to the presentation of
state information from an internal ontology or finite state machine. Algorithmic responses to a human gesture
produce changes in
1)
The ontology or
finite state machine,
2)
Changes in
auxiliary resources such as system files and
3)
A textual
response in the response line and/or a sound response.
KOS Browsers have:
A.1:
Modal transforms that switch the view in display areas and change the
system’s response to script commands
A.2:
Graphs that are generated from a process model. The elements of the graph are active
objects.
A.3:
Allow the selection and movement of objects within the graph.
A.4:
Selection and movement will change the state of corresponding ontologies
or finite state machines.
A.5:
Selected objects will refer to contents. So for example, selecting a node and typing “Open Event Browser”
(or “oeb”) will take the contents of the node and move this content to a new
instance of the Event Browser.
Selecting a node and typing cluster” (or “c”) will cluster the atoms
corresponding to that node.
A.6: Grammatical interface composed of a command/response line with history.
KOS Browsers have a formal Program Interface (PI) that is completely
implemented through a categorized message stream interface. The machinery for the PI is complete in the
Root_KOS and extendable in any KOS Browser.
Project plumbing in the root KOS
provides maximal notational affordance by using extensible graph based
formalism.
Notation on states and gestures extend the KOS foundational notion: The notation for states, gestures and locations is:
The
set of world states S = { sj }.
The
set of gestures G = { gi }.
The
set of locations L = { Lk
}.
A formal theory of states and
gestures provide a computational means to discuss event paths in human computer
interactions, particularly for telelearning, distance learning, and virtual knowledge
collaboration. These states can be used as part of the mechanics of what we
call a "knowledge processor".
The notion of location allows for
the development of enterprise KOS.
The formal theory of states and
gestures is built from a precise formalization of computer game and user
interactions in Multiple User Domains (MUDs) (such as the Palace 2-D chat
environment) and Role-Playing Games (RPG) (such as ChronoTrigger or
Zelda.)
Gestures and states can be modeled
as the fundamental units of a specific finite state machine. Thus, under any
such designation, the related model of event sequences is computable. The KOS, however, must represent the
invariance in the data AND the mental states of humans. This means that our effort to representation
the contents of mental events within the context of discourse, should be split
into two parts. These two parts are
1)
The
aggregation of data invariance and
2)
The
representation of user response history in terms of the set of aggregated
invariance.
The aggregation of invariance is
used to represent the computer processes, and the user response history is used
to represent the natural world. The
communication between finite state machines is used to control and allow user
(community) focus on a specific set of issues.
This control of focus provides universal “Informational Transparency
with Selective Attention”.
We can assume that a set of states, S = { sj },
are created before hand and the gestures G = { gi } are a set of
stored responses. This would mean that S and G are actually the atoms of finite
state machine. However, the existence of event states may be actually implicit
unless actually instantiated as part of an event. In this case, it is proper to
refer to S and G as virtual finite state machines. In the course of working with the KOS Browsers, users select
gestures in response to a presented state from the computer. This pairing of
state and gesture is a represented as a mutual decision event.
In the Event Chemistry being developed by OSI, the finite
state machine is a virtual combinatoric expression of the aggregation of
abstractions called “SLIP atoms”. This
abstraction plays a role similar to the abstractions we have in physical
chemistry. Each atom has a specific set
of linkage potential. The manipulation
of the linkage potential (called “valence: in physical chemistry) leads to the
formation of chemical compounds.
The notions of an “atom” and “chemical valence” are useful
abstractions. In the Event Browser we
use a different set of abstractions to visualize the parts of events and the
means that these parts have to be a compound in a larger event.

Figure 15: event
chemistry
An abstraction from the signatures of real occurrences leads
to event chemistry. We have developed an additional
capability. This capacity is to provide a unique situational analysis based on
the relationship of part to whole as revealed in the formal literature of
Russian applied semiotics and related literature in the West.
Definition: The term formal system refers to a four-term expression:
M = < T, P, A, R>
where T = set of basic elements, P
= syntactic rules, A = set of axioms, and R = semantic rules. The interested reader can refer to Situational
Control for a more detailed treatment of formal systems.
For our purposes we need only
refer to a figure from page 37.

Figure 16: Taken from Pospelov Situational
Control
In figure 16, the set of base
elements T are combined in various ways to produce three types of sets, axioms,
semantically correct aggregates, and syntactically correct aggregates. We
should remember that mathematical logic is founded on a similar construction
and therefore that most of the results of mathematical logic will somehow apply
later on to the theory of knowledge processing that we are constructing. For
example, the set of axioms can be specified to consist of independent, non
contradictory and self evident statements about the set of base elements T.
The axioms are our analog to the
quantum mechanical substrate that gives rise to the mechanisms that in turn
produce some of the material and energetic constraints that are involved in the
creation of awareness. Rules of inference can also be formulated to maintain
notions about true or false inference from assignments of a measure of truth to
the axioms.
The set of syntactically correct
aggregations of elements of base elements can be defined either by listing or
by some implicit set of rules.
The semantically correct
aggregations could then be interpreted as those syntactically correct
aggregations that have an assignment of true as a consequence of the inference
rules.
BCNGroup Charter
First
published on February 5, 1997
1) Mission Statement
Our
mission is to create an international communications forum for the
understanding, discovery and dissemination of knowledge for the common good.
To
create new technologies that synergistically applies computational systems to
the automated investigation of artificial and natural systems.
To
provide protection for knowledge wealth created by primary scientific
investigation.
To
hold in trust intellectual property for the common benefit of BCNGroup members
and third party developers.
2) Motivation
The
mechanisms currently available to fund basic science are excessively slow and
are wholly inadequate to address the complex time-critical problems we are
likely to face in the 21st century.
This
inadequacy arises from an enigma. On the one hand, the very notion of treating
scientific knowledge as property is not compatible with the spirit of
scientific investigation.
On
the other hand, international intellectual property rights are essential to the
free market development of technology.
Without
such protection, intellectual property may be exploited without due
compensation or acknowledgment to its author.
3) Knowledge Capital Formation
The
BCNGroup intends to generate and manage wealth to the greatest extent possible
given the above constraints.
This
wealth will consist of two distinct types of knowledge capital:
(i)
Public
knowledge wealth and
(ii)
The BCN inventory of intellectual capital.
3a)
Public Knowledge Wealth:
Public
wealth is created by placing the results of investigations of contributing
scientists into a knowledge base. Practitioners may apply this knowledge in
whatever manner they deem appropriate. Such knowledge will be secured for the
purpose of insuring complete public access to basic discoveries..
Common
law states that when results from a scientific investigation are publicly
disclosed, previous to filing a government application to recognize private
ownership of the intellectual property, the results are forever made part of
the public domain. In this case, no one has legal authority to restrict the use
of this property and the application for private ownership should be denied. In
simple terms this means that if knowledge is published at a time when no legal
protection exists for individual ownership of that knowledge, then that
knowledge is owned in common by the public body.
This
principle in common law is being constantly eroded. To insure that this common
law principle is not further degraded, the BCNGroup will follow each BCNGroup
public disclosure of new scientific knowledge with specific procedures that are
designed to clarify what are allowable targets for the creation of private
intellectual property.
The
BCNGroup will help identify well-defined access points for third party
development of new technologies. A consortium relationship will be maintained
that is designed to transfer scientific work into technology, but with
reasonable attribution to the origins of the scientific work.
3b)
The BCN Inventory of Intellectual Capital:
The
BCNGroup will establish a new independent science funding process that will
generate wealth by entering into agreements with its member scientists and
third party developers. In this way,
BCNGroup scientists will work with entrepreneurs in migrating theories from a
scientific investigation (proof of principle) phase to a development (proof of
product, or prototyping) phase.
The resulting financial wealth will provide an increasing flow of capital to the science community and will be held in trust by the BCNGroup. The Scientific Council of the BCNGroup will manage the redistribution of this wealth in part by contributing to the health of the science community in the form of yearly prizes for