Section 3 (Chapter 5 in [1])

 

Associative memories between theme space and semantic space

 

 

Computational association between discrete and continuous mathematical spaces S1 and S2 can be used to define transformations between partitions of representational spaces.  If we denote a transformation between the two spaces by “T”, then we have the mathematically defined relationship that for each element a of S1 there exists an element b in the space S2 such that

 

T(a) = b

 

The partitions are needed in order to create a system of formal containers that assist humans in performing agile decompositions of informational organizations into areas where rules of transformation can usefully vary. 

 

The partitions and transformations are needed to reflect a diversity of human viewpoint and the relative nature of information to varying contexts.  Over time, one expects to map out similarities and differences between these partitions.

 

Shifts in viewpoint change the organization of information.  The existence of transformations, based on continuum mathematics, over how information is organized is critically important in moving from one human viewpoint to another.  The abstractions that allow humans to agree on the nature of these transformations play an essential role, which our work attempts to explain.

 

One issue to consider as the reader tackles this new work is the unique location property (ulp).  When the ulp is provably present, computers can depend on it to decrease the number of machine cycles required for fetching data.  In the continuum, one can always insert, delete or modify the localized information between two other “pieces” of localized information.  This update does not change the ulp over the data!  The update takes one simple step, which is to make the update occur as part of the abstraction.  The virtual model can also be expressed into computer memory in such a fashion that the ulp is preserved.  This requires a complete rewriting of all data in one pass.

 

Let us look at the ulp again.  A database record is one example of localized information.  A block of records that have been written to computer memory cannot be updated in one machine cycle in a fashion that preserves the unique location property over the discrete structure in memory.  However, a virtual model of information organization can exist with the ulp in the continuum mathematics due to the properties of the real numbers.  In the continuum mathematics the updates are essentially one step and thus can be achieved in one or only a very few machine cycles.  Once an update to the abstraction has been made, the abstraction can be used to write out to computer memory a contiguous record that does have a physically rendered ulp.  Computer memory with the ulp has special properties that allow parallel and serial processes to act in the most efficient way possible.  The term “structural holonomy” has been developed, elsewhere, to refer to these contiguously written records.
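
The insertion argument can be illustrated with a minimal sketch.  We assume, for illustration only, that the ulp means each piece of localized information holds a distinct key, and that exact rationals stand in for the real numbers of the virtual model.  The names insert_between and render_contiguous, and the sample records, are hypothetical.  An insert between two neighbors takes one step in the abstraction; the contiguous layout is then written out in a single pass.

```python
from fractions import Fraction

def insert_between(model, left, right, item):
    """Place item at a new key strictly between two existing keys.

    The midpoint always exists in the rationals (as in the reals), so
    no other entry has to move and every key stays distinct.
    """
    if not (left in model and right in model and left < right):
        raise ValueError("left and right must be existing keys with left < right")
    mid = (left + right) / 2
    if mid in model:
        raise ValueError("left and right are not neighbors")
    model[mid] = item
    return mid

def render_contiguous(model):
    """Write the virtual model out as one contiguous list, in key order.

    This is the single full rewrite: the list index of each item
    becomes its physical location.
    """
    return [model[key] for key in sorted(model)]

# Hypothetical records keyed by their location in the virtual model.
records = {Fraction(0): "rec-A", Fraction(1): "rec-B"}
loc = insert_between(records, Fraction(0), Fraction(1), "rec-C")
layout = render_contiguous(records)
```

In this sketch the update touches only the abstraction (one dictionary entry), while the cost of restoring a physically contiguous layout is deferred to the single rendering pass.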

 


The association between vectors and semantic primitives

 

Our purpose in this section is to provide a strategy that uses several knowledge technologies to develop a representation about a specific object of investigation.  The object of investigation can be the structure of how computer data is organized.  But the object of investigation may also be a complex natural phenomenon that cannot be perfectly modeled in either the continuum mathematics or the discrete formalism.  The phenomenon may be, for example, consequent to a social or psychological system.

 

Any one of several different types of associative neural networks can easily be used to define transformations between types of representations when representations exist in a mathematical formalism.  Simple bi-associative memories can learn specific mappings between different knowledge spaces and between knowledge spaces and themespaces. Our strategy is to utilize multiple types of representations and then to create an associative memory between the representational spaces. 

 

Consider two major types of representation.

 

o        Theme / keyword vector representation

o        Semantic primitive representation

 

The two types of representation are very different in nature and formal notation.  Theme vector representation is well developed in techniques like latent semantic indexing (LSI) and self-organizing feature maps.  Differential ontology creates a mapping between the continuum mathematics and discrete encoding structures.  In fact, the first application of differential ontology was in taking the consequences of LSI placement of text into the categories defined by LSI transforms and converting them into ( type : value ) pairs (Prueitt 2003).

 

The semantic primitive representation depends on a theory of type that has been worked out by a line of scholarship that includes the works of C. S. Peirce (1839-1914), the Russian work described in Chapters 1 and 2, and the more recent work by John Sowa and Richard Ballard.  Ballard’s work makes clear the use of a Zachman-type framework to create language-neutral informational codes that reflect a specific theory of type.  The new work on generalFramework theory, introduced by Prueitt 2001, can be used to encode knowledge experiences directly into a framework.  The techniques of differential ontology can be used to convert explicit information developed from the use of these primitives into a Hilbert space representation.

 

 

Figure 1 : The use of associative memories to link themespace and concept space (Prueitt, 1998)

 

Before developing some additional notation, we make the observation that knowledge technology acknowledges the importance of a direct awareness about the structure of events occurring in the world.  This structure is not random and is specific.  So two or more good methods for discovering knowledge about the structure of events will derive representations that have some type of categorical linkage.  If the methods are good, then variation related to viewpoint is expected, but some important essence of the events will be preserved in each viewpoint.

 

The computer encoding of knowledge, however, creates a glass ceiling and some illusions.  Differential ontology maps computer encoding with the aid of continuum mathematics and therefore raises the glass ceiling that one must expect from any type of formalism.  But the ceiling remains.

 

In the knowledge sciences, the computer program is always regarded as an abstraction, and thus not something that is in the same category as natural phenomena.  Programs and computer states organized by programs are like counting numbers in that these abstractions do not have a location or any type of boundary conditions.  Abstractions are not “physically” real.

 

Instantiating abstraction as a computer state does cause something to exist as a physical state.  However, that computer’s physical state is highly constrained to reflect a specific type of discrete formalism.  Thus the computer states share something in common with the discrete formalisms.  Zeno’s paradox, the Russell paradox and other artifacts of classical logic and mathematics express foundational limitations.  These observations set the stage for the knowledge technologies.

 


Theme space: Let C be a collection of text documents separated into text units.

 

For each text unit d_k in C, a set of phrases

 

{ p_{k,1}, p_{k,2}, ..., p_{k,h} }

 

can be identified as a representation of the semantic content of the text unit. The parameter h depends on the representational procedures and on the text unit.

 

Let A = ∪_k { p_{k,1}, p_{k,2}, ..., p_{k,h} }

 

be the union of phrases from each text unit in the collection C.

 

Given a narrow domain for the collection C, the size of A is weakly convergent and A is an open set,

 

A = { q_1, q_2, ..., q_{n(t)} }.

 

By “weakly convergent” we mean that new phrases may be occasionally added or removed.  The domain might be thought of as a universe of discourse and the collection C a sample of text expression of this universe of discourse.  The weakly convergent property is therefore important to fully and minimally model the thematic content of the universe of human discourse as expressed over time.

 

Let S be the vector space where each q_i is assigned a distinct dimension.

 

Using the interval [0,1] at each dimension, we have S = [0,1]^{n(t)}.

 

This assignment will impose a unique location property on the set

 

{ q_1, q_2, ..., q_{n(t)} }.

 

Each q_i is assigned a unique location in the space S.  Again, n(t) is an integer-valued variable whose value depends on the situation.
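
The construction of A and S can be sketched as follows.  This is a sketch under stated assumptions: phrase extraction is taken as already done, coordinates are simple presence values within [0,1] (the text does not fix a weighting), and the function names and sample phrases are hypothetical.

```python
def build_theme_space(phrase_sets):
    """Form A as the union of the phrase sets and give each phrase
    its own dimension (its index in the sorted vocabulary)."""
    vocabulary = sorted(set().union(*phrase_sets))
    index = {phrase: i for i, phrase in enumerate(vocabulary)}
    return vocabulary, index

def theme_vector(phrases, index):
    """Represent one text unit as a point in [0,1]^n.

    Assumption: 1.0 marks a phrase that is present, 0.0 one that is
    absent; any weighting scheme with values in [0,1] would also fit.
    """
    vector = [0.0] * len(index)
    for phrase in phrases:
        vector[index[phrase]] = 1.0
    return vector

# Hypothetical phrase sets for two text units of a collection C.
units = [
    {"neural network", "associative memory"},
    {"associative memory", "semantic primitive"},
]
A, index = build_theme_space(units)
vectors = [theme_vector(unit, index) for unit in units]
```

Because n(t) can change as phrases are added or removed, a sketch like this would rebuild the index whenever A changes; each phrase still receives exactly one dimension.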

 


Semantic prime space:  Now let { Di } be a set of knowledge domains and Di be one of these domains.  As before the domain can be thought of as a sampling from some active universe of human discourse. 

 

Suppose that Di can be represented by a set of syntagmatic units, each in the form of an ordered triple < c1, r, c2 > where c1 and c2 are concept symbols and r is a relationship symbol.  Suppose further that the relationship symbol is specified using only the set of semantic primitives discussed by Ballard or Sowa.  Allow the set of concept symbols to be enumerated by a human, or human community, in the form of a controlled vocabulary.  This controlled vocabulary would then be part of a simple graph construct.

 

Suppose, further, that { Di } is the minimal set of domains required to describe the semantic interpretations of the collection of text documents C.  We assume that a software interface exists that allows two activities.  The first is the development of a community-vetted controlled vocabulary.  The second is a mark-up of the “primitive relationships” between elements of the controlled vocabulary in context.  This means that a human is aware of a context and, consistent with this awareness, the human makes an annotation about relationships of the types specified by a specific set of semantic primitives.  Community agreement on the set of semantic primitives is like negotiated agreement on controlled vocabularies, except that the set of primitives is small and reflects a mature theory of how phenomena are organized.  The set of all details of Newtonian mechanics is in this sense a community-vetted theory that is derived from a small set of primitives.

 

 

Figure 2:  A Knowledge base Framework

 

Using a general framework construct, such as a Zachman framework or a Ballard framework, we define a conceptual syntagmatic unit as an ordered triple where elements from a controlled vocabulary are annotated pairwise to have semantic relationships expressed in some aggregation of the semantic primitives.  The generalFramework theory uses a framework such as the one in Figure 2 to define a specific number of cells that are the types of meaning that can occur, either by themselves or in combination with other types of meaning.  In Figure 2 we have the 18 cells related to Ballard’s framework.

 

In the theory of categoricalAbstraction (cA) and eventChemistry (eC) we have fillers as potential atoms of event compounds; slots serve to provide the binding of atoms into the event compound, and the script (or framework) is in fact the relationship between atoms.

 

The generalFramework (gF) discrete data encoding has the form of an n-tuple:

 

< event, a(1), a(2), . .  . , a(n) >

 

The n-tuple has n atoms and one relationship. 
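
As a minimal sketch of this encoding, assuming that the event label plays the role of the one relationship and that atoms are plain strings, the n-tuple might be held as follows.  The class name GFRecord, the helper make_record and the sample values are hypothetical and are not part of generalFramework theory.

```python
from typing import NamedTuple, Tuple

class GFRecord(NamedTuple):
    """One gF encoding: a single event label binding n atoms."""
    event: str
    atoms: Tuple[str, ...]

def make_record(event, *atoms):
    """Build the record < event, a(1), ..., a(n) > from its parts."""
    if not atoms:
        raise ValueError("an event must bind at least one atom")
    return GFRecord(event, tuple(atoms))

# Hypothetical event with three atoms.
record = make_record("meeting", "person:alice", "place:room-12", "time:0900")

# Flat form, matching the notation < event, a(1), a(2), ..., a(n) >.
flat = (record.event, *record.atoms)
```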

 

 

Figure 3: The process flow model of human memory formation, storage and use (Prueitt, 1996)

 

Now suppose that we have looked at a number of relationships to produce resources as in Figure 3.

 

An enumeration of conceptual syntagmatic units is achieved by human use of a computer interface that presents instances of situations within a class of events.  If { Di } is a set of knowledge domains, then the relationship between the domains and the instances of events is determined, as best one can, by some process.

 

There are several strategies available to us.  We have some degree of flexibility over how the formalism is established.  Once a formalism is established we engage in the Actionable Intelligence Process Model discussed earlier.  So the tuning of the informational organization is left in the hands of humans.

 

The most attractive formalism is to treat each pair in the controlled vocabulary as an event, and use the Ballard framework to annotate the set of all possible relationship types between these two vocabulary elements.  Each of these instances is marked up by indicating the negative presence (blocker) or presence of just those semantic primitives that appear to the humans as being relevant.  A blocker is annotated as a negative 1 and a presence is annotated as a positive 1, placed into the cell of the corresponding semantic primitive.
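
The mark-up step can be sketched as follows.  We assume, for illustration, that the 18 cells of Figure 2 are addressed by position 0 to 17 and that an unfilled cell is recorded as 0; the function name annotate, the vocabulary pair and the chosen cell positions are hypothetical.

```python
N_CELLS = 18  # the text reports 18 cells for Ballard's framework (Figure 2)

def annotate(present=(), blocked=()):
    """Return one framework annotation for a vocabulary pair.

    +1 marks a semantic primitive judged present, -1 marks a blocker,
    and 0 (an assumption of this sketch) marks a cell left unfilled.
    """
    if set(present) & set(blocked):
        raise ValueError("a cell cannot be both present and blocked")
    cells = [0] * N_CELLS
    for i in present:
        cells[i] = 1
    for i in blocked:
        cells[i] = -1
    return cells

# Hypothetical pair from a controlled vocabulary, treated as one event.
markup = {("engine", "car"): annotate(present=[0, 4], blocked=[7])}
row = markup[("engine", "car")]
```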

 

Not all cells of the framework have to be filled for each pair of elements from a controlled vocabulary.  A human need not manually identify all pairs that might usefully be considered.  Automated means exist.  For example, linguistic variation can be identified using LSI correlation.

 

Let

 

O = { < c1, r, c2 > }

 

be the union of the sets of conceptual syntagmatic units from { Di }.  One way to read the triple < c1, r, c2 > is as a rule.  The rule is that c1 is related to c2 through the semantic primitive r.  O is a derived minimal ontology for { Di }.
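
A minimal sketch of this union, assuming each domain simply lists its triples, is given here.  The primitive names ("part-of", "causes"), the concept symbols and the function names are hypothetical placeholders, not the primitives of Ballard or Sowa.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A conceptual syntagmatic unit < c1, r, c2 >."""
    c1: str
    r: str
    c2: str

def derive_ontology(domains):
    """Form O as the union of the triples found in every domain.

    A triple that occurs in several domains appears once in O.
    """
    ontology = set()
    for units in domains.values():
        ontology |= set(units)
    return ontology

def as_rule(triple):
    """Read a triple as the rule 'c1 is related to c2 through r'."""
    return f"{triple.c1} is related to {triple.c2} through {triple.r}"

# Hypothetical domains with one shared triple.
domains = {
    "D1": [Triple("engine", "part-of", "car"), Triple("spark", "causes", "ignition")],
    "D2": [Triple("engine", "part-of", "car")],
}
O = derive_ontology(domains)
rule = as_rule(Triple("engine", "part-of", "car"))
```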

 

The construct O is derived within an investigation of a phenomenon or the structure of information within a computer-based information system.  It is perhaps important to note that a minimalism can occur where the construct O is left to the interpretation of a human or human communities.  Thus a great deal of common sense detail does not need to be developed into the construct.  The trade-off is a high degree of agility in how the elements of the construct are rendered and presented to the human or human community.

 

If there is a well defined set of semantic rules Q for structures composed from subsets of O, then there may exist an intermediate language sufficient for the description and analysis of the conceptual contents derivable from C and from the related partitions { Di }.  The object of investigation may be a natural phenomenon and the constructs

 

( O, Q, C, { Di } )

 

consequent to a process of complex science.  The intermediate language would be derived from these constructs.

 

If both O and Q are open, the intermediate language describes a "semiotic" system if compartmental transformation rules can be specified and if a substructure for computational inference rules exists. 

 

Let K be the vector space where each semantic primitive is assigned a distinct dimension.

 

As in the case of the vector space S, this assignment will impose a unique location on the set O.  The set O then has the unique location property.  The unique location property is realized in the Hilbert space as locations of points.  Scatter-gather on a circle can be performed for each dimension.

 

The data’s organization can at any time be projected into computer memory as localized bits of information, relational database records or CCM constructs, in such a fashion as to preserve the unique location property.

Now we can specify an associative memory between the two vector spaces, one being a theme space and the other being a semantic space.  We are allowed to define a simple bi-directional associative neural network (using backpropagation or another method) where the training set is composed of vector pairs from K and S.

This associative neural network allows theme-based retrieval from the Concept Space and knowledge association from the Theme Space.
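
As an illustration of such a memory, the sketch below uses a Kosko-style bidirectional associative memory trained with the Hebbian outer-product rule.  This is one instance of the "other method" mentioned above and is not backpropagation.  We assume that vectors from S and K have been recoded to bipolar values (+1 / -1), and the two training pairs are hypothetical.

```python
def train_bam(pairs):
    """Sum the outer products x^T y over all (theme, semantic) pairs."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += x[i] * y[j]
    return weights

def sign(values):
    """Threshold to bipolar values; zero is mapped to +1 by convention."""
    return [1 if v >= 0 else -1 for v in values]

def recall_semantic(weights, x):
    """Theme space to semantic space: threshold x W."""
    m = len(weights[0])
    return sign([sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(m)])

def recall_theme(weights, y):
    """Semantic space to theme space: threshold W y."""
    return sign([sum(row[j] * y[j] for j in range(len(y))) for row in weights])

# Hypothetical bipolar training pairs (theme vector, semantic vector).
x1, y1 = [1, -1, 1, -1], [1, -1, -1]
x2, y2 = [1, 1, -1, -1], [-1, 1, -1]
W = train_bam([(x1, y1), (x2, y2)])
```

With these two pairs the theme vectors are orthogonal, so recall is exact in both directions.  With many or strongly correlated pairs an outer-product memory of this kind can produce crosstalk, which is one reason a trained network might be preferred in practice.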