
 

Notational paper

SLIP foundations

 

Two reification processes

to be included in

CCM-Powered Conceptual Rollup

 

July 17, 2003

 

Paul S. Prueitt, Ontology Stream Inc

 

First Note:  Two concepts are related if a certain measure of relatedness exists between the words contained in the common expression of the concepts.  In linguistic analysis, and in the analysis of patterns of co-occurrence, natural language expression is parsed by computational means to identify the nature of the concepts being expressed.

 

Is it correct that identifiers for related concepts can be computed using only the co-occurrence information developed about related words?  The answer requires some contemplation and experience. 

 

If co-occurrence patterns identify the expression of concepts, then the statements made below anticipate the identification of concepts from the co-occurrence of related words, within context.  This process has been called semantic extraction, although the meaning of words and word patterns is only a partial indication of the full scope of a larger social discourse.

 

 

Topological Logic:  Clearly a concept should be most related to itself.  A measure of relatedness, based on the expression of concepts as words, should be judged on self-relatedness as well as on the ability to make distinctions.  The measure leads directly to a set of formalisms developed by a small school of logicians.  This formalism is called topological logic.  To set up topological logic we develop a natural representation of the process involved in producing local measures of relatedness.

 

The concept of relatedness becomes a function of an entire collection of text sources if there is some type of divisor that is proportional to

 

1) the number of concepts,

2) the number of words in all concepts

 

This count can be over all stem/word/phrase occurrences, over all "subjects" (say, as encoded into the description logic based ontology), or over all categoricalAbstractions (cA).  The focus of a set of concepts can be a non-linear, topological logic measure of the importance of individual concepts, thus producing a figure-ground attentional mechanism simply by changing the divisor.  A generalization of this effect has been well studied in the artificial neural network research literature (see foundations papers by Prueitt and by Levine and Prueitt [1]).
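The effect of the divisor can be illustrated with a minimal sketch.  The paper does not fix the numerator of the measure, so the count of shared words used here is an assumption, as are the function name relatedness and the toy concepts dictionary.  The sketch only shows that the same raw overlap yields different scores under divisor 1) and divisor 2), while a concept stays most related to itself.

```python
from collections import Counter

def relatedness(words_a, words_b, divisor):
    """Shared word occurrences, scaled by a collection-level divisor.

    The shared-word numerator is an illustrative assumption, not the
    measure defined by the paper.
    """
    shared = Counter(words_a) & Counter(words_b)  # multiset intersection
    return sum(shared.values()) / divisor

# Hypothetical toy collection: concept name -> words that express it.
concepts = {
    "trade":    ["import", "tariff", "customs", "port"],
    "shipping": ["port", "cargo", "customs", "vessel"],
    "weather":  ["storm", "wind", "port"],
}

by_concepts = len(concepts)                          # 1) number of concepts
by_words = sum(len(ws) for ws in concepts.values())  # 2) words in all concepts

for label, divisor in [("concepts", by_concepts), ("words", by_words)]:
    self_score = relatedness(concepts["trade"], concepts["trade"], divisor)
    cross_score = relatedness(concepts["trade"], concepts["shipping"], divisor)
    print(label, self_score, cross_score)
```

Changing the divisor rescales every score for the whole collection at once, which is the sense in which relatedness becomes a function of the entire collection rather than of the two concepts alone.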

 

Our notion of subject and occurrence is defined as in the Topic Map standard.

 

This notion of attentional focus is derived from our experience with neural networks and perceptual psychology.  [2]

 

 


Structural nature of a concept

 

This leads us to the question of the structural nature of a formal, linguistic, or semiotic representation of a concept.  We have some candidates based on the current NdCore conceptual rollup (see notational paper for generalization of NdCore).

 

NdCore conceptual rollup measurement process uses n-grams:  This measurement produces a set of tree branches, two for each significant word.  As of 2002, the NdCore technology had some "contextual" information, but this contextual information has not undergone a human reification process or been made subject to ontology services.  Of course this can be done, and the algorithms that then act on the Input Array (which is the name of the ordered set of tree branches) will perform conceptual services.

 

Let us indicate this Input Array with the symbol “I”, and any reified Input Array with “I(r)”.

 

Figure 1: A very simple Input Array with two elements (branches)

 

If we have only the significant words w(1), w(2), w(3), w(4) in a single sentence, then we have the single tree in Figure 2.   The 5-grams are { [ *, w(1), w(2), w(3), w(4) ], [ w(1), w(2), w(3), w(4), * ], … }.  Remember that the textID is used to hang the “center” of the 5-gram.
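A minimal sketch of the 5-gram windows for a single sentence follows.  It assumes that each window is centered on one significant word and that "*" pads positions falling outside the sentence; the function name five_grams and the (textID, center word, window) record layout are hypothetical and are not taken from NdCore.

```python
def five_grams(text_id, words, pad="*"):
    """Return one 5-gram per significant word, centered on that word.

    Each record is (text_id, center_word, window); the window is padded
    with `pad` where it runs past either end of the sentence.
    """
    padded = [pad, pad] + list(words) + [pad, pad]
    records = []
    for i, word in enumerate(words):
        window = tuple(padded[i:i + 5])  # center of the window is padded[i + 2]
        records.append((text_id, word, window))
    return records

# Hypothetical text identifier and the four significant words of one sentence.
input_array = five_grams("text-001", ["w1", "w2", "w3", "w4"])
for record in input_array:
    print(record)
```

Under these assumptions the windows centered on w(2) and w(3) are the two 5-grams written out in the text, and the windows centered on w(1) and w(4) carry two padding symbols each.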

 

Figure 2: A simple input tree with four branches

 

 


The Orb construction is used in creating the inversion of I.  In NdCore, this inversion is a set of simple trees, with the root of each tree having a correspondence with each significant word subject.  (See notational paper.)

 

The inversions bring together all occurrences of a word into a single tree-construction with the root node the "subject" of these multiple occurrences.  This process only needs EITHER I or the reified Input Array “I(r)” to "work".  Given the Input Array, an Array (or hash table) is produced that encodes these "subject trees" into a memory structure.  We may call this the Output Array, or O.
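The inversion can be sketched as a grouping of Input Array records by their center word.  This is a plain dictionary grouping offered for illustration, not the Orb construction itself; the record layout (textID, center word, window) and the name invert are assumptions.

```python
from collections import defaultdict

def invert(input_array):
    """Group all occurrences of each word under a single root "subject".

    Returns a dict (hash table) mapping word -> list of (text_id, window),
    playing the role of the Output Array O.
    """
    output = defaultdict(list)
    for text_id, word, window in input_array:
        output[word].append((text_id, window))
    return dict(output)

# Hypothetical Input Array I: the word "port" occurs in two different texts.
I = [
    ("text-001", "port",  ("*", "ship", "port", "cargo", "*")),
    ("text-001", "cargo", ("ship", "port", "cargo", "*", "*")),
    ("text-002", "port",  ("*", "wine", "port", "glass", "*")),
]

O = invert(I)
for subject, occurrences in O.items():
    print(subject, occurrences)
```

Each key of O is the root of one "subject tree", and the list under it holds the occurrences that hang from that root.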

 

If in the process of producing these "subject trees" there is a separation of occurrences of words so as to reflect word ambiguity in various contexts, then we will call this the reified Output Array, or O(r).  This separation of contexts within localized constructions (a controlled vocabulary) is consistent with the innovation developed by SchemaLogic Inc based on a localization of controlled vocabulary into containers.  [3]
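A reified Output Array can be sketched by keying each subject on both the word and the container in which it occurs.  The container labels below stand in for the result of a human reification step and are entirely hypothetical, as are the names reify and container_of; nothing here reflects the actual SchemaLogic data model.

```python
from collections import defaultdict

def reify(output_array, container_of):
    """Split each subject's occurrences by the container of the source text.

    `container_of` maps text_id -> container label (assumed to be supplied
    by a human reification process).  Returns a dict keyed by
    (word, container), playing the role of O(r).
    """
    reified = defaultdict(list)
    for word, occurrences in output_array.items():
        for text_id, window in occurrences:
            reified[(word, container_of[text_id])].append((text_id, window))
    return dict(reified)

# Hypothetical Output Array O with an ambiguous word, "port".
O = {
    "port": [
        ("text-001", ("*", "ship", "port", "cargo", "*")),
        ("text-002", ("*", "wine", "port", "glass", "*")),
    ],
}

# Hypothetical container assignment for each text.
container_of = {"text-001": "shipping", "text-002": "beverages"}

O_r = reify(O, container_of)
for key, occurrences in O_r.items():
    print(key, occurrences)
```

The single subject "port" in O becomes two subjects in O(r), one per container, so the ambiguity of the word is resolved by where it occurs rather than by the word itself.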

 

Once the SchemaLogic innovation is followed, it is possible to develop ontology based on these containers and easily produce knowledge management type reconciliation processes that reify.

 

The type:value pairing in the NdCore system reflects a number of issues related to the nature of category and the formation of natural category.  (See Preface and Chapter One of “Knowledge Foundations”.)

 



[1] Levine, D. & Prueitt, P.S. (1989). Modeling Some Effects of Frontal Lobe Damage: Novelty and Perseveration. Neural Networks, 2, 103-116.

Levine, D.; Parks, R.; & Prueitt, P.S. (1993). Methodological and Theoretical Issues in Neural Network Models of Frontal Cognitive Functions. International Journal of Neuroscience, 72, 209-233.

Prueitt, P.S. (1994). System Needs, Chaos and Choice in Machine Intelligence. Chaos Theory in Psychology (A. Gilgen and F. Abrams, Eds.) Contributions in Psychology Series. Westport, Conn.

Prueitt, P.S. (1995) A Theory of Process Compartments in Biological and Ecological Systems. In the Proceedings of IEEE Workshop on Architectures for Semiotic Modeling and Situation Analysis in Large Complex Systems; August 27-29, Monterey, Ca, USA; Organizers: J. Albus, A. Meystel, D. Pospelov, T. Reader

 

[2] The notion finds an architectural embodiment in the BCNGroup Roadmap developed for US Customs in 2004. 

[3] The combination, and generalization of the NdCore and SchemaLogic innovations is a core part of the BCNGroup RoadMap.