Structured Information Localization and Globalizations Processes and Procedures using CCM
A briefing to the Virginia Bioinformatics Institute
August 28, 2003
…
Overview: We regard the CCM (Contiguous Connection Model) construction as having two aspects: (1) localization of information and (2) organizational processes. Informational localization can be rendered into (type:value) pairs, or into constructions that depend on (type:value) pairs. Computer-based processes for organizing information vary in nature, but should account for observed structural relationships between manifestations of type. Some of these processes have never used CCM constructions and thus may be reduced to practice as CCM-related patents (see Section 2). New processes, and perhaps new ways of encoding (type:value) pairs or of performing the organizational processes, may also lead to new IP.
The development of new patents based on this construction will occur as knowledge scientists engage in creative reflection while attempting what are now hard or impossible problems. One class of these hard or impossible problems has to do with the automated and rapid development of topic map ontology from textual literatures. The current limitations on automated ontology expression have to do largely with the statistical nature of most, or even all, textual understanding systems. This limitation is also found in the current NdCore conceptual roll-up system. However, we are poised to overcome this limitation by including formative ontology processes that use preexisting resources to focus the conceptual roll-up without statistical analysis, or as a supplement to statistical analysis. Beyond the real-time / situated ontology construction problem is the problem of creating predictive (deductive) inference mechanisms that depend on situational ontology having high resolution and high fidelity.
Section 1: The Human Cognitive Model: Localization depends on the invariance in the textual substructure of what is communicated by specific informational structure. In natural systems, localization of structure/function requires a behavioral/functional commonality and thus is involved in producing natural archetypes. When abstracted into language, these archetypes are rendered as type and reflected in linguistic variation in text. But type is realized as a combination of substructural elements, as in the well-studied double articulation of the phoneme in spoken language, and in case grammars involving a normalized and structural use of parts of speech. A class of abstractions occurs in the formation and use of natural language. This is because natural language reflects the structure of natural types in the world. Following the double articulation principle, structural stratification can be seen as both an internal structure of each type and a structure between type exemplars. It is conjectured that this “structural stratification” is the key to machine inference and high conceptual fidelity.
The internal and external structural coupling seen in co-occurrence, correlation and statistical analysis leaves out the fine resolution that is needed to examine small samples of text (see Sections 3 and 4). Thus we need both human reification cycles and ontology services to be part of operational extensions to the current NdCore system. Linguistic variation in the expression of natural language has a loose correspondence to the structure of events in the natural world. Value is a specific instance of a type and carries with it the rich detail that allows natural language to be understood, by humans, within social communication. So specific words and word structures have syntax that reflects meaning in the context of broader experiences within the memory and anticipatory aspects of human cognition. These aspects of human cognition have not been completely reduced to algorithmic form, and thus human reification cycles and ontology services are currently essential to producing comparable fidelity between NdCore results and stand-alone human awareness.
In this sense, context is often translated into a pattern that is composed from invariance in language structures expressed as behavior. So some part of the structural coupling between string values is identifiable as statistical relationships between types. Many of the organizational processes described in the computer science literature pick up some, but not all, of the structural coupling between types. We expect to make optimal use of statistical analysis and to extend our capabilities using ontology services (see Section 4).
Section 2: Patent disclosures: A patent is a protected disclosure of innovation that reduces a concept to practice. We expect to reduce to patent disclosures many of those innovations that depend on (type:value) pairing. In many or most cases, the classical use of (type:value) pairing has not taken into account the natural science on type and on type formation in natural phenomena. This will be capitalized on in a series of patent disclosures of innovations that extend the 1994 and 1996 ATS patents on CCM constructions. These extensions will likely need to license the ATS patents. However, the beauty of the Knowledge Sharing Core concept is that, across the category of all patents over the emerging academic knowledge sciences, it is meant to properly assign value and credentials to patents that depend on each other. Ownership issues are of major concern, but perhaps a proper evaluative body can be formed to manage these concerns. The Charter of the BCNGroup was developed, in the period 1992 to 1997, for this purpose.
Some CCM constructions are global in scope. The extending algorithms are like latent semantic indexing or scatter-gather methods. Some CCM constructions are selective, such as the branch inversion, where type information is used to guide a larger convolution process. In some cases, however, one can use the categorization results of latent semantic indexing and scatter-gather methods to develop a formal specification of type, which then enriches the type structures available to the CCM inversion process (see Figure 1, Section 3).
Some language will help us here. Inversions involve two processes: (1) the traversal of a branch (or tree, or collection of branches), and (2) the convolution over all or some subset of more elementary units (like significant words), where the convolution creates a partition and an equivalence relationship. Inversions are a specific type of convolution, as defined in classical mathematics. The convolution is over a set. As each element of the set is visited, some action takes place, that action being defined by the convolution operator. In classical mathematics, the set can be infinite or finite in size. In the CCM convolutions, the set contains (type:value) pairs and the action is defined by rules.
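The convolution described above can be sketched as a single pass over a set of (type:value) pairs, in which a rule assigns each pair to an equivalence class. This is only a minimal illustration and not the patented CCM construction: the function name `convolve`, the rule functions, and the sample pairs are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[str, str]  # a (type, value) pair; illustrative representation only


def convolve(pairs: Iterable[Pair],
             rule: Callable[[Pair], Hashable]) -> Dict[Hashable, List[Pair]]:
    """Visit each distinct pair once and group pairs by the label the rule returns.

    Pairs that receive the same label are treated as equivalent, so the
    returned groups form a partition of the set of pairs.
    """
    classes: Dict[Hashable, List[Pair]] = defaultdict(list)
    for pair in sorted(set(pairs)):  # the convolution is over a set
        classes[rule(pair)].append(pair)
    return dict(classes)


# Hypothetical sample data: significant words from a fable, typed by hand.
pairs = [("animal", "fox"), ("animal", "crow"), ("food", "cheese"), ("animal", "fox")]

by_type = convolve(pairs, rule=lambda p: p[0])        # equivalence over types
by_initial = convolve(pairs, rule=lambda p: p[1][0])  # equivalence over a value feature
```

Changing only the rule changes the partition, which is one way to read the statement that the action is defined by rules.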
The speed of convolution operators over hash tables will become more and more important as we develop more complex convolutions, and as we give the user (or researcher) the parameters needed to redo convolutions experimentally while trying to bring a specific focus into the conceptual roll-up. The convolution may occur differentially over type-categories or over value-categories, in ways that are disclosed in the 1996 CCM patent.
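One way to picture why hash tables matter here: if the (type:value) pairs are indexed once by type and once by value, a researcher can re-run a focused roll-up with different parameters without rescanning the pairs. The sketch below assumes this reading; `build_indexes`, `roll_up`, and the sample pairs are hypothetical names and data, and nothing here reproduces the method disclosed in the 1996 patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # a (type, value) pair; illustrative representation only


def build_indexes(pairs: Iterable[Pair]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build two hash tables in one pass: type -> values and value -> types."""
    by_type: Dict[str, Set[str]] = defaultdict(set)
    by_value: Dict[str, Set[str]] = defaultdict(set)
    for type_, value in pairs:
        by_type[type_].add(value)
        by_value[value].add(type_)
    return dict(by_type), dict(by_value)


def roll_up(by_type: Dict[str, Set[str]], focus_types: List[str]) -> Dict[str, List[str]]:
    """Return the values for the types in focus; each lookup is a hash access."""
    return {t: sorted(by_type.get(t, set())) for t in focus_types}


# Hypothetical sample data; "crow" is deliberately given two types.
pairs = [("animal", "fox"), ("animal", "crow"), ("food", "cheese"), ("verb", "crow")]
by_type, by_value = build_indexes(pairs)

first_focus = roll_up(by_type, ["animal"])         # first experimental run
second_focus = roll_up(by_type, ["food", "verb"])  # redo with new parameters
```

The `by_value` table answers the opposite question (which types a given value has been assigned), which corresponds loosely to working over value-categories rather than type-categories.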
These “constructed” equivalence relationships are expressed as part of a CCM notational system. The CCM notational system is under development by OntologyStream as part of a contract to ATS. Once these relationships are expressed in the CCM notational system, one can formally discuss properties related both to fidelity and to efficiency in data processes. For example, the convolution can be formally complex if ontology is used with reconciliation containers. Complexity arises in the naturally occurring ambiguation and disambiguation processes that are essential to the use of natural language within communities. Logics over (type:value) pair schema containers follow the auxiliary innovations one sees in SchemaLogic Inc.’s SchemaServer and other similar systems.
Section 3: NdCore performance on small or large text collections: The processes and procedures of the current NdCore have internal decisions that are based on statistical analyses. The statistics require large collections of text. Two problems are therefore not directly addressed by NdCore 2.0. The first is domain-specific fine resolution. The second is complete conceptual representation in the case of small text collections such as the fables. The two problems are related.

Figure 1: one means to provide additional support for ontology use
Each of these two problems is to be addressed using both linguistic and ontology services. NLP++ is a language for developing multi-pass parsers that identify parts of speech, as well as tag and extract linguistic variation patterns that are relevant to decisions made about type when developing an instrument for measuring linguistic variation.
Differential ontology and the ontology lens can be applied to methods like latent semantic indexing or scatter-gather methods to take clustering or categorization information and apply it to typing decisions. We can also roll our own clustering and categorization algorithms. The information is to be encoded as RDF triples having the form of a syntagmatic unit <a, r, b>, where a and b are locations and r is a relational operator. This information is to be accessed during the measurement process.
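As a rough sketch of this encoding, clustering output can be turned into <a, r, b> triples and then consulted when a typing decision is made. Everything named here is an assumption for illustration: plain strings stand in for the locations a and b, `memberOf` is a made-up relational operator, the cluster labels are invented, and no actual RDF serialization or RDF library is used.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A syntagmatic unit <a, r, b>; strings stand in for locations here."""
    a: str
    r: str
    b: str


def triples_from_clusters(clusters: Dict[str, List[str]],
                          relation: str = "memberOf") -> List[Triple]:
    """Encode each (term, cluster) assignment as one <term, relation, cluster> triple."""
    return [Triple(term, relation, label)
            for label, terms in sorted(clusters.items())
            for term in terms]


def types_for(term: str, triples: List[Triple], relation: str = "memberOf") -> List[str]:
    """Look up every b such that <term, relation, b> has been recorded."""
    return sorted({t.b for t in triples if t.a == term and t.r == relation})


# Hypothetical output of some clustering or categorization method.
clusters = {"cluster_0": ["fox", "crow"], "cluster_1": ["cheese", "crow"]}
triples = triples_from_clusters(clusters)

candidate_types = types_for("crow", triples)  # consulted during measurement
```

A term that falls into more than one cluster, such as "crow" in this made-up data, returns several candidate types, which is the kind of case where human reification would still be needed.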
Section 4: Actionable Intelligence Process Model: Topic maps provide a means to visualize RDF triples, and so are close to the OWL ontology. One only has to add some inferencing logics and a class hierarchy structure to the types to produce OWL. Steven Newcomb is one of the authors of the Topic Maps 1.0 standard and has long wished to extend a type of deductive inference based on the HyTime and Grove standards. This new type of deductive inference can be stood up along with a plausible inference mechanism based on semiotics and graph similarities. John Sowa has long worked on this problem and will guide the group in extending the notion of deductive inference. We will also have expertise from what remains of the Russian applied semiotics community, although this expertise is very hard to fund.
The full theory is presented in the manuscript and in PowerPoints. Currently the expertise is given on the basis of long-term friendships and a sense of community. Human involvement in the process is essential, however.

Figure 2: Actionable Intelligence Process Model
Human annotation of type will operate within the architecture of controlled vocabularies to effect community-based control over the interpretation of linguistic variation measured by the n-grams, the traversal of branches, and differential convolutions (inversions) over types. Logic over schemas localizes terminological reconciliation using our controlled vocabularies. So the technology and architecture can help us.
Most of our work should be done with experimental systems. Production systems may require significant redesign to enable differential processes that, once studied, are to be discarded. Industry is observed to have a cultural aversion to long-term thinking. This aversion causes a phenomenon in which those things that are immediate to the knowledge scientist have not been examined in any detail. The business culture has produced several generations of IT products that are not framed properly in a valid theory of action perception. One consequence of this phenomenon is the systematic withholding from scholarly discussion of many of the algorithms and methodologies over which corporate entities have laid claims of ownership.
The Actionable Intelligence Process Model (AIPM) has nine aspects. The first two of these are almost always left out of the picture when negotiating IT procurement for intelligence. (There is historical evidence for this statement’s truth.) However, the first two aspects are also most often left out of NSF/NIST funding efforts. The reason is that the “measurement problem” is a hard problem that is not framed in scientific terms in such a fashion as to expose how a biological system “measures” the structures in the world.