Monday, September 06, 2004

On mapping the social expression in real time


Social Constructions and Symbols

Symbols expressed in social discourse provide a means for communication. These symbols exist in many forms, including those that can be measured in patterns of verbal and written expression.

In social systems, the individual members of social groups recognize and produce symbols as part of normal discourse. Symbols are important instruments for complex social processes.

A recent paper in Psychological Bulletin (2003, Vol. 129, No. 3, pp. 339-375), “Political Conservatism as Motivated Social Cognition”, developed some of the issues related to the formation of fundamentalist belief as motivated by an avoidance of uncertainty in the presence of fear.

We conjecture that many memetically simple social constructions develop when a population is under stress. Stress-induced social symbols co-occur with periods of social disorder and may be instrumental in the onset of a feedback loop that aggravates the social disorder. These stress-induced social symbols are often expressed as an extreme form of fundamentalism, that is to say, as a memetically simple belief that cannot be effectively challenged.

From an October 2003 paper,

“Mapping the expression of social symbols”

This bead is written in response to some indications from Bill Benson and John Sowa regarding what the group might reasonably claim. It will be edited over the next few days as others make comments or suggestions. How might the Readware and Orb integration work, and why might one expect to be able to create profiles from the letter semantics that differentiate between types of social groups and the concepts being expressed in these social groups? To give a high-level overview, we use a descriptive enumeration of the issues involved:

1.0: The integration itself

2.0: The Readware benchmarks

3.0: Variations that the user is allowed to make

1.0: The integration itself

1.1: The functional properties of Readware will not be changed.

1.2: The current encoding of the Readware letter semantic information is now made into a modified hash table, where some empty containers are present as a result of standard hash table construction.

1.2.1: The current list contains a little over 2000 elements.

1.2.2: Empty containers are important to key-based hash tables only if there is a need to update the list.

1.2.2.1: The update process is quite different if one has the key-less hash table. There are never any empty containers.

1.2.2.2: Each element in the list, now regarded as a set, is ordered.

1.2.2.2.1: A topological construction is used that stops short of developing numbers, and instead develops a linear order on the set of lower case letters.

1.2.2.2.2: This order is not the same as is implemented in the Primentia data encoding. The classical positional notation, developed some three centuries ago, is generalized to impose a virtual order on any composition of these letters and to thus uniquely imply an order on the set of strings.

1.2.2.2.3: We hope and expect that a formal system might be developed to talk about the creation of categories, even down to a set of atoms that are substructural to the letter semantics. At that point, non-linear order relationships and even abstract general graph theoretical constructions are possible.

1.2.2.3: The key-less hash table architecture is given in [83]. Each element in the hash table is what we refer to as a “loaded point”.

1.2.2.3.1: We say that each element is a “point”, because there is a geometrical relationship over all of the elements of any set of letter strings.

1.2.2.3.2: The set has an isomorphic relationship to an ordered set of discrete integers, expressed as if on a geometric line.

1.2.2.3.3: We say that the point is “loaded” because there is a standard size container for data related to the point.

1.2.2.3.4: The container holds a pointer to location information and a linked list of all letter strings that are compressed into the same category. The category is “represented” by the “prime” substructural invariant, which is itself one of the points in the list.

1.2.2.4: I have had experience with these types of encoding in both the design of the SLIP (Shallow Link analysis, Iterated scatter-gather, and Parcelation) software that Ontologystream coded in 2001, and in the Orb encoding that Nathan coded this year.
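
The ideas in 1.2.2.2 and 1.2.2.3 can be illustrated with a minimal sketch. It is not the Readware, Orb, or Primentia encoding, whose details are not given here. It assumes one particular positional scheme (base 27, with the letters a to z as digits 1 to 26) to place each lower case letter string at an integer point, and a sorted array with binary search as a stand-in for the key-less hash table. The names `string_to_point` and `KeylessTable` are hypothetical.

```python
from bisect import bisect_left

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Assumption: digits run 1..26 and 0 is unused, so "a" and "aa" map to different points.
BASE = len(ALPHABET) + 1


def string_to_point(s: str) -> int:
    """Map a lower-case letter string to an integer by positional notation."""
    value = 0
    for ch in s:
        value = value * BASE + (ALPHABET.index(ch) + 1)
    return value


class KeylessTable:
    """Sorted array of points, each with a container ("load").

    Only points that are actually present are stored, so there are
    no empty containers.
    """

    def __init__(self):
        self._points = []      # integers, kept in ascending order
        self._containers = []  # parallel list: one container per point

    def insert(self, s, payload):
        p = string_to_point(s)
        i = bisect_left(self._points, p)
        if i < len(self._points) and self._points[i] == p:
            self._containers[i].append(payload)   # point exists: add to its load
        else:
            self._points.insert(i, p)             # new point, order preserved
            self._containers.insert(i, [payload])

    def lookup(self, s):
        """Set membership by binary search; returns the container or None."""
        p = string_to_point(s)
        i = bisect_left(self._points, p)
        if i < len(self._points) and self._points[i] == p:
            return self._containers[i]
        return None
```

With this scheme shorter strings precede longer ones and strings of equal length are ordered alphabetically. Any other injective, order-defining map on strings would serve the same illustrative purpose.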

2.0: The Readware benchmarks. [18]

2.1: Readware has been using a specific set of letter strings, mostly triples, but sometimes having four letters, since their 1987 patent. The current set has been used since 1997.

2.2: In one case, Ken sent me a list of “compressed” categories. The Orb browser, Nathan’s software, was used to produce “category” level co-occurrence patterns that we then visualized with the SLIP browsers (see figure 1 [87]).

2.2.1: The compression of text into a category token, an element of the list of letter strings in the Readware letter semantics table, is currently done in a linear fashion. The compression will eventually use a generalFramework (gF) theory to produce measurements using a specific type of non-linear compression [1].

2.2.1.1: Each significant word, determined by a “controlled vocabulary” called the Go-list, is matched against the ordered elements of a long RIB (key-less hash table) where over 20,000 words have been placed.

2.2.1.2: Each of these words has one or more letter semantic categories in the “container”.

2.2.1.2.1: The fact that we have a discrete line allows a very fast solution to the set membership problem (as demonstrated first by the SLIP scatter-gather algorithm implementation).

2.2.1.2.2: Disambiguation/ambiguation methods can be used to determine which category to collapse into.

2.2.1.2.3: It must always be remembered that each of these computational steps has a degree of uncertainty, and that this degree of uncertainty can be measured, but not perfectly. In fact, at times the sense-making that is “assumed” to be occurring by the algorithms will produce poor results.

2.2.1.2.4: Humans have to be involved in HIP (Human-centric Information Production) so as to reduce the dependency on these uncertain computations.

2.2.1.3: Thus the collapse of all significant words into substructural elements is accomplished in near real time, even if the text stream happens to be very large.
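
The linear collapse described in 2.2.1 can be sketched as follows. This is an illustration under stated assumptions, not the Readware implementation: the Go-list and the word table here are hypothetical and tiny, the four word-to-category pairs are the ones quoted from bead [80] in section 3.0, and disambiguation is replaced by a placeholder that takes the first category in the container.

```python
import re
from bisect import bisect_left

# Hypothetical Go-list (the controlled vocabulary of significant words).
GO_LIST = {"plant", "transplant", "replica", "replacement"}

# Hypothetical word table, kept in sorted order so that membership is a
# binary search on an ordered line rather than a keyed hash lookup.
WORDS = ["plant", "replacement", "replica", "transplant"]
CATEGORIES = [["pln"], ["rpl"], ["rpl"], ["tpl"]]  # one container per word


def categories_for(word):
    """Return the container of letter semantic categories for a word, or []."""
    i = bisect_left(WORDS, word)
    if i < len(WORDS) and WORDS[i] == word:
        return CATEGORIES[i]
    return []


def collapse(text):
    """Single pass over the text: keep Go-list words, emit one category each."""
    tokens = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in GO_LIST:
            cats = categories_for(word)
            if cats:
                # Placeholder for a real disambiguation step (see 2.2.1.2.2).
                tokens.append(cats[0])
    return tokens
```

For example, `collapse("The plant needs a replacement; a replica will not transplant.")` yields the category stream `["pln", "rpl", "rpl", "tpl"]`. Each word costs one binary search, which is why a pass of this shape scales to large text streams.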

3.0: Variations that the user is allowed to make

Quote from bead [80]

We also wrote a search algorithm to parse triples from natural terms. We had pretty sophisticated morphological rules that gave us a good guess at a 'singular unity' (an occurrence on which we could obtain a reliable measure of fidelity).

plant = pln

transplant = tpl

replica = rpl

replacement = rpl

We can treat each of these occurrences of terms as (possibly) related (similar) even though they differ in size and letters to a great extent.

An understanding about the “science” of “concept measurement” is needed.

The measurement of concepts can be usefully compared with the measurement of quantum mechanical “states”. From quantum mechanics, we know that the measurement itself has a tendency to perturb the thing measured.

The experience of concepts is subjective, and we are not fully aware of how the experience comes about or the consequences of a mental experience. We may conjecture that the root of the concept is not something that exists in the normal notion of “existing”. This conjecture follows the work by Hameroff, Penrose and others who look to a full physical theory of mental function.

By “root” we mean “how the concept comes about”, not simply as one person’s experience, but as part of the real phenomenon that becomes involved in the full acts of human communication.

We could say that the root is a “pragmatic root” and only exists in the present moment. The abstractions that we develop to talk about roots need to have the deeper qualities that grounded notions about natural ontology have, but which the notion of machine ontology is often missing.

The long process through which Ewell and Adi developed their substructural semantics separated the roots of concepts from the “measured elements”. This is done reasonably well, but not perfectly. For example, quasi-axiomatic theory provides a qualitative structure function analysis (Q-SAR) that allows the prediction of the function of a compound given partial knowledge of the composition of its substructure.

We then formally use both the substructural semantics, and the associations that are observed between these elements and the occurrence of subject indicators that are used in social discourse.

The Orb encoding assists in the presentation of derived subject matter indicators for humans to observe and experience. The root of the concept exists in situational context. In the Provenance ™ software, the measurement of the occurrence of concepts, in real time as expressed by humans using natural language, uses instrumentation, encoding of some data into some data structure, and the interpretation of this structure by human observation.

3.1: Stratification theory provides a notational basis for the architecture that we expect to use to encode various substructural ontologies into Orbs. Each of these substructural ontologies will be similar, in construction, to that which was produced by Ewell and Adi and encoded into the Readware products.

3.1.1: Science is a balance between empirical observations and theoretical constructions. Ewell and Adi developed a specific set of mappings from a 4x8 framework to a set of 2300 “lexical tokens”, having the form of letter triples or in some rare cases 4 letters.

3.1.2: We feel that a general methodology might be developed that uses the play between theory and experiment to produce situational substructural ontology. (See the next bead [103].)

3.1.3: We should be clear that we mean that substructural ontology might be developed for any class of systems that exists as transient structure far from thermodynamic equilibrium.

3.2: The Conjecture on Stratification implies a strong form of stability to the substructural ontology. Thus the development of different substructural ontology presents an interesting challenge to the Conjecture.

3.2.1: The Readware Provenance ™ knowledge artifacts provide a well-developed example of a model of substructural ontology for all of natural language.

3.2.2: The existence of other examples is known, but in most cases there is a limited peer review by natural scientists. In Readware’s case, that is a process that makes the Readware Provenance ™ substructural ontologies reliable as objective theories of organizational cause.

3.3: The tutorial that we need to develop should take a small text, like one of the Aesop fables, and show what the string of significant words is, given a Go-list, then show the linear list of substructural categories, and then the screen shots for the SLIP visualization of the related Orb “measurement” of the co-occurrence patterns (see figure 1 [87]).

3.3.1: The iteration that is possible involves the modification of the Go-list, modification of any thesaurus/ontology services that might act on the list of significant words in the fable, and modification due to stemming or not.

3.3.1.1: This iteration can involve other people in a “knowledge management” methodology that uses the Actionable Intelligence Process Model.

3.3.1.2: We prefer to think about using the SchemaLogic Inc. SchemaServer software system for this part of the system.

3.3.2: The collapse of the set of significant words, after modifications, into elements of the set of semantic primitives is fixed. This follows from the assumption that the elements of the set of semantic primitives are themselves fixed by the long-term reinforcement of the categories through human use of language. (This is the Conjecture on Stratification.)
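
The last step of the tutorial in 3.3, the “measurement” of co-occurrence patterns over the linear list of substructural categories, might look like the sketch below. How the Orb encoding actually measures co-occurrence is not specified in this bead. The sketch assumes a simple sliding window over the category stream and counts unordered pairs of distinct categories. The function name `cooccurrence` and the `window` parameter are hypothetical.

```python
from collections import Counter


def cooccurrence(tokens, window=3):
    """Count unordered pairs of distinct categories within `window` positions.

    `tokens` is the linear list of category tokens produced by the collapse.
    Assumption: a pair co-occurs when the two tokens are fewer than `window`
    positions apart in the stream.
    """
    counts = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts
```

The resulting pair counts are the kind of data that a link-analysis viewer could draw as a graph, with categories as nodes and counts as edge weights. Changing the Go-list or the stemming, as in 3.3.1, changes the token stream and therefore the counts.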


[1] I have worked on this concept for over a decade. While Senior Scientist at Highland Technologies (1995-1997) I was able to develop a formal notation about the capture of “passages” where these passages overlapped in the text.