eventChemistry

 

Determining Functional Load using SLIP

 

by Paul Prueitt, PhD

 

Founder (1997), BCNGroup.org

President, OntologyStream Inc.

 

Draft:  December 27, 2001

 

The concept of functional load is addressed at some length in John Lyons’ book “Introduction to Theoretical Linguistics” (Cambridge University Press, 1968).  In Lyons’ book, functional load is treated as a cause of the distribution of the basic compositional elements of spoken and written expression.  This notion belongs to the tradition in theoretical linguistics that follows the work of de Saussure (Course in General Linguistics, Payot, 1916) and Z. Harris (Methods of Structural Linguistics, Univ. Chicago Press, 1951).

 

In essence, the notion is that sounds that are easy to make will be used in situations where ambiguity of expression carries some penalty.  A basic investigation of auditory and acoustic phonetics therefore leads to an understanding of how language is used and how it evolves: auditory and acoustic coherence and discordance are reflected in the structure and form of natural language.  When the target of investigation is a complex system, however, such an investigation yields only partial knowledge.

 

My background in theoretical linguistics is not strong enough for me to proceed directly to a discussion of phonetics and grammar, or to compare the internal structures of various language groups.  I will leave this to others.  In any case, this is not the primary purpose of the OntologyStream Inc. (OSI) Browsers.  However, we suggest that these browsers can be used as teaching and investigation tools for distributional analysis in general.  In this way, distributional analysis leads to the notion of event chemistry even when the target of investigation is not linguistic.  An event chemistry has to have:

 

1)      A theory of substructure (illustrated by the making and hearing of sounds in Lyons’ perspective on functional load)

2)      Laws or rules of assembly, even if these laws are distributed and not precise

3)      Expectancy 

 

Natural language can be considered a complex system that is stratified into organizational levels related to how language is remembered, how the human body generates cognition and language expression, and how natural language is reinforced within a social system.  But natural language is only one example of a complex system.

 

OSI Browsers are being developed to study the following complex systems:

 

1)      Telecommunications systems

2)      Systems of financial transactions

3)      Activities of the Patent and Trademark Office

4)      Virtual discussion in electronic forums

5)      Hacker activity on the Internet

 

Natural Language Processing (NLP) will clearly provide some tools, particularly for #3 and #4, where concepts and themes expressed in natural language are a significant part of the study.

 

But NLP work presents us with a dilemma, one that Lyons states well at the end of his second chapter:

 

“Two apparently contradictory principles have been maintained in this section: first, that statistical considerations are essential to an understanding of the operation and development of languages; second, that it is in practice (and perhaps also in principle) impossible to calculate precisely the information carried by linguistic units in actual utterances.  This apparent contradiction is resolved by recognizing that linguistic theory, at the present time at least, is not, and cannot be, concerned with the production and understanding of utterances in their actual situations of use (except for a relatively small class of language-utterances which can be handled directly in this way), but with the structure of sentences considered in abstraction from the situations in which actual utterances occur.” – page 98.

 

The link analysis made using the Warehouse Browser provides a weak measure of functional load when the induced metric is distributed and used to control an emergent computing process, the scatter-gather process in the SLIP Technology Browser.  A stronger measure of functional load becomes possible only as domain experts develop a specific understanding of the event atoms and of the patterns of co-occurrence seen in practice.  Encoding this understanding can be facilitated by knowledge management principles.

 

What one expects is the development of a type of periodic table of elementary event atoms.  The development of this table is based on experimental results involving the derived relationships between occurrences of atoms in event logs.  In computer intrusion work, these atoms might be IP addresses or port values.  In text understanding, the atoms may be co-occurrences of words in paragraphs or other text units. 
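As a minimal sketch of the raw material for such a table, the following Python counts how often each pair of atoms co-occurs in the same event. It assumes each log event has already been reduced to a set of atoms; the IP addresses and port values are hypothetical, and this is a plain co-occurrence count, not the Warehouse Browser's own link analysis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(events):
    """Count how often each unordered pair of atoms appears in the same event.

    `events` is an iterable of iterables of atoms (strings). An atom repeated
    within one event is counted once for that event.
    """
    counts = Counter()
    for event in events:
        atoms = sorted(set(event))  # sorting gives each pair one canonical order
        for pair in combinations(atoms, 2):
            counts[pair] += 1
    return counts


# Hypothetical intrusion-log events, each reduced to (source IP, destination port) atoms.
events = [
    ["10.0.0.5", "port:22"],
    ["10.0.0.5", "port:22"],
    ["10.0.0.7", "port:80"],
    ["10.0.0.5", "port:80"],
]
counts = cooccurrence_counts(events)
print(counts.most_common(1))
```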

 

 

Figure 1: Atoms from one of the SLIP categories

 

In Figure 1 we show the event atoms for a category derived from a quick study of the functional loads in the Aesop fable collection.  As in Latent Semantic Indexing (LSI), we focus on the relationship between the membership of an “internal token” and a profile of the larger group.  In this case the token is a member of a set of referent tokens (a very simple, hand-made dictionary), and the larger group is the collection of individual fables.
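The membership profile can be pictured as a binary token-by-fable incidence table. The sketch below builds one from a few (token, fable) rows of Table 1. In LSI such a matrix would then be weighted and factored by singular value decomposition; this sketch stops at the incidence table and is not a description of how the SLIP browsers store their data.

```python
def incidence(pairs):
    """Build a binary token-by-fable incidence table from (token, fable) pairs.

    Returns (tokens, fables, matrix) where matrix[i][j] is 1 if tokens[i]
    occurs in fables[j], else 0. Rows and columns are sorted for stability.
    """
    tokens = sorted({t for t, _ in pairs})
    fables = sorted({f for _, f in pairs})
    present = set(pairs)
    matrix = [[1 if (t, f) in present else 0 for f in fables] for t in tokens]
    return tokens, fables, matrix


# A few (token, fable) rows taken from Table 1.
pairs = [("fox", "222"), ("house", "222"), ("fox", "133"), ("meat", "133")]
tokens, fables, matrix = incidence(pairs)
for t, row in zip(tokens, matrix):
    print(t, row)
```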

 

Table 1: the data set for Figure 1

 


token          fable (atom number)

begged         260
protected      260

placed         222
fox            222
fox            222
house          222

sailing        232
keep           232
inquired       232
see            232
enemies        232
enemies        232
storm          232
danger         232
ends           232
enemy          232

seeing         133
get            133
another        133
passing        133
heard          133
inquired       133
happened       133
get            133
fox            133
fox            133
meat           133
shepherds      133
fate           133
fox            133
cries          133
friend         133

tradesmen      112
called         112
protecting     112
proposed       112
method         112
stood          112
enemy          112
preferable     112
defense        112

striving       176
led            176
manage         176
save           176
calf           176
calf           176
offered        176

argued         200
hares          200
lions          200
hares          200
assembly       200
lions          200
words          200
hares          200
teeth          200

lay            236
appeared       236
dogs           236
house          236
dog            236
house          236
summer         236
house          236

addressed      170
freedom        170
put            170
eat            170
give           170
favorably      170
wolves         170
wolves         170
mind           170
brothers       170
slave          170
bones          170
dogs           170
proposals      170
wolves         170

fell           121
allowing       121
share          21
fellows        121
fell           121
milk-woman     121
farmers        121
milk           121
field          121
money          121
milk           121
end            121
money          121
fellows        121
moment         121
milk           121
ground         121
schemes        121


Looking at Figure 1, we see that atom 260 (one of the fables) has only two valences, and atom 222 has three.  By “valence” we mean here that the Analytic Conjecture has established an inference about how fables are related to each other via the Dictionary of tokens.  This is reflected in Table 1, where atom 260 is linked by two tokens and atom 222 by three distinct tokens.
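If a valence is read as the number of distinct dictionary tokens attached to an atom (our interpretation; it agrees with the counts quoted above for atoms 260 and 222), it can be computed directly from the rows of Table 1. A minimal sketch:

```python
from collections import defaultdict


def valences(rows):
    """Map each atom (fable number) to the number of distinct tokens linking it.

    `rows` is an iterable of (token, atom) pairs; duplicate rows are ignored.
    """
    links = defaultdict(set)
    for token, atom in rows:
        links[atom].add(token)
    return {atom: len(tokens) for atom, tokens in links.items()}


# The rows for atoms 260 and 222 from Table 1 (duplicate rows included).
rows = [
    ("begged", "260"), ("protected", "260"),
    ("placed", "222"), ("fox", "222"), ("fox", "222"), ("house", "222"),
]
print(valences(rows))  # {'260': 2, '222': 3}
```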

 

Looking at concept linkage and functional load

 

In the simple exercise that follows, we look at the concept linkage between the elements of text in a collection.  The concepts are weakly represented by a collection of nouns and verbs that have been extracted from the fable collection.  Functional load is to be identified by whatever means are available.  The first step towards a validated theory of the functional load in the fable collection is to build a first approximation using the co-occurrence between individual fables and a unified list of nouns and verbs (called the Dictionary).  A datawh.txt file was produced for this purpose in the previous exercise.
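A minimal sketch of how such (token, fable) events might be produced, assuming the fables are available as plain text and the Dictionary is a list of words. The fable texts below are hypothetical stand-ins, the comma-separated output line is an assumed format rather than a documented datawh.txt specification, and each occurrence of a token is reported separately (also an assumption).

```python
import re


def extract_events(fables, dictionary):
    """Yield (token, fable_name) for each occurrence of a dictionary word.

    `fables` maps fable name to text. Matching is case-insensitive on whole
    words; hyphenated words such as "milk-woman" are kept together.
    """
    vocab = {w.lower() for w in dictionary}
    for name, text in fables.items():
        for word in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            if word in vocab:
                yield word, name


# Hypothetical fable texts and a tiny hand-made dictionary.
fables = {
    "191": "The Jackdaw saw the Doves well supplied with food.",
    "113": "The Master shared his food with the Oxen.",
}
dictionary = ["food", "jackdaw", "oxen", "master"]
for token, name in extract_events(fables, dictionary):
    print(f"{token},{name}")  # one event per line (assumed layout)
```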

 

A deeper study of functional load in the fable collection can be made.  One could, for example, parse the fables and identify when a noun and a verb from the Dictionary are both contained in the same fable.  The events reported to a new datawh.txt would then have the form

 

( noun, verb, fable name )

 

The analytic conjecture could then be made using nouns as the “a” value and verbs as the “b” value.  In this exercise, however, we keep things very simple for illustration purposes.
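Such (noun, verb, fable name) events could be generated as sketched below. Which dictionary words count as nouns and which as verbs is supplied by hand here (a hypothetical split); the tokens per fable are rows from Table 1. Every noun present in a fable is paired with every verb present in the same fable.

```python
from itertools import product


def noun_verb_events(fable_tokens, nouns, verbs):
    """Return (noun, verb, fable) triples for every dictionary noun and verb
    that both occur in the same fable."""
    events = []
    for fable, tokens in fable_tokens.items():
        present = set(tokens)
        for noun, verb in product(sorted(present & nouns), sorted(present & verbs)):
            events.append((noun, verb, fable))
    return events


# Hypothetical noun/verb split of a few Dictionary tokens.
nouns = {"fox", "house", "meat"}
verbs = {"placed", "heard"}
fable_tokens = {
    "222": ["placed", "fox", "fox", "house"],
    "133": ["heard", "fox", "meat"],
}
print(noun_verb_events(fable_tokens, nouns, verbs))
```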

 

Table 1 is the Report generated by the Technology Browser for category ‘R-D-level’ (residue at the D level).

 

      

Figure 2: (a) The Analytic Conjecture for tokens; (b) a SLIP Framework

 

In working with this exercise, we have two resources.  First, the fables are posted one per URL, using the string:

 

(www.) + ontologystream.com/IRRTest/fables/BEAD(N).HTM

 

where “(N)” is replaced (manually) by the atom number.  We have checked a few of these, but not all. 
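The manual substitution can be done with a small helper. The function name is hypothetical; the pattern is the one given above, with the optional “www.” prefix included and no scheme added.

```python
def fable_url(atom: int) -> str:
    """Build a fable address from its atom number, following the pattern
    (www.) + ontologystream.com/IRRTest/fables/BEAD(N).HTM."""
    return f"www.ontologystream.com/IRRTest/fables/BEAD{atom}.HTM"


print(fable_url(191))
```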

 

The idea behind the fable collection has been to prototype a BeadGame Communities software system based on work carried out over a decade by the BCNGroup.org Foundation.  However, this long-term goal requires that our group first make economic gains.  These gains will come from applying the NLP Browser to the analysis of patent information and stockholder reports.

 

We therefore use the fable collection to illustrate the technology we have and the software development issues that still need to be solved.

 

The reader should now download this exercise’s zip file, TAI.zip (289 KB zipped, including the three browsers and a data set).  The code is Visual Basic, developed by OntologyStream.com.

 

After opening the Warehouse Browser (SLIPWhse.1.2.0.exe), one will see Figure 2a.  Opening the Technology Browser (SLIP.2.2.3.exe) will show the user an interface that looks like Figure 2b.

 

The tree-like structure seen in Figure 2b, called the SLIP framework, is developed by taking the elements of the category A1 and randomly scattering these elements (called SLIP atoms) onto the circle.  The earlier exercises describe how these SLIP atoms are developed.  The SLIP atoms are the “a” values from the analytic conjecture that were found (by the Technology Browser) to have been paired with a “b” value.  In this case, the SLIP atoms are names of fables (labeled by the fable number).

 

So let us see how the SLIP atoms are created. 

 

If you have not already done this, download TAI.zip and unzip it into an empty folder.  You may delete the contents of the Data folder except the single file datawh.txt.

 

 

Figure 3: TAI.zip unzipped

 

Then click on SLIPWhse.1.2.0.exe.  Enter the commands “a = 1” and then “b = 0”.  Next, enter the command “pull”, followed by the command “export”.

 

You can then look into the Data folder and see that several new files exist, including Paired.txt (used by the Technology Browser) and Links (used by the Event Browser).
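One plausible reading of “a = 1” and “b = 0” is that they select columns of each datawh.txt event: column 1 (the fable) supplies the “a” value and column 0 (the token) supplies the “b” value, and two “a” values are paired when they share a “b” value. The sketch below implements that reading only; the comma-separated layout and the returned structure are assumptions, since the formats of datawh.txt and Paired.txt are not described here. The sample events are rows from Table 1.

```python
from collections import defaultdict
from itertools import combinations


def pair_atoms(lines, a=1, b=0):
    """Pair 'a' values (atoms) that share at least one 'b' value.

    Each line is a comma-separated event; columns a and b are selected by
    index. Returns {(atom1, atom2): set of shared b values}.
    """
    by_b = defaultdict(set)  # b value -> atoms seen with it
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        by_b[fields[b]].add(fields[a])
    pairs = defaultdict(set)
    for b_value, atoms in by_b.items():
        for x, y in combinations(sorted(atoms), 2):
            pairs[(x, y)].add(b_value)
    return dict(pairs)


lines = ["fox,222", "house,222", "fox,133", "house,236", "dogs,236", "dogs,170"]
for (x, y), shared in sorted(pair_atoms(lines).items()):
    print(x, y, sorted(shared))
```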

 

Now click on SLIP.2.3.3.exe.  The Technology Browser starts with an empty A1 node in the topic graph window.  Enter the commands “import” and then “extract”.  One can type in “help” to find out more about these functions.  Then click on the A1 node.

 

The development of the SLIP Framework can be automated.  Currently, however, we insist that the user develop the Framework by visually identifying areas of interest in the emergent clustering that occurs on the circle.  Enter the command “random” and then the command “cluster”.

 

The cluster command will produce 100,000 iterations of a seek function.  For this data, with this analytic conjecture, that is too many iterations.  Enter the command “random” and then “cluster 2”.  Each subsequent press of Return should add another 2,000 iterations.  By the time you reach 12,000 iterations, your gather process should look something like Figure 4b.

 

  

 

Figure 4: Clustering the top node (panels a and b)
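The seek function itself is not described here, so the following is only a minimal sketch of a scatter-gather loop under our own assumptions: every atom is first scattered to a random angle, and each iteration then picks a random atom that has partners and moves it a fixed fraction (step, a hypothetical parameter) of the way, along the shorter arc, toward one of its partners. Linked atoms drift together, which is the kind of emergent clustering the cluster command makes visible.

```python
import random


def wrap(angle):
    """Map an angle into [0, 360); float modulo can yield 360.0 for tiny negatives."""
    angle %= 360.0
    return 0.0 if angle >= 360.0 else angle


def circular_delta(src, dst):
    """Signed shortest angular step in degrees from src to dst, in (-180, 180]."""
    d = (dst - src) % 360.0
    return d - 360.0 if d > 180.0 else d


def scatter_gather(links, iterations, step=0.1, seed=None):
    """Scatter atoms on a circle, then let linked atoms gather.

    `links` maps every atom to the list of atoms it is paired with (all
    partners must also be keys). Returns {atom: angle in degrees}.
    """
    rng = random.Random(seed)
    # Scatter: every atom starts at a uniformly random angle.
    pos = {atom: rng.random() * 360.0 for atom in links}
    movers = [atom for atom, partners in links.items() if partners]
    if not movers:
        return pos
    # Gather: a random atom takes a small step toward a random partner.
    for _ in range(iterations):
        atom = rng.choice(movers)
        target = rng.choice(links[atom])
        pos[atom] = wrap(pos[atom] + step * circular_delta(pos[atom], pos[target]))
    return pos


links = {"133": ["222"], "222": ["133", "236"], "236": ["222"], "260": []}
pos = scatter_gather(links, iterations=2000, seed=42)
for atom, angle in sorted(pos.items()):
    print(atom, round(angle, 1))
```

A call such as scatter_gather(links, iterations=12000) would correspond loosely to the 12,000 iterations mentioned above; atoms without partners stay where they were scattered.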

 

We are looking for a small group of atoms, because a small group helps us point out specific features of the event chemistry related to that group.  We may take the middle out of the large distribution and look at the complement.  To do this, type a bracket command “x, y -> B1”, where x and y bracket the interior of the large distribution.  Then click on the A1 node and type “residue” to put the complement of B1 into a second category.

 

Now click on the residue category, labeled “R”, and type “random”.  Without clustering, take 90 degrees (any 90 degrees) and put those atoms into the category C1.  Do this by typing, for example, “45, 135 -> C1”.  Click on the new category.  If you type “cluster”, you will most likely see the quick formation of several groups that do not move together.
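The bracket and residue commands amount to simple set operations on the atoms' angles. The sketch below is our reading of them, not the Technology Browser's code: bracket(positions, x, y) collects the atoms on the arc from x to y degrees (endpoints included, wrapping through 0 when x > y), and residue returns the complement within the parent node. The function names and angles are hypothetical.

```python
def bracket(positions, x, y):
    """Atoms whose angle lies on the arc from x to y degrees, inclusive.

    `positions` maps atom -> angle in [0, 360). When x > y the arc is taken
    to wrap through 0 degrees (e.g. 300 -> 60).
    """
    x, y = x % 360.0, y % 360.0
    if x <= y:
        return {a for a, ang in positions.items() if x <= ang <= y}
    return {a for a, ang in positions.items() if ang >= x or ang <= y}


def residue(positions, category):
    """Atoms of the parent node that are not in `category` (the complement)."""
    return set(positions) - set(category)


positions = {"133": 50.0, "222": 100.0, "236": 200.0, "170": 350.0}
c1 = bracket(positions, 45, 135)  # analogous to "45, 135 -> C1"
print(sorted(c1), sorted(residue(positions, c1)))
```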

 

Choose the largest of these groups (ideally it will have at least two and no more than five elements).  If not, go to the Data folder, delete the subfolders of A1, and type “load” into the command line.  This will allow you to start over.

 

Once you have a category with two to five elements, type “key = 1”, click on the Report button, and type “generate” in the command line.

 

 

Figure 5: The Report for a category of two atoms.

 

Your results will differ from mine.  However, you will have captured two to five atoms that:

 

1)      Are all connected by the co-occurrence of one or more tokens, and

2)      Have an average number of valences that connect to atoms outside the category.
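Both properties can be checked mechanically. The sketch below takes the simplest reading of them: it reports the tokens shared by every atom in the category and, for each atom, how many of its pairings lead outside the category, together with the average of those counts. The atoms A to D and their tokens and pairings are hypothetical.

```python
def check_category(category, tokens_of, links):
    """Check a candidate category of atoms.

    category:  set of atoms in the category (non-empty).
    tokens_of: {atom: set of tokens attached to that atom}.
    links:     iterable of (atom, atom) pairings.
    Returns (tokens shared by every atom in the category,
             {atom: number of pairings to atoms outside the category},
             average of those counts).
    """
    shared = set.intersection(*(set(tokens_of[a]) for a in category))
    outside = {a: 0 for a in category}
    for x, y in links:
        if x in category and y not in category:
            outside[x] += 1
        elif y in category and x not in category:
            outside[y] += 1
    average = sum(outside.values()) / len(outside)
    return shared, outside, average


# Hypothetical atoms A to D with their tokens and pairings.
tokens_of = {"A": {"food", "day"}, "B": {"food", "oxen"}, "C": {"oxen"}, "D": {"day"}}
links = [("A", "B"), ("B", "C"), ("A", "D")]
shared, outside, average = check_category({"A", "B"}, tokens_of, links)
print(shared, outside, average)
```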

 

The two fables I have identified here are (191) The Jackdaw and the Doves and (113) The Master and His Dogs.  The linkage is via the token “food”.

 

Table 2: The valences of 191 and 113

 


token          fable (atom number)

seeing         191
painted        191
joined         191
share          191
discovered     191
recognizing    191
desiring       191
jackdaw        191
jackdaw        191
food           191
day            191
character      191
food           191
jackdaws       191
ends           191

killed         113
obliged        113
seeing         113
took           113
own            113
master         113
dogs           113
storm          113
country        113
house          113
goats          113
household      113
storm          113
yoke           113
oxen           113
food           113
dogs           113
counsel        113
time           113
master         113
oxen           113
friend         113


 

The Event Browser was used to view these two atoms (Figure 6).

Because the Event Browser is not yet complete, we have to select the folder and the Members.txt file to see the atoms.

 

 

Figure 6: (a) The selection of the Members file in the node; (b) the event atoms

 

In Figure 6b we see atoms 113 and 191.  An examination of the valence file will show that these two atoms have no relationship to any of the other atoms in this sample.

 

Please send comments to Dr. Paul Prueitt.