Diagrams illustrating the disambiguation issue

 

Ontology Stream Inc  9/21/03

 

Software at:

 

www.ontologystream.com/cA/tutorials/disambiguation.zip

 

In the detection of memetic expression, one often assumes, on various grounds, that a relationship holds between two terms. The results of such algorithms are often delivered as a set of ordered triples

 

{  < a, r, b >  }

 

where a and b are words or phrases and r is a non-specific relationship.
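A minimal sketch, assuming Python and a hypothetical `Triple` type (it is not part of the SLIP software), of how such a set of ordered triples might be held in memory. The non-specific relationship r is stored as a plain label, and the hypothetical helper `related_terms` lists every b paired with a given a.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One ordered triple <a, r, b>; r is a non-specific relationship label."""
    a: str
    r: str
    b: str


def related_terms(triples: set[Triple], a: str) -> set[str]:
    """Return every b that appears in some triple <a, r, b> for the given a."""
    return {t.b for t in triples if t.a == a}


triples = {
    Triple("bush(1)", "r", "leader"),
    Triple("bush(1)", "r", "war"),
    Triple("bush(2)", "r", "plant"),
}

print(sorted(related_terms(triples, "bush(1)")))  # ['leader', 'war']
```

Because r carries no specific meaning, the sketch treats it as an opaque string; only a and b take part in the lookup.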

 

For example:

 


 

Figure 1: Two sets of relationships that have the same subject indicator, “bush”

 

 


subject(i)    subject(j)
bush(1)       relationship1
leader        relationship1
war           relationship1
taxes         relationship1
garden        relationship1

subject(i)    subject(j)
bush(2)       relationship2
plant         relationship2
ground        relationship2
leaves        relationship2
green         relationship2
garden        relationship2

 


 

The data in the tables above have the two-column form that the SLIP browsers require in order to produce the event chemistry from an aggregation of categorical invariance into categories.
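The aggregation can be pictured with a small Python sketch that groups the two-column rows by their second column, so that every atom sharing relationship1 (or relationship2) falls into one category. This is an illustrative assumption, not the SLIP browsers' actual algorithm; the function name `group_into_categories` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the two tables above: (subject(i), subject(j)).
ROWS = [
    ("bush(1)", "relationship1"),
    ("leader", "relationship1"),
    ("war", "relationship1"),
    ("taxes", "relationship1"),
    ("garden", "relationship1"),
    ("bush(2)", "relationship2"),
    ("plant", "relationship2"),
    ("ground", "relationship2"),
    ("leaves", "relationship2"),
    ("green", "relationship2"),
    ("garden", "relationship2"),
]


def group_into_categories(rows):
    """Collect the atoms in column one under the value in column two.

    Lists (not sets) are used so that the original row order is kept.
    """
    categories = defaultdict(list)
    for atom, link in rows:
        categories[link].append(atom)
    return dict(categories)


categories = group_into_categories(ROWS)
print(categories["relationship1"])
# ['bush(1)', 'leader', 'war', 'taxes', 'garden']
```

Note that "garden" ends up in both categories, which is exactly the overlap the disambiguation has to resolve.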

 


The two sets of derived relationships are

 

{ <bush(2), r, garden>, <bush(2), r, green>, <bush(2), r, ground>, <bush(2), r, leaves>, <bush(2), r, plant> }

 

and

 

{ <bush(1), r, leader>, <bush(1), r, war>, <bush(1), r, taxes>, <bush(1), r, garden> }
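These derived sets can be reproduced with a short Python sketch, assuming (as in the tables) that the first atom listed under each relationship is the subject indicator bush(n) and that the remaining atoms are its ending nodes. The helper `derive_triples` is hypothetical, and "r" stands for the non-specific relationship.

```python
def derive_triples(category, r="r"):
    """Turn an ordered category [subject, atom, atom, ...] into triples.

    Assumes the first entry is the subject indicator, e.g. "bush(1)".
    """
    subject, *ending_nodes = category
    return {(subject, r, node) for node in ending_nodes}


relationship1 = ["bush(1)", "leader", "war", "taxes", "garden"]
relationship2 = ["bush(2)", "plant", "ground", "leaves", "green", "garden"]

for triple in sorted(derive_triples(relationship1)):
    print(triple)
```

Returning a set mirrors the set notation used above, where the order of the triples carries no meaning.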

 

In Figure 2 we notice that the string “garden” co-occurs both in the context of Bush the president and in the context of bush the plant.

 

 

Figure 2: Intersection between the atoms of two categories

 

From Figure 2, one can make the plausible inference that one occurrence of the term “garden” has the subject indicator “the President’s rose garden”.
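The intersection in Figure 2 amounts to a set intersection over the ending nodes of the two categories. A minimal Python sketch, using the atoms listed above (the variable names are illustrative only):

```python
president_context = {"leader", "war", "taxes", "garden"}          # bush(1)
plant_context = {"plant", "ground", "leaves", "green", "garden"}  # bush(2)

# Terms in both contexts are the ambiguous ones that need further
# interpretation, e.g. "garden" read as "the President's rose garden".
shared = president_context & plant_context

# Terms in only one context point unambiguously to that sense.
only_president = president_context - plant_context

print(shared)  # {'garden'}
```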

 

The information in Figure 1 is given in a slightly different form in Figure 3.

 


 

The method of disambiguation takes the ending nodes of relationship1 and relationship2 as the basis for applying the Prueitt Voting Procedure:

 

http://www.bcngroup.org/area3/pprueitt/kmbook/Appendix.htm
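The voting procedure itself is specified at the URL above and is not reproduced here. As a loosely related, hypothetical illustration only, the Python sketch below lets each term found in a new context vote for every category whose ending nodes contain it, splitting the vote evenly when a term (such as “garden”) belongs to more than one category. The even split and the `vote` function are assumptions, not the Prueitt Voting Procedure.

```python
from collections import Counter

# Ending nodes of relationship1 and relationship2, taken from the tables.
CATEGORIES = {
    "bush(1)": {"leader", "war", "taxes", "garden"},
    "bush(2)": {"plant", "ground", "leaves", "green", "garden"},
}


def vote(terms, categories=CATEGORIES):
    """Return a score per category for a bag of context terms.

    Each term casts one vote, divided evenly among the categories
    that contain it; terms found in no category cast no vote.
    """
    scores = Counter({name: 0.0 for name in categories})
    for term in terms:
        owners = [name for name, nodes in categories.items() if term in nodes]
        for name in owners:
            scores[name] += 1.0 / len(owners)
    return scores


print(vote(["garden", "green", "leaves"]))
```

In this sketch the shared term “garden” contributes only half a vote to each sense, so the unambiguous terms around it decide the outcome.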

 

The application of the Prueitt Voting Procedure allows a two-step process.

 

The first step is a human sorting, into bins, of the graph branches produced from measures of local linguistic variation by a word-level n-gram measurement process. A human-on-the-loop is judged absolutely necessary for high-quality memetic detection.

 

http://www.bcngroup.org/area2/KSF/HIP.htm
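The word-level n-gram measurement process is not specified further here. A minimal, assumed sketch in Python of its most basic ingredient, extracting word-level n-grams (contiguous runs of n words) from a text fragment so that a human could inspect them; the function name `word_ngrams` and the simple tokenizer are hypothetical.

```python
import re


def word_ngrams(text, n=2):
    """Return the contiguous word-level n-grams of text, in order.

    Words are lower-cased runs of letters and apostrophes; this
    tokenizer is a simplifying assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("Bush spoke in the Rose Garden"))
```

A fragment shorter than n words simply yields an empty list.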

 

The second step is an autonomous routing of the branches produced by a word-level n-gram measurement process into the categories that correspond to the subject indicators for which the bins were developed.
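Once the bins exist, the autonomous step can be pictured as routing each new branch to the bin it overlaps most. The Python sketch below is a hypothetical illustration: the bins are assumed to be the word sets from the tables, a branch is assumed to be a set of words, and simple overlap counting stands in for whatever measure the actual system uses. Branches with no overlap, or with a tie, are returned as None so that they can go back to the human-on-the-loop.

```python
# Hypothetical bins, keyed by subject indicator, using the ending
# nodes of relationship1 and relationship2 from the tables.
BINS = {
    "bush(1)": {"leader", "war", "taxes", "garden"},
    "bush(2)": {"plant", "ground", "leaves", "green", "garden"},
}


def route(branch, bins=BINS):
    """Return the bin whose word set overlaps the branch most.

    Returns None when no bin shares a word with the branch or when
    the best overlap is tied, leaving the decision to a human.
    """
    overlaps = {name: len(branch & words) for name, words in bins.items()}
    best = max(overlaps.values())
    winners = [name for name, count in overlaps.items() if count == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(route({"garden", "leaves"}))  # bush(2)
print(route({"garden"}))            # None: "garden" alone is ambiguous
```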