
We ask that comments and discussion on this series be made in the __Yahoo groups forum, eventChemistry.__

APC1 – On human enumeration of the cell values over an event class

APC2 – Minimal Voting Procedure (original paper)

APC3 – Using the MVP to route information

APC4 – Using eventChemistry to improve the framework specification

**Action Perception Cycles 2**

**Minimal Voting Procedure**

**Directly from the Appendix, Foundations of Knowledge Science**

**First Published in Russia: 1997**

**(original notation)**

**Appendix: Description of the Minimal Voting Procedure (MVP)**

To instantiate a voting procedure, we need the following triple < **C**, **O**_{1}, **O**_{2} >:

· A set of categories **C** = { C_{q} } as defined by a training set **O**_{1}.

· A means to produce a document representational set for members of **O**_{1}.

· A means to produce a document representational set for members of a test set, **O**_{2}.

We assume that we have a training collection **O**_{1} with m document passages,

**O**_{1} = { d_{1} , d_{2} , . . . , d_{m} }

Documents that are not single
passages can be substituted here. The notion introduced above can be
generalized to replace documents with a more abstract notion of an
"object".

Objects

**O** = { O_{1} , O_{2} , . . . , O_{m} }

can be documents, semantic passages
that are discontinuously expressed in the text of documents, or other classes
of objects, such as electromagnetic events, or the coefficients of spectral
transforms.

Some representational procedure is used to compute an "observation" D_{r} about the semantics of the passages. The subscript r is used to remind us that various types of observations are possible and that each of these may result in a different representational set. For linguistic analysis, each observation produces a set of theme phrases. We use the following notation to indicate this:

D_{r} : d_{i} → { t_{1} , t_{2} , . . . , t_{n} }

This notation is read "the observation D_{r} of the passage d_{i} produces the representational set { t_{1} , t_{2} , . . . , t_{n} }".

We now combine these passage-level representations to form a category representation.

· Each "observation", D_{r}, of the passages in the training set **O**_{1} has a "set" of theme phrases

D_{r} : d_{i} → **T**_{k} = { t_{1} , t_{2} , . . . , t_{n} }

· Let **A** be the union of the individual passage representational sets **T**_{k}:

**A** = ∪ **T**_{k}

This set **A** is the representation set for the complete training collection **O**_{1}.
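The construction of the passage representations and their union can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the phrase extractor here is a stand-in that simply takes lowercase word bigrams, where a real observation D_{r} would use a linguistic theme-phrase extractor.

```python
# Sketch: an "observation" D_r maps a passage d_i to a set of theme
# phrases T_k; A is the union of the T_k over the training collection O1.
# The bigram extractor below is an illustrative stand-in.
def observe(passage):
    words = passage.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}

O1 = [
    "voting procedures aggregate evidence",
    "evidence aggregation uses voting",
]

T = [observe(d) for d in O1]   # passage-level representational sets T_k
A = set().union(*T)            # A = union of the T_k

print(sorted(A))
```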

· The overlap between the category representation **T**_{q} and the passage representation **T**_{k} provides the basis for the voting.

J. S. Mill’s logics rely on the discovery of meaningful subsets of representational elements. The first principles of J. S. Mill’s argumentation are:

1. that negative evidence should be acquired as
well as positive evidence

2. that a bi-level argumentation should involve
a decomposition of passages and categories into a set of representational
phrases

3. that the comparison of passage and category representation should generalize (provide the grounding for computational induction) from the training set to the test set.

It is assumed that each "observation", D_{k}, of the test set **O**_{2} is composed from a "set" of basic elements, in this case the theme phrases in **A**. Subsets of the set are composed, or aggregated, into wholes that are meaningful in a context that depends only statistically on the characteristics of basic elements.

The general framework provides for
situational reasoning and computational argumentation about natural systems.

For the time being, it is assumed that the set of basic elements is the full phrase representational set

**A** = ∪ **T**_{k}

for the training collection **O**_{1}.

We introduce the notation **T***_{q} for the derived representational set of each category C_{q}.

Given the data:

· the derived sets **T***_{q}, one for each category C_{q}, and

· the representational sets **T**_{k}, from the observations D_{k} of the test passages,

we form, for each passage and category, the hypothesis that the passage belongs to the category. This hypothesis is voted on by each phrase in the representational set for D_{k}, by making the following inquiries for each element t_{i} of the representational set **T**_{k}:

1. does an observation of a passage, D_{k}, have the property p, where p is the property that this specific representational element, t_{i}, is also a member of the representational set **T***_{q} for category q?

2. does an observation of a passage, D_{k}, have the property p, where p is the property that this specific representational element, t_{i}, is __not__ a member of the representational set **T***_{q} for category q?

Truth of the first inquiry produces
a positive vote, from the single passage level representational element, that the
passage is in the category. Truth of the second inquiry produces a negative
vote, from the single representational element, that the passage is __not__
in the category. These votes are tallied.

__Data structure for recording the
votes__

For each passage, d_{k}, we define the matrix A_{k} as a rectangular matrix of size n × h, where n is the size of that passage's representational set **T**_{k} and h is the number of categories. The passages are indexed by k; each passage has its own matrix.

Each element t_{i} of **T**_{k} will get to vote for or against the hypothesis that this kth passage should be in the category having the category representational set **T***_{q}:

a_{i,j} = -1 if the phrase is not in **T***_{q}

or

a_{i,j} = 1 if the phrase is in **T***_{q}

Matrix A_{k} is used to store the individual +/- votes placed by each agent (i.e., each representational element of the phrase representation of the passage).
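The construction of a vote matrix A_{k} for a single passage can be sketched as follows. The names and the small example sets are illustrative assumptions, not data from the paper.

```python
# Sketch of the vote matrix A_k for one passage.
# Row i corresponds to phrase t_i in the passage representation T_k;
# column j corresponds to category j with derived set T*_q.
# a[i][j] = +1 if t_i is in T*_q, else -1.
def vote_matrix(T_k, category_sets):
    return [
        [1 if phrase in T_star_q else -1 for T_star_q in category_sets]
        for phrase in T_k
    ]

T_k = ["voting", "evidence", "spectra"]            # illustrative phrases
category_sets = [{"voting", "evidence"}, {"spectra"}]  # illustrative T*_q

A_k = vote_matrix(T_k, category_sets)
print(A_k)  # one row per phrase, one column per category
```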

This linear model can produce ties for first place, and places a semi-order (allowing ties for places) on the categories by counting discrete votes for and against the hypothesis that the document is in that category.

__A second data structure to record
weighted votes__

A non-linear (weighted) model uses
internal and external weighting to reduce the probability of ties to near zero
and to account for structural relationships between themes.

Matrix B_{k} is defined:

b_{i,j} = a_{i,j} * (weight of the phrase in **T**_{k}) if the phrase is not in **T***_{q}

or

b_{i,j} = a_{i,j} * (weight of the phrase in **T***_{q}) if the phrase is in **T***_{q}
This difference between the two
multipliers is necessary and sufficient to break ties resulting from the linear
model (matrix A_{k}).
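The weighted model can be sketched as below. The weight tables are illustrative assumptions; the paper does not specify how phrase weights are derived, only that the in-category and in-passage weights differ.

```python
# Sketch of the weighted matrix B_k. A vote is scaled by the phrase's
# weight in T_k when the phrase is absent from T*_q (a negative vote),
# and by its weight in T*_q when it is present (a positive vote).
def weighted_matrix(T_k, w_passage, category_sets, w_category):
    B = []
    for phrase in T_k:
        row = []
        for q, T_star_q in enumerate(category_sets):
            if phrase in T_star_q:
                row.append(1 * w_category[q][phrase])    # a = +1
            else:
                row.append(-1 * w_passage[phrase])       # a = -1
        B.append(row)
    return B

T_k = ["voting", "spectra"]                       # illustrative phrases
category_sets = [{"voting"}, {"spectra"}]         # illustrative T*_q
w_passage = {"voting": 0.4, "spectra": 0.6}       # weights within T_k
w_category = [{"voting": 0.9}, {"spectra": 0.7}]  # weights within T*_q

B_k = weighted_matrix(T_k, w_passage, category_sets, w_category)
print(B_k)
```

Because the weights are real-valued and generally distinct, exact ties in the column tallies become rare, which is how this model breaks the ties left by matrix A_{k}.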

__Data structure to record the
results__

For each passage representation and each
category, the tally is made from the matrix B_{k} and stored in a
matrix C having the same number of records as the size of the document
collection, and having h columns – one column for each category.

The information in matrix C is
transformed into a matrix D having the same dimension as C. The elements of
each row in C are reordered by the tally values. To illustrate, suppose we have
only 4 categories and passage 1 tallies {-1214,-835,451,1242} for categories 1,
2, 3 and 4 respectively. So

cat1 → -1214, cat2 → -835, cat3 → 451, and cat4 → 1242.

By holding these assignments constant and ordering the elements by size of tally, we have the permutation from the ordering (1, 2, 3, 4) to the ordering (4, 3, 2, 1):

(1, 2, 3, 4) → (4, 3, 2, 1).

This result shows that for passage 1, first place goes to category 4, second place to category 3, and so on. The matrix D would then have (4, 3, 2, 1) as its first row.
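The tally-to-ranking step can be reproduced with the worked numbers (a sketch, not the paper's code): sorting the category indices by descending tally yields the row of matrix D.

```python
# Row of matrix C for passage 1: one tally per category (categories 1..4).
tallies = [-1214, -835, 451, 1242]

# Sort 1-based category indices by descending tally; the result is the
# corresponding row of matrix D (first place first).
ranking = sorted(range(1, 5), key=lambda q: tallies[q - 1], reverse=True)
print(ranking)
```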


**Comments can be sent to the ontologyStream e-forum.**