Regarding a description of the use of hash tables
Modified: September 14, 2003
<business confidential>
OntologyStream Inc is studying how BerkeleyDB handles hash tables and b-trees. This work is to be extended over the coming months.
CCM (type:value) pairs are mapped into a hash table. The type and the value together form an element. The element is hashed and then mapped to an object. This object contains a data structure that holds a branch or simple tree.
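The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Python dict stands in for the BerkeleyDB hash, and the names Branch, make_element, store, and the path labels such as "doc1" are hypothetical, not part of CCM or BerkeleyDB.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical container for a branch (or simple tree)."""
    ending_node: tuple                         # the (type, value) element
    path: list = field(default_factory=list)   # nodes leading to the ending node


def make_element(type_, value):
    """Combine the type and the value into one hashable element."""
    return (type_, value)


# A dict is itself a hash table: the key is hashed, then mapped to an object.
# Here it is only a stand-in for the BerkeleyDB hash.
table = {}


def store(type_, value, path):
    """Hash the element and attach a branch to the object it maps to."""
    element = make_element(type_, value)
    table.setdefault(element, []).append(Branch(element, list(path)))
    return element


# Illustrative data only; the path labels are invented for this sketch.
store("word", "tree", ["doc1", "sentence3"])
store("word", "tree", ["doc2", "sentence1"])
store("word", "hash", ["doc1", "sentence4"])

print(len(table[("word", "tree")]))  # number of branches stored for this element
```

Keying the table on the full (type, value) tuple, rather than on the value alone, keeps the same word under two different types as two separate elements.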
The elements are the ending nodes of the branches in the Input Bag of Branches, I. In this instance, each node’s label is a (type:value) pair, where value is the specific word.
The element is then mapped to a class of objects (eventually in OWL interoperable format) external to the hash.
A set of composed objects is constructed by the cross-scale transform, as noted in:
http://www.ontologystream.com/CCM/CCMnotation.htm#_Section_4.2:_
The type “word” of the ending node is not needed in the simple inversion, as discussed previously.
In the first implementation of differential inversions we have two categories of types: a class of nouns (enumerated as { n(i) }) and a class of verbs (enumerated as { v(i) }).
The elements of these two categories can be descriptively enumerated:
http://www.bcngroup.org/area3/pprueitt/private/KM_files/frame.htm
In our current disclosure, the "recognition of the type" comes from a text analysis of the local neighborhood of the center word of the 5-gram, and only if the center word is a noun or verb, as determined by fableParse, a program developed by Amnon.
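A minimal sketch of the windowing step follows. It assumes a tiny hand-written lexicon (LEXICON) as a stand-in for fableParse, whose actual interface is not described here, and the function names are hypothetical. The sketch only selects 5-grams whose center word is a noun or verb and collects the four surrounding words; the analysis that turns that neighborhood into a type is not shown.

```python
# Hypothetical part-of-speech lookup standing in for fableParse.
LEXICON = {"dog": "n", "cat": "n", "chases": "v", "sees": "v"}


def five_grams(tokens):
    """Yield each window of five consecutive tokens."""
    for i in range(len(tokens) - 4):
        yield tokens[i:i + 5]


def typed_centers(tokens):
    """Return (pos, center, neighborhood) for windows centered on a noun or verb."""
    results = []
    for window in five_grams(tokens):
        center = window[2]
        pos = LEXICON.get(center)          # None when the word is not in the lexicon
        if pos in ("n", "v"):
            # The local neighborhood is the window without its center word.
            neighborhood = tuple(window[:2] + window[3:])
            results.append((pos, center, neighborhood))
    return results


tokens = "the big dog chases the small cat".split()
for item in typed_centers(tokens):
    print(item)
```

Windows centered on any other word, such as "the", are skipped, which mirrors the noun-or-verb condition stated above.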
A general linguistic theory of (v,n,v) and (n,v,n) topological covers, using a local linguistic neighborhood (lln) in semantic space, is being developed. There exists a research literature on this topic.
The type is an indicator of context, and thus a first-order logic can be constructed that produces (to be disclosed) ATS ambiguation/disambiguation operators on the categories of subject indicators. These categories are enumerated in correspondence to the ordering of the branches in I by their ending nodes.
The placement of the center word into a BerkeleyDB hash produces a retrieval mechanism that delivers all occurrences of the same word in a single step. The type is then a "memory of" the local linguistic variation (llv), and can be mapped to a referential system that has encoded more complex graph structure corresponding to llns.
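The single-step retrieval can be illustrated with an in-memory index keyed on the center word. This is a sketch, not the BerkeleyDB implementation: a defaultdict stands in for the hash, build_index is a hypothetical helper, and treating the set of distinct neighborhoods as a rough view of llv is an assumption made for illustration only.

```python
from collections import defaultdict


def build_index(tokens, n=5):
    """Map each center word to every (position, neighborhood) where it occurs."""
    half = n // 2
    index = defaultdict(list)
    for i in range(half, len(tokens) - half):
        center = tokens[i]
        neighborhood = tuple(tokens[i - half:i] + tokens[i + 1:i + half + 1])
        index[center].append((i, neighborhood))
    return index


tokens = "the dog sees the cat and the cat sees the dog".split()
index = build_index(tokens)

# One lookup on the hashed key returns all occurrences of the word.
occurrences = index["cat"]
for position, neighborhood in occurrences:
    print(position, neighborhood)

# Assumption: the distinct neighborhoods give a crude picture of the
# local variation around the word.
distinct_neighborhoods = {nb for _, nb in occurrences}
```

Because every occurrence is appended under the same key, the cost of finding them all is one hash lookup, independent of where they appear in the text.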
How the stochastic referential system is found is a separate issue, but for now we are able to produce stochastic models of llv using the current Otis algorithm.
A possible future system could involve differential ontology and open polylogics.
Any stochastic referential system is domain specific, and the construction of this system depends on parameters that can be exposed to the end user so that the end user can develop different viewpoints over the same domain. Because the stochastic referential system can be modified in real time, we refer to this as a formative schema.
Formative schemas need to be interoperable with OWL ontology.
The measure of llv is captured as a set of llns. The notion of a cover over specific linguistic variation within a specific domain is an issue that can be addressed formally using what is called topological logic (Victor Finn, Robert Burch).
A specific set of llns is then used to define the compounded element of logics over formative schema, the atoms of which are CCM constructions in the format { (type:value) }.
The CCM construction is applied only to the center of the n-gram (or generalized n-gram construction), and then multiple branches are fractured away from this n-gram to produce the bag of branches.