National Knowledge Project

Thursday, August 19, 2004

Readware’s Method for Stratified Ontology

Hi Paul,

I do not fully understand graph theory and visualization techniques, though insofar as Dr. Sowa draws on Peirce for his linguistic conception of semantics, we are in agreement.

On this <a r b> issue: Tom and I found it prevalent in lower-level (sub-atomic) structures, though Tom believes it is not as significant as the n-ary flexibility required of any significant conceptual structure.

The architecture and functional space, the representation, of an entire building is not defined by the interlocking structure of the steel girders and/or bricks, though it is easily constrained by them, to say the least.

In our conception, the relator is a location. And we (Readware) were certainly defining (guessing at) the relators.

We also wrote a search algorithm to parse triples from natural terms. We had pretty sophisticated morphological rules that gave us a good guess at a 'singular unity' (an occurrence on which we could obtain a reliable measure of fidelity).

plant = pln

transplant = tpl

replica = rpl

replacement = rpl

We can treat each of these occurrences of terms as (possibly) related (similar) even though they differ in size and letters to a great extent.
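As an illustration only, the four example reductions above can be held in a small table and grouped by their triple. This is a minimal sketch: the mapping is copied by hand from the examples, the morphological rules that produced it are not described here and are not reproduced, and the names `TRIPLES`, `group_by_triple` and `shared_letters` are hypothetical. The letter-overlap count is a crude stand-in for "possibly related", not the Readware measure.

```python
from collections import defaultdict

# Reductions copied from the examples in the text. The morphological
# rules that produce them are NOT shown in the text, so they are
# hard-coded here rather than derived.
TRIPLES = {
    "plant": "pln",
    "transplant": "tpl",
    "replica": "rpl",
    "replacement": "rpl",
}

def group_by_triple(triples):
    """Group terms that reduce to the same three-letter form."""
    groups = defaultdict(list)
    for term, triple in triples.items():
        groups[triple].append(term)
    return {triple: sorted(terms) for triple, terms in groups.items()}

def shared_letters(a, b):
    """Count distinct letters two triples have in common (a toy measure)."""
    return len(set(a) & set(b))

groups = group_by_triple(TRIPLES)
print(groups["rpl"])                 # terms that share the triple 'rpl'
print(shared_letters("pln", "tpl"))  # letters common to 'pln' and 'tpl'
```

Terms of very different length, such as replica and replacement, land in the same group, which is the point the examples make.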

The way triples relate two categories is defined by their locations in the LS matrix. They take the enumerated value defined by the method of the patent (obtained by axiom). This is part of the intellectual property that we define in our 1989 patent. More can be discussed about this as we move forward in our special project.

In tests of relatedness, if we did not see levels of synonymous relations, we knew we had the wrong guess, and we re-analyzed the affected root form. But still, the guesses often failed in certain predictable ways. We built expectation rules; even that was not enough. The approach could not capture and resolve all the available evidence, and so was not fine or precise enough for most retrieval tasks. Statistics on distributions of letter combinations did not shed any light on the issue. The model produced non-mathematical errors, where constructs like trains and trends score a high (maybe warranted, though unwanted) fidelity (relevance) in a retrieval situation.

In the end, we felt that we needed to "know" the origin of the terms to solve the confusion (caused by language change) and to resolve and restore the (real, usual, stable, regular, coherent) root connections for trains and for trends. We turned to the origins of modern languages. We stopped looking when we reached classical Arabic.

Believe it or not, the ancient Arabs not only had a root form suitable for a representation of a real-life train, they also had a 4000-year-old root for the modern term 'atom'. In addition, the Arabic language has the most perfectly defined tri-literal root structure known to any man of any persuasion in any recallable time. No learned and reasonable personage would disagree. Not being a native Arabic speaker, I was initially quite struck by this discovery.

Now that we had a perfect linguistic triple that is certainly representative of (natural) realities, we decided we were in a position to test the theory that any word can be represented as a formula mapping two categories of concepts to one another in a certain (deducible) way.

We selected 2000 three-letter roots initially, and later reformed our criteria and axioms and chose nearly 2000 more. We computed values for each relation formed between any two known representatives (in the lexicon).

That value was a variable. It was dependent upon the representative terms showing up in the window of measurement. The proposition was posed to obtain a measure of fidelity that either meets or exceeds a given tolerance (of relatedness). We ran this measure over all possible combinations of the roots/terms (combinations of known terms relative to 4000 X 4000 root-concepts) to form the (lookup) values (of (possible) substructural relatedness) in the Readware concept-base. This gave us measurable structural links between pairs of representations.
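The pairwise pass described above can be sketched as follows. This is a hedged illustration, not the patented method: `toy_fidelity` is a hypothetical placeholder (the fraction of letter positions that match) standing in for the real measure, and `build_lookup` and the three sample roots are assumptions made for the example. Only pairs whose value meets or exceeds the tolerance are kept.

```python
from itertools import combinations

def build_lookup(roots, fidelity, tolerance):
    """Score every unordered pair of roots; keep pairs at or above tolerance."""
    table = {}
    for a, b in combinations(sorted(roots), 2):
        value = fidelity(a, b)
        if value >= tolerance:
            table[(a, b)] = value
    return table

def toy_fidelity(a, b):
    """Placeholder measure (an assumption): share of positions with equal letters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

roots = ["pln", "tpl", "rpl"]
table = build_lookup(roots, toy_fidelity, tolerance=0.5)
print(table)
```

With roughly 4000 roots, a pass over unordered pairs of this kind covers 4000 * 3999 / 2 = 7,998,000 combinations, so computing the values once and storing them for lookup is a natural design.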

As a methodology, we were looking for substructural links and ways to make sense of structural links. If we found a reasonable structural link, we extended the search to include the potential of the new substructural links. In this way we were developing an empirical data set about the linkage at the two levels, substructural and structural. The structural linkage needed to have some type of meaning that is recognized by someone who knows the language. Perhaps you would call this a reification methodology. I have looked briefly at your chapters on Mill’s logic and Russian quasi-axiomatic theory, and see perhaps a way to completely formalize what we did.

With a formalization of our two-level structural analysis, semantics being a third level (as you point out in your tri-level architecture), we are perhaps able to apply this methodology to the understanding of the language used by a small group like a commune.

Law and Court have a number of independent “substructural” relations, some of which have nothing to do with the concept of legality or trials or lawsuits. Yet they can combine in the concept of a Courthouse with other representations and concepts such as Judge and trial and the legal disposition of a case. Love is a concept. TLC is a concept that is assigned to correspond to the concept of Love by the Readware ConceptBase. This is not dissimilar to the way people create a correspondence of mindset, or a worldview about such things. I am a Peircean in that I believe nature gives us the concepts to work with and understand. We do not need to invent artificial concepts.

Now we take those natural concepts represented by terms of our vocabulary and build them into messages to deliver or to request information and for other purposes.

Consider:

Now is the time for all good men to come to the aid of their country.

What does it mean? What does it entail? Is there a simple objective answer?

If you type the phrase into Readware (interface at the Bead Games) and press the Classify button, it shows you a kind of "theorematic reasoning" (in Peircean terms) that does not appear directly from the words or terms used in the message. It reports that the message belongs to the "Accident/Crisis" category and that it also includes the topic of "Emergency".

It (the Readware algorithm) gets the results from linkages (entailments) that form or occur between terms in the data and a fixed set of regularly recurring concepts found in the conceptbase and in active memory-based topical (plain language) specifications of what a crisis or emergency (usually) entails.

We can write a topic specification that states that an "Emergency" is the chief indexical of a topical state of affairs where the delivery of aid occurs (not exclusively). And we can be categorically specific that an "Accident/Crisis" is often characterized by a state of emergency.

Of significance is that we can do this without having to specify the nature of the statements in which the entailments might occur and combine to deliver a meaningful message.
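The two specifications just described can be pictured with a toy example. This is only a sketch under strong assumptions: it matches literal terms, whereas the text describes linkages through concepts in the conceptbase, and the names `TOPIC_SPECS`, `CATEGORY_SPECS` and `classify` are hypothetical rather than part of Readware. It encodes just the two statements given: aid indicates "Emergency", and an emergency characterizes "Accident/Crisis".

```python
import re

# Hypothetical topic specification: topic -> terms that indicate it.
# Encodes only "Emergency ... where the delivery of aid occurs".
TOPIC_SPECS = {"Emergency": {"aid"}}

# Hypothetical category specification: category -> topics that characterize it.
CATEGORY_SPECS = {"Accident/Crisis": {"Emergency"}}

def classify(message):
    """Return (categories, topics) whose specifications the message touches."""
    terms = set(re.findall(r"[a-z]+", message.lower()))
    topics = sorted(t for t, indicators in TOPIC_SPECS.items() if indicators & terms)
    categories = sorted(
        c for c, spec_topics in CATEGORY_SPECS.items() if spec_topics & set(topics)
    )
    return categories, topics

result = classify(
    "Now is the time for all good men to come to the aid of their country."
)
print(result)
```

Note that the specifications say nothing about the kind of statement in which the indicating terms occur, which mirrors the point made above.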

So, in summary, it is this superstructure of 4000 regular concepts, and the n-ary terms, topics, categories or filters that map wholly or partially upon them, that we want to exploit, using the precomputed values for those relations stored in the conceptbase (and the binary index and the data signature), to be stored directly in the Hilbert engine.

Ken