Saturday, August 14, 2004
Architectural discussion II
Everyone can download a Readware browser from Readware and do some testing. Actually, one needs to develop a technical and tacit understanding of how results are obtained from the four options:
1) locate related concepts in context
2) locate verbatim items as specified
3) locate documents with all query items
4) locate articles about this person
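Of the four options, (2) and (3) can be stated purely extensionally, which makes the contrast with the concept-level options (1) and (4) concrete. Here is a minimal sketch in Python; all names are illustrative, and this is not a description of the actual Readware client:

```python
def locate_verbatim(query, documents):
    """Option 2: documents containing the query string verbatim."""
    return [d for d in documents if query in d]

def locate_all_items(items, documents):
    """Option 3: documents containing every query item, in any position."""
    return [d for d in documents if all(it in d for it in items)]

docs = ["the quick brown fox", "a quick study", "brown bread"]
print(locate_verbatim("quick brown", docs))        # exact phrase only
print(locate_all_items(["quick", "brown"], docs))  # both items, anywhere
```

Options (1) and (4) cannot be written down this way: they require knowledge of concepts and subject indicators, which is exactly the point under discussion.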
In order to obtain a precise understanding of what, for example, a “concept” is, we need to look more deeply. However, a general criticism of conceptual roll-up and conceptual indexing is that the technical discussion often becomes a bit artificial.
So we need to lay out the difficulty of “locating a concept” via a precise process. If one says, “locate the concept of beauty” in the Aesop fables, and one is a scholar of the fables, this is still an art form. It has the nature of proving a theorem in mathematics: one might suggest that this is a precise activity, but often what is done involves incompleteness and leaps of faith.
Sowa and his colleagues have addressed these issues in discussions of both structure mapping and conceptual construction functions. Questions about the identification of indicators, that is, whether concepts are “in the text”, are thus conditioned on the active role the perceiver plays in the cognitive act of reading. I am supposing that a language expressing the details of structure mapping and conceptual construction functions will become agreed upon as we move down the path we have chosen.
How are deep “philosophical” issues avoided in the context of the evaluation of the quality of “conceptual indexing”?
The language that is used quickly makes us say things that are not in fact proper when thought about in the context of human cognitive and behavioral acts. We adopt a type of slang, and most who are involved in the development of software are not aware of precisely what the slang leaves out of the discussion.
OK, so one has to say, now and then, that the terms we use to refer to concepts often do not precisely align with a specific algorithmic process run over specific data structures.
Obviously, one can deploy “indexing” and retrieval technology in various ways. By “indexing” one may mean the process of creating B-trees to optimize retrieval mechanisms. By “indexing” I will always mean a “conceptual index”, whose elements “point” at subject matter indicators. In the Hilbert encoding there are no “database” indices; the concept representations are encoded into data structures with one part of the data structure “protruding” as a discrete “point” on a Hilbert line.
Now, we could unpack what I just said. But instead, let us ask why I said it. Answer: the Hilbert encoding has a notational holonomy to the way that data is placed into a persistent piece of physical memory in the computer. This is why the search has zero complexity: one can go directly to the data, no matter what the data is. One needs only to understand that the data and its location are the same thing.
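The Hilbert encoding itself is not spelled out here, but the claim “the data and its location are the same thing” can be illustrated by analogy with content addressing, where an item’s address is computed from the item itself, so no separate lookup index is needed. A toy sketch, not the Readware mechanism:

```python
import hashlib

class ContentAddressedStore:
    """Toy store in which an item's address is derived from the item
    itself, so retrieval goes 'directly to the data' with no index."""

    def __init__(self):
        self.slots = {}

    def address(self, data: bytes) -> str:
        # The address is a pure function of the content.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        addr = self.address(data)
        self.slots[addr] = data
        return addr

    def get(self, addr: str) -> bytes:
        return self.slots[addr]

store = ContentAddressedStore()
addr = store.put(b"beauty")
print(store.get(addr))  # goes straight to the data
```

Anyone holding the data can recompute its address; there is nothing to search, which is the sense in which lookup has “zero complexity”.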
This means that one can carry Orb constructions in very compact maps, like bitmaps, and manipulate the constructions using very few fetch-execute cycles. Bjorn’s use of Forth facilitates this.
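To see why bitmap-style representations cost so few cycles, note that a small set of concept indicators fits in one machine word, and combining two such sets is a single AND instruction. This is illustrative only, not the actual Orb format:

```python
# Each bit position stands for one (hypothetical) concept indicator.
doc_a = 0b10110   # document A indicates concepts {1, 2, 4}
doc_b = 0b01110   # document B indicates concepts {1, 2, 3}

# Intersection of the two concept sets: one bitwise AND.
shared = doc_a & doc_b

# Decode the bits back into concept numbers for display.
shared_concepts = [i for i in range(5) if shared >> i & 1]
print(bin(shared), shared_concepts)
```

On real hardware the AND executes in one fetch-execute cycle per word, however many concepts the word encodes.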
The software browser that one can download from Readware is a thin client and is slow for several reasons:
1) the data encoding is not optimal in terms of speed and memory management
2) a great deal of server and client software is built and then exercised each time the software is used
A stand-alone executable could be developed that encodes data into Orb constructions and uses these constructions locally to develop information about a query. One could then draw in the query results after the client has completed its work.
Now, I am sure that there are details that need to be discussed here. So we can start that discussion now.
Retrieval can be by keyword, and yet this often leaves out relevant items and adds non-relevant ones. The literature discusses this as a precision-recall trade-off, and we will let the literature stand for itself. Many feel that precision-recall outcome metrics are a little artificial; clearly, with precision-recall measurement, one must define how the measurement is done as a first step.
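For concreteness, the standard definitions can be stated in a few lines; this is the textbook measure, not anything specific to Readware:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 4 retrieved items are relevant; 2 of the 4 relevant items
# were retrieved, so both measures come out to 0.5 here.
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={3, 4, 5, 6})
print(p, r)
```

The “first step” the text mentions is exactly the choice of the `relevant` set: someone must judge relevance before either number means anything.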
The need is for the addition of services that constrain the search, retrieval, and routing tasks. It is true that one can post a URL containing three randomly selected words that have never occurred together in text before, and get this URL immediately on a Google search the next day. So why does one need a faster retrieval engine?
The answer has to do with the difficulties that have been experienced with machine encoding of the semantics of human readable text.
The purpose of the OTR engine is to add knowledge of language morphology and phonetics. The conjecture is that this knowledge allows conceptual indexing to “see” the glue that ties together subject indicators and to thus provide indexical sensitivity to patterns of language-use that are pointing at the “same” concepts.
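As an illustration of the general idea, and emphatically not the OTR engine’s actual morphology, even a crude suffix-stripper can group variant surface forms under one index key, so that different patterns of language use point at the “same” entry:

```python
from collections import defaultdict

def crude_stem(word):
    """Very crude suffix stripping, standing in for real
    morphological knowledge of the language."""
    for suffix in ("ation", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def conceptual_index(words):
    """Map each stem to the set of surface forms that share it."""
    index = defaultdict(set)
    for w in words:
        index[crude_stem(w)].add(w)
    return index

idx = conceptual_index(["index", "indexes", "indexing"])
print(dict(idx))  # all three forms fall under the key "index"
```

A real engine would need far more than suffix stripping (irregular forms, phonetic variation, sense distinctions), which is the knowledge the conjecture says must be added.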