National Knowledge Project

Friday, June 18, 2004

On how data might be encoded

Communication to: Richard Ballard, founder Knowledge Foundations Inc

CC: BCNGroup and colleagues

KMCI and I had a long disagreement (1999-2002) over whether induction is reducible to deduction, with Joe Firestone and several of his friends holding that it is. The problem was more difficult than that, and had to do with Joe’s idea that there is "one truth" which could be established by rational argument. Inductive steps, such as those made by Cantor, sometimes add elements of formalism that go unaccounted for over long periods. Why? Because the step is useful in some way. We need the formal notion of infinity clarified, for example.

http://www.ontologystream.com/IRRTest/Evaluation/ARLReport.htm

The issue of practical value is present as a control over what can be said. Such is social life.

In any case, there has always been a great difference between my work and the positions at KMCI. We are all on a learning path. I have learned a great deal over the last five years about what not to do in regard to e-forums. I have also learned that one must not argue with fools, even well-educated ones.

Richard's arguments ring very true to me in many ways. First, one has to look at things as they could be, and Richard does this, leading us into a use of the computer that takes advantage of the current hardware features. I will give an example in a few more paragraphs.

The task that we are proposing to do for NASA is to convert all Earth observational data into n-aries, starting mostly with what the Russian semioticians called syntagmatic units of 2-aries, written < a, r, b >. I use this form so that I can re-introduce semiotics in a way similar to how Pospelov and Osipov saw it. The re-introduction leads to formative and differential ontology.

For the draft discussion see:

http://www.bcngroup.org/beadgames/InOrb/theoryOfInformation.htm

Richard's Mark 3 is most ideal for doing the work that we are proposing to NASA (July 6th submission), and the work that we are contemplating leads naturally into Mark 3 encoding. If we are funded, and we are not asking for much money to start with, then Mark 3 will be in the future path of our project. Mark 3 technology, or something similar, is where all of this is going.

However, there is a spectrum of technologies that will be, or is, the foundation for the knowledge technologies.

Here is where Richard and I have some problems that neither of us fully understands. I want to see the neuroscience, linguistics, physics, etc. represented in the theory of information. I want the theory to be one that talks about memory and anticipatory computational mechanisms being artificially separated during “learning” and then entangled via convolutions in real time. We call this the Anticipatory Web (of information).

Mark 3 has these elements also, but the conceptualization is held differently in my mind than the conceptualization is held in Richard’s mind. It is not so easy to just talk about, and the community of knowledge scientists has, as yet, never been able to actually break free from the rules of the status quo created by current generation computer science and the commercialization of computer science as information technology.

The National Project would address the issue of community building directly.

Ok, so to the question that Alex Citkin asked of Pile Systems:

The idea to marry hierarchical and associative structures is very appealing, of course. But I am not very familiar with the Pile theory. Perhaps this is why I do not completely understand certain things. Could someone, please, explain to me on a purely practical level how I can convert let say list of 20 million records containing First Name, Last Name, Address into the pile in order to simplify search, get rid of indexes, why the result will be error resistant and compressed?

This question is important. I have one answer for two technologies,

1) RIBs (Referential Information Bases), also called key-less hash tables, and

2) just the normal hash table.

That answer violates the notion of normal form where data redundancy is avoided at the cost of having to create indexes.

What we get in return is fast retrieval ideally instrumented for convolutional theory, and we get fractal compression.

Mark 3 gets the same thing, as does (I think) the Pile System encoding.

So we take the 20 million records and each record is represented as three triples:

< first name, relationship 1, second name >

< address, relationship 3, first name >

We now have all data “accessible”, if one knows what this structure is.

It is then necessary to create three hash tables, one each for first name, second name, and address.
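As a rough illustration of the step just described, here is a minimal Python sketch that turns records into triples and fills three ordinary hash tables (Python dicts). The sample records and the names to_triples, by_first, by_second and by_address are my own assumptions for illustration. Only the two triples written out above are generated; the third is not spelled out in this note, so the sketch does not guess at it.

```python
from collections import defaultdict

# Hypothetical sample records: (first name, second name, address).
records = [
    ("Ada", "Lovelace", "12 St James Square"),
    ("Alan", "Turing", "78 High Street"),
    ("Ada", "Byron", "12 St James Square"),
]

def to_triples(first, second, address):
    """Return the two triples that the text writes out for one record."""
    return [
        (first, "relationship 1", second),
        (address, "relationship 3", first),
    ]

# One ordinary hash table per field. Each bucket holds the triples
# in which that field value appears.
by_first = defaultdict(list)
by_second = defaultdict(list)
by_address = defaultdict(list)

for first, second, address in records:
    name_triple, address_triple = to_triples(first, second, address)
    by_first[first].extend([name_triple, address_triple])
    by_second[second].append(name_triple)
    by_address[address].append(address_triple)

# Retrieval is a single bucket lookup, with no separate index to maintain.
for triple in by_address["12 St James Square"]:
    print(triple)
```

Note that the same triple is stored in more than one table. That is the redundancy traded for direct retrieval, and it is what the three hash tables hold before any ordering is applied.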

When these are ordered by interpretation of the ASCII string as a base-64 number, or something similar to this, then the hash tables become key-less and no empty containers are needed (as they are in the normal hash table). In each case, the hash bucket contains either the associated information or some pointers. This is discussed in the Orb encoding section of the Notational Paper.
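One possible reading of this ordering idea can be sketched in Python as follows. This is not the Orb encoding itself, only a sketch under stated assumptions: each string is read as a base-128 integer (one digit per ASCII code point, standing in for the "base 64 or something similar" above), the buckets are kept in a dense list sorted by that number, and lookup is a binary search. The names as_number and OrderedTable are hypothetical. The sketch still stores the numeric values alongside the buckets, so it is only an approximation of a truly key-less table.

```python
from bisect import bisect_left

BASE = 128  # assumption: one digit per 7-bit ASCII code point

def as_number(s):
    """Interpret an ASCII string as a base-128 integer."""
    n = 0
    for ch in s:
        code = ord(ch)
        if code >= BASE:
            raise ValueError(f"non-ASCII character: {ch!r}")
        n = n * BASE + code
    return n

class OrderedTable:
    """Dense table ordered by the numeric value of each string."""

    def __init__(self, items):
        # items: iterable of (string, payload) pairs.
        merged = {}
        for s, payload in items:
            merged.setdefault(as_number(s), []).append(payload)
        # Only occupied positions are stored, so there are no empty containers.
        self._numbers = sorted(merged)
        self._buckets = [merged[n] for n in self._numbers]

    def lookup(self, s):
        """Return the bucket for s, or an empty list if s is absent."""
        n = as_number(s)
        i = bisect_left(self._numbers, n)
        if i < len(self._numbers) and self._numbers[i] == n:
            return self._buckets[i]
        return []

table = OrderedTable([("Ada", "record 0"), ("Alan", "record 1"), ("Ada", "record 2")])
print(table.lookup("Ada"))
```

The bucket payloads here are plain strings, but they could equally be the triples themselves or pointers to them, as described above.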

Empirical studies have to be made to demonstrate that this new data-encoding concept outperforms the relational database. Many patents have been filed and awarded on related pieces of this post-relational database theory. One of the objectives of the NASA project, and certainly of the National Project, if funded, will be to make sense of the confusion and to place the basic concepts in the public domain. My hoped-for appointment somewhere in the university system is designed to allow a distance learning environment to be instantiated for this purpose.

How folks like Richard and me will be compensated for long years of effort is a separate question, which can only be solved either by some deep-pocket investments or by political will, as demonstrated by funding the 60 M line item for the first two years of the National Project.