National Knowledge Project


Saturday, January 28, 2006

The BCNGroup Beadgames

Challenge Problem →

[148] ← [parallel discussion on generative methodology (Judith Rosen)]

[147] ← [parallel discussion on generative methodology (Peter Krieg)]

[368] ← [comment on four issues (Richard Ballard)]

Additional comment and history

Back to [368]

Communication from Paul Werbos → [367]

Communication from Dr Richard Ballard (founder, Knowledge Foundations Inc)

Continuing from [368] . . .

The OWL standard is, as far as I can tell, the same as the Oracle/Rational standard. That is to say, it is all about never "instancing relationships": always making certain that relationships are defined by their end points alone. This is a requirement of the "relational database" physical model and one of several rules aimed at guaranteeing that database modeling never gets near the quadratic indexing explosions.

The OWL standard is about preserving the software capabilities and limits the industry already has, hence "relational" and "SQL".

Starting with Mark 1, we were asked by Atari to create a "knowledge shell" completely independent of content. They wanted a database that would never need any a priori schema: one database to handle all models, one set of file formats that would never change, no matter the content. They knew the high cost of acquiring knowledge and sought to use this shell to capture any fiction their game writers might care to invent. Goodbye knowledge cost.

The standard "Semantic Conceptual Modeling" everyone starts with is the obvious answer. It has concepts and relationships, so we started there, taking dBase III Plus (then, in 1984; now it would be ODBC) as our platform.

Create a REL file with three fields [Rel#, C1#, C2#]; that one file can hold all possible binary relationships, forever. The Rel# key can be a type-only (Rel) identifier, or it can be instanced (Rel#), i.e. formed by concatenating the Rel type with a unique instance enumerator. In our case, we ultimately chose to instance all keys by type and enumerator, hence our two-part, universal Model-Instance codes.
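A minimal Python sketch of the single REL file described above, assuming keys are two-part Model-Instance codes rendered as "Type-Number" strings. The class names (Key, Rel, RelFile) and the "-" separator are hypothetical illustrations, not Mark 2's actual file format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Key:
    """A two-part Model-Instance code: a type (model) plus a unique instance enumerator."""
    model: str     # e.g. "Concept" or "IsA" (hypothetical type names)
    instance: int  # enumerator, unique within its model

    def code(self) -> str:
        # Concatenate type and enumerator into one key; the "-" separator is an assumption.
        return f"{self.model}-{self.instance}"


@dataclass(frozen=True)
class Rel:
    """One record of the single REL file: [Rel#, C1#, C2#]."""
    rel: Key  # instanced relationship key (type + enumerator)
    c1: Key   # head of the arrow
    c2: Key   # tail of the arrow


@dataclass
class RelFile:
    """One file for every binary relationship, whatever the content (no per-domain schema)."""
    rows: list = field(default_factory=list)
    counters: dict = field(default_factory=dict)

    def new_key(self, model: str) -> Key:
        # Hand out the next enumerator for this model/type.
        n = self.counters.get(model, 0) + 1
        self.counters[model] = n
        return Key(model, n)

    def add(self, rel_type: str, c1: Key, c2: Key) -> Rel:
        # Every relationship is instanced: it gets its own Rel# of the given type.
        row = Rel(self.new_key(rel_type), c1, c2)
        self.rows.append(row)
        return row
```

For example, creating two concepts and one IsA relationship yields the keys Concept-1, Concept-2 and IsA-1. In this sketch a new kind of relationship needs only a new type name, never a new table.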

To find any binary relationship, you can search for its Rel# (Rel-Instance), if you know it. If you do not know the instance but do know the concept instance and the relationship type (the typical database search assumption), you form an index by concatenating "RelType" + "C1Model-Instance#". We call it RX1; it lets you access the head of the arrow to find the tail. A second index, RX2, concatenates "RelType" + "C2Model-Instance#", using the tail to find the head.

Clearly, if you instance relationships (Rel#), you can get there with one index and a jump. If you access via end points, you must search both RX1 and RX2. Still, one binary relationship file plus one or two indexes for all possible content was easy, so why require Codd's hundreds of predefined custom tables?
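The instance index and the RX1/RX2 access paths can be sketched with ordinary Python dictionaries standing in for the index files. This assumes rows are (Rel#, C1#, C2#) string tuples and joins the concatenated key parts with a "|" separator to keep them unambiguous; the sample rows, function names, and separator are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical REL rows: (Rel#, C1#, C2#) using "Type-Instance" codes.
rows = [
    ("IsA-1", "Concept-1", "Concept-2"),   # dog IsA mammal
    ("IsA-2", "Concept-3", "Concept-2"),   # cat IsA mammal
    ("Eats-1", "Concept-3", "Concept-4"),  # cat Eats fish
]


def rel_type(rel_code):
    # The relationship type is the part of the two-part code before the last "-".
    return rel_code.rsplit("-", 1)[0]


# Instance index: Rel# -> row. A known instance is one lookup and a jump.
by_instance = {row[0]: row for row in rows}

# RX1: RelType + C1 (head) -> rows.  RX2: RelType + C2 (tail) -> rows.
rx1 = defaultdict(list)
rx2 = defaultdict(list)
for row in rows:
    t = rel_type(row[0])
    rx1[t + "|" + row[1]].append(row)
    rx2[t + "|" + row[2]].append(row)


def tails(rtype, head):
    """Given the head of the arrow, find its tails via RX1."""
    return [r[2] for r in rx1.get(rtype + "|" + head, [])]


def heads(rtype, tail):
    """Given the tail of the arrow, find its heads via RX2."""
    return [r[1] for r in rx2.get(rtype + "|" + tail, [])]


def touching(rtype, concept):
    """No instance known, only an end point: both RX1 and RX2 must be searched."""
    return rx1.get(rtype + "|" + concept, []) + rx2.get(rtype + "|" + concept, [])
```

Here by_instance["Eats-1"] is a single lookup, heads("IsA", "Concept-2") uses RX2 to return both Concept-1 and Concept-3, and touching has to consult both indexes because the concept could sit at either end.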

Who needs a further logical or physical model à la Codd to screw things up? Mark 2 finally ended up needing only 9 file types and 14 indexes. Precisely the same file formats, builders, and browsers were used, unchanged for almost 20 years, for all of its 50-some projects of national importance. Still, their content could be changed constantly, and was usually updated or revised substantially in less than 2 weeks.

What about "triples" (Mark 2)? Same game.

Now a single triad (triple) file needs only 4 fields [TRI#, C1#, C2#, C3#].

Fetching a triple given its instance TRI# needs one index. Given any combination of end and/or middle points, it takes two points to find the third, hence 6 indexes:

TX1 (1+2), TX2 (2+3), TX3 (3+1), TX4 (2+1), TX5 (3+2), TX6 (1+3)
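Extending the same idea to the triple file, here is a sketch (hypothetical names, Python dicts standing in for index files) that builds TX1 to TX6 exactly as listed, one index per ordered pair of positions, and uses the index matching the two known points to find the third. Fetching by TRI# is just the single dictionary lookup triples[tri].

```python
from collections import defaultdict

# Ordered position pairs for TX1..TX6, as listed above (positions 1-3 are C1, C2, C3).
PAIRS = [(1, 2), (2, 3), (3, 1), (2, 1), (3, 2), (1, 3)]


def build_indexes(triples):
    """triples maps TRI# -> (c1, c2, c3); returns {"TX1": index, ..., "TX6": index}."""
    indexes = {}
    for n, (a, b) in enumerate(PAIRS, start=1):
        idx = defaultdict(list)
        for tri, t in triples.items():
            # Key on the two points this index covers, in the index's order.
            idx[(t[a - 1], t[b - 1])].append(tri)
        indexes[f"TX{n}"] = idx
    return indexes


def find_third(triples, indexes, first, second):
    """first/second are (position, value) for two distinct positions.

    Picks the TX index for that ordered pair and returns (TRI#, third value) matches.
    """
    (pa, va), (pb, vb) = first, second
    (pc,) = {1, 2, 3} - {pa, pb}  # the unknown position
    name = f"TX{PAIRS.index((pa, pb)) + 1}"
    return [(tri, triples[tri][pc - 1]) for tri in indexes[name].get((va, vb), [])]
```

For instance, with triples = {"Gives-1": ("Alice", "book", "Bob")}, asking find_third with (1, "Alice") and (3, "Bob") goes through TX6 and returns [("Gives-1", "book")].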

From the perspective of triples, the database starts to break, and Oracle and Rational, with their UML language, invented modeling rules so that modelers never see or try this. Those rules make certain no one ever sees a database break, or succeed, outside its natural limits. Every index has a file size approximating that of the file it indexes. Two indexes may double the size the original table needs, but 6 indexes start it on a quadratic, byte-eating explosion that grows like N × (N − 1), if they ever let you do it.
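The growth can be checked with a little arithmetic. For a relationship with N points, one index per ordered pair of positions gives N × (N − 1) indexes: 2 for binaries (RX1, RX2), 6 for triples (TX1 to TX6), 12 for 4-ary relationships. The sketch below counts them and, under the text's assumption that each index is roughly as large as the file it indexes, estimates the index overhead; the byte figures are illustrative, not measurements. Note that itertools.permutations yields the same six pairs as TX1 to TX6, just in a different order.

```python
from itertools import permutations


def pair_indexes(n):
    """Ordered pairs (a, b), a != b, of positions in an n-ary relationship record."""
    return list(permutations(range(1, n + 1), 2))


def approx_index_bytes(n, file_bytes):
    """Rough index overhead, assuming each index is about as large as the file it indexes."""
    return len(pair_indexes(n)) * file_bytes


if __name__ == "__main__":
    for n in range(2, 7):
        # n=2 gives RX1/RX2, n=3 gives TX1..TX6; the count is n * (n - 1).
        print(n, len(pair_indexes(n)), approx_index_bytes(n, 1_000_000))
```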

Mark 2 had such an advantage over Codd, with just one file for all binaries, that it added one file for all triples and decided to eat the "forbidden" quadratic indexing cost. We then found that the real demand for triples far exceeded that for binaries, so we sought a general solution (proportional to information content) for N of any size. Knowledge Theory gave us that solution by 1994. That unique Mark 3 solution makes a single direct jump to any relationship, independent of N, with no indexes or search of any kind. It is working here now as we speak.

The bottom line is that the relational database with its Codd model is a piece of junk. Douglas McDavid was inside IBM when Codd hatched it, working with another group that was pursuing the pure conceptual-model approach I took with dBase. Codd won the battle there politically, through greater aggressiveness, and McDavid's IBM group leader played the good soldier and stepped aside.

Years later, Douglas and I were conspiring within your previous categorical Abstraction letter group to use Mark 2 at IBM World Services to model all of their client profiles, but he was transferred to San Jose Research and told by IBM brass that IBM would have to reinvent it themselves. They did not go forward, as far as I know, though I thought they bought Rational just to kill off Codd before going semantic. Has anyone seen Douglas?

Codd started with his scheme to make certain that all concept fields were contained within one single contiguous record. That alone made predefined table definitions a necessity and led directly to all the other quadratic-complexity nonsense of many custom tables, end-point searches, indexes, and ordinality. Engineers today call it a classic case of "sub-system optimization" leading to "total system de-optimization."

What about "SQL"?

SQL is just a band-aid that slightly extends schema flexibility when combining data models from several sources. It offers relational table merges (joins, which act like matrix inner products). If you need a table whose rows by columns are X by Z, and someone else has X by Y and Y by Z, then SQL gives you a way to multiply X by Y times Y by Z and get X by Z. SQL is irrelevant to anything not fitting Codd's array-type schema assumptions; "trees" are a historical case in point. SQL would have to treat an ontological tree as a special custom type, i.e. offer no help.
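The "matrix inner product" reading of an SQL join can be made concrete. This sketch, with hypothetical tables xy and yz held in an in-memory SQLite database via Python's standard sqlite3 module, joins X by Y with Y by Z on the shared Y column and checks that the result matches a Boolean matrix product of the two 0/1 relation matrices.

```python
import sqlite3

# Hypothetical example data: XY relates X to Y, YZ relates Y to Z.
xy = [("x1", "y1"), ("x2", "y2")]
yz = [("y1", "z1"), ("y1", "z2"), ("y2", "z2")]


def sql_join(xy, yz):
    """X-by-Y joined with Y-by-Z on the shared Y column gives X-by-Z."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE xy (x TEXT, y TEXT)")
    con.execute("CREATE TABLE yz (y TEXT, z TEXT)")
    con.executemany("INSERT INTO xy VALUES (?, ?)", xy)
    con.executemany("INSERT INTO yz VALUES (?, ?)", yz)
    rows = con.execute(
        "SELECT DISTINCT xy.x, yz.z FROM xy JOIN yz ON xy.y = yz.y ORDER BY xy.x, yz.z"
    ).fetchall()
    con.close()
    return rows


def boolean_product(xy, yz):
    """Boolean matrix product: XZ[x][z] = OR over y of (XY[x][y] AND YZ[y][z])."""
    xy_set, yz_set = set(xy), set(yz)
    xs = sorted({x for x, _ in xy})
    ys = sorted({y for _, y in xy} | {y for y, _ in yz})
    zs = sorted({z for _, z in yz})
    a = [[(x, y) in xy_set for y in ys] for x in xs]  # X-by-Y 0/1 matrix
    b = [[(y, z) in yz_set for z in zs] for y in ys]  # Y-by-Z 0/1 matrix
    return [
        (xs[i], zs[k])
        for i in range(len(xs))
        for k in range(len(zs))
        if any(a[i][j] and b[j][k] for j in range(len(ys)))
    ]
```

SELECT DISTINCT gives set (Boolean) semantics. Without DISTINCT, and assuming no duplicate input rows, the number of result rows for each (x, z) pair equals the number of shared y values, i.e. the entry of the ordinary integer matrix product.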

So the bottom bottom line is that Codd's model has survived for the sake of convention, and for the sake of convention the W3C wants to make its gross deficiencies and explosive complexities a permanent part of, and limitation on, the future ontological standard.

Think of Mark 3 as a wooden stake or a silver bullet; whatever you need, just kill it.

Not shyly: think of that as our plan and purpose.
