
 

Technical Section and Team/Budget description

 

We start with an explanation of the (type:value) pair, based on our response to observations by John Sowa (2003) about the ATS patents.

 

(type:value) pairs have been used and implemented in various systems since the 1950s, and they are part of almost every major programming language and knowledge representation system in use today:

1. They are the basis for the LISP property lists in the 1950s.

2. They are the slot-filler scheme in every frame-based knowledge representation system.

3. They are the basis for the data structures in COBOL, PL/I, Pascal, C, C++, Ada, Java, etc., etc., etc.

4. They are the representation in the concept nodes of conceptual graphs (which were first published in 1976).
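
As a minimal illustration of the four uses listed above, the sketch below expresses (type:value) pairs as property-list entries, frame slots, record fields, and a conceptual graph concept node.  All names (`TypeValue`, `Person`, the sample data) are hypothetical and chosen only for illustration; the bracketed `[Type: Referent]` rendering follows the usual linear notation for concept nodes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TypeValue:
    """A single (type:value) pair. Hypothetical name, for illustration only."""
    type: str
    value: Any


# 1. Property list: a flat sequence of alternating indicators and values,
#    in the style of a LISP plist such as (color red size 3).
plist = ["color", "red", "size", 3]
plist_pairs = [TypeValue(plist[i], plist[i + 1]) for i in range(0, len(plist), 2)]

# 2. Frame: named slots, each holding a filler.
frame = {"is_a": "Cat", "name": "Yojo", "color": "red"}
frame_pairs = [TypeValue(slot, filler) for slot, filler in frame.items()]


# 3. Record: typed fields, as in a C struct or a Pascal record.
@dataclass
class Person:
    name: str
    age: int


record = Person(name="Ada", age=36)
record_pairs = [TypeValue(type(v).__name__, v) for v in vars(record).values()]


# 4. Conceptual graph concept node, rendered as [Type: Referent].
def concept_node(pair: TypeValue) -> str:
    return f"[{pair.type}: {pair.value}]"


if __name__ == "__main__":
    print(plist_pairs)
    print(frame_pairs)
    print(record_pairs)
    print(concept_node(TypeValue("Cat", "Yojo")))
```

In each case the same two-part shape recurs; only the container that holds the pairs changes.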

 

Given the enormous number of variations in which (type:value) pairs have been used, it would be very hard to find a new patentable combination that was not already published decades ago.

 

In summary: the (type:value) pair has been essential to the development of computer science based on cognitive models and logics.  Sowa’s observation, above, allows an important insight into the new uses of computer science that develop ONLY when one adopts a semiotics and/or differential ontology framework.  The key to these approaches is stratification, in a sense that we will define.  Sowa’s observation sets up a surprise that is disclosed in reading the two ATS patents.

 

The issue of paradigmatic context:  We anticipate a rigorous scientific debate over the issues that we are raising, and are prepared for this debate. 

 

For us, the (type:value) pair has properties that are NOT anticipated by classical computer science and cognitive models.  We judge these properties to be the foundation of the knowledge technologies.

 

Scientists who have revealed an alternative to classical computer science and cognitive models bring specific knowledge about the structure of human cognition.  This alternative is extensive and deep in its interdisciplinary contributions, and rich in history.

 

We see a stratification of biological processes acting at different time scales.  This stratification is not merely a property of living systems, but is in fact a property of any physical system.  The natural sciences inform us about the correctness of this new paradigm.  However, this science is not classical, reductionist science.

 

An extensive literature exists.  From this knowledge of human cognitive processes, and the underlying metabolic and matter-energy reactions, we have derived a new view about what is and is not possible with computation and clever data structures.  The surprise of the ATS patents, and related patents, is seen in this light.  The justification of the long-term viability of this work is made within the natural science paradigm we are referring to.

 

On the innovation, its disclosure and its adoption within the marketplace:  A quick reading of the two ATS patents (1994, 1996) shows why there is a surprise in the specific way in which the patent applicants disclosed this invention.  Clearly the patent officers felt that the CCM (Contiguous Connection Model) construction, and how it was used, was NOT anticipated by the enormous body of work to which John Sowa makes scholarly reference.

 

These different impressions, by the patent officers and John Sowa, deserve an explanation. 

 

The (type:value) pair is in fact a good way to localize information about type and value.  In this way, the CCM constructions follow XML and ontologies developed from objects and classes (like OWL).  But there is something in addition to the (type:value) pairing, and this has to do with information organization and inference.

 

Even though the IT markets are now consumed with the adoption of XML and OWL-type ontologies, it is clear that these are small improvements.  In both XML and the Cycorp technologies, a serious problem exists in finding the proper scope, namespaces and situational context.  This “reification” process is not the only problem that one finds with XML and Cycorp constructions.  To be clear, we agree that real value exists in the mainstream work, but something is missing from the full promise that is seen in marketing material.

 

One imagines that if there is something missing, as we conjecture there is, one might begin to see this something in many patents and in research papers.  We point to connectionism as one indication of the nature of a missing component in existing information technologies.  The claim is not that connectionism supplies all of the answers, but that connectionism exists because more is required than a localization of information into (type:value) pairs.

 

One may also understand, or believe, that the structure of any natural “system’s” expression is so constrained in the real world that the types, and the relationships between types, are small in number and yet open to change.  This is partially what the literature on semiotics is focused on.  Seen in this way, one finds data regularity in context as a matter of human observation.  This regularity can be seen, in part, with various techniques that are being integrated into the Knowledge Sharing Core.

 

Localization of information has only a relative value, as has been shown by experience with systems like Protégé and Cycorp ontologies. 

 

Localization and holonomic processes:  Given the current methods, individual localizations can be massive in number.  The current architectures develop problems in completeness and consistency (the micro-theory problem in Cycorp and the scope problem in Topic Maps).  One has to be able to organize a reasonable number of elements, each having the (type:value) pair nature, into situational and scoped constructions. 
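
One way to picture this requirement is to attach a scope to every pair and to assemble a situational view by selecting only the pairs that are in scope.  The sketch below is a hypothetical illustration: the names `ScopedPair`, `view_for` and `conflicts`, and the sample scopes, are assumptions and are not drawn from Cycorp micro-theories, Topic Maps, or the CCM patents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedPair:
    """A (type:value) pair tagged with the scope in which it holds (hypothetical)."""
    type: str
    value: str
    scope: str


def view_for(pairs, scopes):
    """Return {type: sorted values} for the pairs whose scope is in `scopes`."""
    view = defaultdict(set)
    for p in pairs:
        if p.scope in scopes:
            view[p.type].add(p.value)
    return {t: sorted(vs) for t, vs in view.items()}


def conflicts(view):
    """Types that carry more than one value inside a single view."""
    return sorted(t for t, vs in view.items() if len(vs) > 1)


pairs = [
    ScopedPair("capital", "Paris", scope="geography"),
    ScopedPair("capital", "venture funding", scope="finance"),
    ScopedPair("currency", "euro", scope="finance"),
]

if __name__ == "__main__":
    finance = view_for(pairs, {"finance"})
    print(finance, conflicts(finance))
    merged = view_for(pairs, {"finance", "geography"})
    print(merged, conflicts(merged))
```

Within a single scope each type resolves to one value; merging scopes without regard to situation surfaces the kind of inconsistency described above.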

 

Our science provides insights and methods for organizing coherent viewpoints that are specific to an inquiry or inference.  Specifically, we look to several innovations that have been adopted as part of SchemaServer, developed by SchemaLogic Inc.  SchemaServer provides both a data schema integration process and a community-based reconciliation process that works on expressing the structural ambiguities necessary to human dialog and interaction.  But the reconciliation of controlled vocabularies and database schemas is only the very beginning of the capabilities we expect to deliver within a few months.

 

Schema resolution is seen as both a discrete process, involving logics over schema, and a continuous process, involving techniques like latent semantic indexing and associative memories.  Differential ontology is a formal mapping methodology between the discrete (and explicit) ontology and the implicit ontology (expressed in continuum mathematics).
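
The two sides of schema resolution can be sketched together.  In the sketch below the discrete step is an exact match on normalized field names, and the continuous step is cosine similarity over bag-of-words vectors built from field descriptions.  The cosine measure is a simple stand-in for techniques such as latent semantic indexing, which would additionally reduce the term space; it is not LSI itself.  The function names, sample schemas and threshold are assumptions, and nothing here reflects how SchemaServer works.

```python
import math
import re
from collections import Counter


def normalize(name):
    """Discrete step: canonical form of a field name."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def bag_of_words(text):
    """Term counts for a free-text field description."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Continuous step: cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resolve(schema_a, schema_b, threshold=0.5):
    """Map fields of schema_a onto fields of schema_b.

    Schemas are {field_name: description}. Returns {field_a: (field_b, how)},
    where `how` is "exact" or "similar"; unmatched fields are omitted.
    """
    by_norm = {normalize(f): f for f in schema_b}
    mapping = {}
    for field, desc in schema_a.items():
        key = normalize(field)
        if key in by_norm:
            mapping[field] = (by_norm[key], "exact")
            continue
        vec = bag_of_words(desc)
        scored = [(cosine(vec, bag_of_words(d)), f) for f, d in schema_b.items()]
        if scored:
            score, best = max(scored)
            if score >= threshold:
                mapping[field] = (best, "similar")
    return mapping


schema_a = {
    "Cust_ID": "unique identifier of the customer",
    "surname": "family name of the customer",
}
schema_b = {
    "custid": "customer key",
    "last_name": "family name of the person",
    "zip": "postal code",
}

if __name__ == "__main__":
    print(resolve(schema_a, schema_b))
```

The exact match is a yes-or-no decision, while the similarity score is graded and needs a threshold; a mapping between the two kinds of evidence is what the paragraph above calls differential ontology.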

 

More has to be said on differential and formative ontology, but for now we should return to the discussion of the ATS patents.  Are these patents a reduction to practice of both the (type:value) pair AND connectionist theory?  The answer is “yes”.

 

Specifically, a “global” organizational process is illustrated by the “inversion” technique disclosed in the ATS patent.

 

Applied Technical Systems (ATS) is a small company that has been in existence for almost 20 years.  It has been developing the CCM-Powered referential system in the hope that CCM-Powered systems could become a ubiquitous information and knowledge sharing technology, sitting at the heart of a cultural / economic knowledge revolution.  One of the early goals of Phase 1 is to demonstrate why this is a reasonable hope.  We also expect to build certain other technologies, based on other patents, on the fundamental data structures that now exist in the CCM-Powered NdCore ontology development system.

 

A single innovation will not ignite the Semantic Web, as we see with the limited use of OWL and RDF.  However, the development of a method that finds and discloses innovations that can be built on the CCM constructions will fundamentally change what can be expected in the near term.  The time is right for this revolution to occur.

 

The open questions can be identified by science and lead to additional inventions and disclosures.  A number of these open questions are being addressed in the near term.  One may address the question of nearness, similarity and complexity.  Deductive inference, using first-order predicate logics, makes little sense in domains with high measures of irregularity and novelty.  So one can, and should, make a distinction between deductive logics, which can be computed by computers, and inductive inference, which is a cognitive process that is not yet completely understood by natural science.

 

Automation of the construction of recursive grammars:  The SAIC/OntologyStream team also includes a small company, Text Analysis International Corporation (TAIC).  A patent-pending Integrated Development Environment (IDE) for developing text analyzers has been evaluated in preliminary work by OntologyStream scientists.  The TAIC patent application allows knowledgeable users to develop a flexible multi-pass construction process that produces a highly situational set of parsing rules.  Passes are involved in tokenizing, morphological analysis, spelling correction, parts-of-speech tagging, entity recognition, simple extraction (names, titles, locations, dates, quantities), and constituent recognition (noun phrases, passages, themes).

 

In the IDE, these passes are not black boxes, as is typical of deployed NLP or ontology constructor systems, but are open to rapid modification by a knowledgeable user.  The modifications are expressed in the open construction of atoms in a situational logic and can be rendered as taxonomy or ontology.  The atoms themselves are “recognized” by the IDE, and users are allowed to instantiate those atoms that are deemed important.  Moreover, an additional invention (not yet disclosed) convolves the ATS patents with the TAIC patent application to produce a general-purpose ontology constructor.

 

Given such a flexible arrangement, one can organize an NLP or ontology constructor system in the best possible way for any given application.  Furthermore, the ability to insert passes into an existing set of passes enables a system to grow, or be reduced, in a flexible and modular fashion.  For example, some passes can be devoted entirely to syntax, others to lexical processes such as segmenting text into lines, and others to a complex subsystem such as a recursive grammar for handling lists.  These flexible arrangements may be applied to web harvesting to produce a competitor to the current J-39 Harvester now deployed at INSCOM.  A large number of existing patents can be studied by implementing any specific patent using the IDE and then adjusting other processes so that limitations now seen in these patent implementations can be addressed.
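
The idea of an open, ordered list of passes that can be extended in place can be sketched as follows.  This is a hypothetical illustration in plain Python, not the TAIC IDE or its rule language; the `Analyzer` class, the pass names and the deliberately naive rules inside each pass are all assumptions.

```python
import re


class Analyzer:
    """An ordered, editable list of named passes over a shared document state."""

    def __init__(self):
        self.passes = []  # list of (name, function) in execution order

    def add(self, name, fn, after=None):
        """Append a pass, or insert it directly after the pass named `after`."""
        if after is None:
            self.passes.append((name, fn))
            return
        names = [n for n, _ in self.passes]
        self.passes.insert(names.index(after) + 1, (name, fn))

    def remove(self, name):
        """Drop a pass by name, leaving the others in order."""
        self.passes = [(n, f) for n, f in self.passes if n != name]

    def run(self, text):
        state = {"text": text}
        for _, fn in self.passes:
            fn(state)  # each pass reads and extends the shared state
        return state


def split_lines(state):
    state["lines"] = state["text"].splitlines()


def tokenize(state):
    state["tokens"] = re.findall(r"\w+|[^\w\s]", state["text"])


def find_dates(state):
    # Simple extraction: four-digit years only, for illustration.
    state["dates"] = [t for t in state["tokens"] if re.fullmatch(r"(?:19|20)\d\d", t)]


def find_names(state):
    # Naive entity recognition: runs of two or more capitalized words.
    state["names"] = re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", state["text"])


analyzer = Analyzer()
analyzer.add("lines", split_lines)
analyzer.add("tokens", tokenize)
analyzer.add("dates", find_dates)
# Insert a new pass into the existing sequence without touching the others.
analyzer.add("names", find_names, after="tokens")

if __name__ == "__main__":
    result = analyzer.run("John Sowa commented on the ATS patents in 2003.")
    print([name for name, _ in analyzer.passes])
    print(result["names"], result["dates"])
```

Because every pass is a named entry in a visible list, a user can inspect, reorder, insert or remove a pass, which is the opposite of a black-box analyzer.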

 

One must recognize that the invention of a new computer-based algorithm and its implementation in computer code are two very different processes.  Moreover, each invention has to be set in the context of other inventions if all of the issues of which scholars are fully aware are to be addressed in a single unified capability.

 

The Semio patents, now owned by Entrieva Inc., will be examined and extended (within a side agreement with Entrieva), so that the already “best in market” results of the Entrieva conceptual maps application will be improved and made domain specific.  A test collection using a small number of short fables has been studied as part of preliminary, privately funded research (conducted 1997 to 2003).  In the case of the Semio patents, we feel that a discovery of fundamental importance was disclosed in the patents of Semio founder Claude Vogel.  The specific discovery assists in the definition of concept expression and the extraction of passage categories having similar meanings.

 

ClearForest tools are being widely deployed (as of July 2003) as rule-based entity extraction systems looking for themes in web-published text.  ClearForest is part of the SAIC/OntologyStream team because their ClearResearch toolset is compatible with and complementary to the TAIC IDE.  ClearForest tools are to be used as a measurement of the social discourse, and then other tools are used to develop a weather-map-type representation of the thematic structure of social discourse being expressed in public web sites by various social units, including those social units that represent possible asymmetric threats to the public well-being.  Both of these tools will exist in a usable form within the Knowledge Sharing Core.
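
The two stages described above, measurement followed by a map-like summary, can be sketched in a few lines.  This is a hypothetical illustration only: it does not use or imitate any ClearForest interface, and the theme rules, source names and sample texts are invented for the example.

```python
import re
from collections import Counter

# Hypothetical theme rules: theme name -> regular expression.
THEME_RULES = {
    "economy": re.compile(r"\b(?:jobs?|trade|prices?)\b", re.IGNORECASE),
    "security": re.compile(r"\b(?:border|threats?|attacks?)\b", re.IGNORECASE),
}


def measure(text):
    """Measurement stage: count rule hits per theme in one text."""
    return Counter({theme: len(rule.findall(text)) for theme, rule in THEME_RULES.items()})


def thematic_map(posts):
    """Summary stage: posts is an iterable of (source, text).

    Returns {source: {theme: share of that source's theme hits}}.
    """
    totals = {}
    for source, text in posts:
        totals.setdefault(source, Counter()).update(measure(text))
    grid = {}
    for source, counts in totals.items():
        n = sum(counts.values())
        grid[source] = {t: (counts[t] / n if n else 0.0) for t in THEME_RULES}
    return grid


posts = [
    ("site-a", "Trade and jobs dominate; prices rise."),
    ("site-a", "A new threat at the border."),
    ("site-b", "Jobs, jobs, jobs."),
]

if __name__ == "__main__":
    for source, row in thematic_map(posts).items():
        print(source, row)
```

Each row of the resulting grid is one social unit and each column one theme, so the grid can be rendered as the kind of map-style display the paragraph describes.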

 

Our science advisors include Drs. Karl Pribram (cognitive neuroscience), Raymond Bradley (theory of social systems), Peter Kugler (perceptual measurement), Robert Shaw (ecological psychology), Daniel Levine (connectionism), John Sowa (conceptual graphs), and Steven Newcomb (topic maps).  These scholars’ participation forms the core resource for the OntologyStream science advisory committee.  Our activity is funded by several sources.

 

ClearForest and TAIC IDE environments allow the expression of patents that have been awarded by the US Patent and Trademark Office.  These environments also allow the expression of inventions that are being considered as patent applications.  Our scholars are in a position to assist in the proper expression of these inventions, both in the form of a common language and in the form of patents filed on behalf of the inventor. 

 

The common patent expression language that we have chosen is called Cubicon.  A more complete discussion of Cubicon, and its history, will have to be developed during Phase 1.  Our Phase 1 budget includes an expenditure of 180 K for interactions with Sandy Klausner, founder of CoreTalk Inc. and inventor of the Cubicon language, and for conferences with scholars on this issue of a common description/deployment language for complex programming.  Macromedia presentations that demonstrate the principles of Cubicon are available from CoreTalk Inc.

 

Mapping Intellectual Property:  OntologyStream is working on mapping emerging intellectual property and on helping those that we select to bring innovation into a common expressive environment.  Our evaluation of the two ATS patents, and the implementation work, has led us to place the CCM constructions and processes at the ground level of the Knowledge Sharing Core.

 

Those associated with OntologyStream propose a specific project, Phase 1, that starts in January 2004 and lasts until June 2005 (18 months).  At the end of Phase 1, we will propose to shift our attention to the use of a deployed and tested system to demonstrate high-fidelity, general-purpose analysis of social discourse occurring in real time in several languages, one of these being Arabic.  We are aggressive in developing a powerful new utility for use in concluding the War on Terrorism.  We understand that the use of this utility is targeted at providing safety for the American public, while being consistent with Constitutional processes.

 

Preliminary work (6 months ending in October) has been conducted as part of an internal ATS/OntologyStream R&D project.  The governing SOW is specific in targeting the development of an experimental system to explore CCM constructions and the processes that might occur on these constructions.  Our previous work has been on a design to improve on the CCM results as expressed in the NdCore system, by adding ontology and linguistic services to the current process.  This work is nonetheless consistent with the notions of the Knowledge Sharing Core.  The NdCore is a system of thematic analysis being developed by ATS.  The NdCore creates an emerging ontology that depends on the text analyzed and the variation of inputs by the users.  Several other previous projects, mostly private efforts, go back to the early 1990s.

 

Knowledge Sharing Core concept:  The Knowledge Sharing Core (KSC) concept is fully expressed at:

 

http://www.bcngroup.org/area2/KSF/KSFArchitecture.htm

 

The KSC addresses the need to make a transition in how information technology innovation is being evaluated and procured by the federal government.  The KSC does an end run around the existing technology evaluation and procurement process.  For SAIC this is no problem, as the management understands that the need for this transition is recognized by military and intelligence clients.

 

SAIC management will not advise on what to select into the KSC, as this will be a process that is governed by scientists who become part of an advisory board to OntologyStream Inc. and/or the BCNGroup.org.  It will always be clear that it is the scientists, and not the business people, who make these selections.

 


The team proposal to DARPA (Phase 1)

The 18-month budget is 1720K (12 months) plus 960K (6 months) to stay within a total of 4 M over five years

 

Company                        First 12 months    Next 6 months

Applied Technical Systems            425 K             75 K
OntologyStream                       400 K            200 K
Text Analysis International          225 K             25 K
SchemaLogic                          125 K             25 K
CoreTalk                             120 K             60 K
Entrieva                              75 K             25 K
ClearForest                           50 K             25 K
Groove                                10 K              0 K
SAIC                                  25 K              0 K

Subtotal                            1455 K            460 K

Project management (SAIC)            265 K            135 K
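
The column arithmetic can be cross-checked with a short script.  The sketch below is a minimal check, assuming the figures are transcribed correctly from the table and from the budget summary sentence above; all values are in thousands (K), and the variable and function names are hypothetical.

```python
# Figures in thousands (K), transcribed from the budget table:
# company -> (first 12 months, next 6 months)
LINE_ITEMS = {
    "Applied Technical Systems": (425, 75),
    "OntologyStream": (400, 200),
    "Text Analysis International": (225, 25),
    "SchemaLogic": (125, 25),
    "CoreTalk": (120, 60),
    "Entrieva": (75, 25),
    "ClearForest": (50, 25),
    "Groove": (10, 0),
    "SAIC": (25, 0),
}
STATED_SUBTOTAL = (1455, 460)     # "subtotal" row of the table
PROJECT_MANAGEMENT = (265, 135)   # "Project management (SAIC)" row
STATED_TOTAL = (1720, 960)        # budget summary sentence
PERIODS = ("first 12 months", "next 6 months")


def column_sums(items):
    """Sum each period's column over all line items."""
    first = sum(a for a, _ in items.values())
    second = sum(b for _, b in items.values())
    return first, second


def check():
    """Return rows of (label, period, computed, stated, matches)."""
    subtotal = column_sums(LINE_ITEMS)
    total = tuple(s + pm for s, pm in zip(subtotal, PROJECT_MANAGEMENT))
    rows = []
    for label, computed, stated in (
        ("subtotal", subtotal, STATED_SUBTOTAL),
        ("total with project management", total, STATED_TOTAL),
    ):
        for period, c, s in zip(PERIODS, computed, stated):
            rows.append((label, period, c, s, c == s))
    return rows


if __name__ == "__main__":
    for label, period, computed, stated, ok in check():
        flag = "ok" if ok else "CHECK"
        print(f"{label:32} {period:16} computed {computed:5} K  stated {stated:5} K  {flag}")
```

Each row pairs a computed sum with the figure stated in the text, so any column that does not reconcile is flagged for review.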