We start with an explanation of the (type:value) pair, based on our response to observations by John Sowa (2003) about the ATS patents.
(type:value) pairs have been used and implemented in various systems since the 1950s, and they are part of almost every major programming language and knowledge representation system in use today:
1. They are the basis for the LISP property lists in the 1950s.
2. They are the slot-filler scheme in every frame-based knowledge representation system.
3. They are the basis for the data structures in COBOL, PL/I, Pascal, C, C++, Ada, Java, etc., etc., etc.
4. They are the representation in the concept nodes of conceptual graphs (which were first published in 1976).
Given the enormous number of
variations in which (type:value) pairs have been used, it would be very hard to
find some new patentable combination that hasn't been published many decades
ago.
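As a minimal sketch of the variations listed above, the following Python fragment shows the same (type:value) pairing as a LISP-style property list, as a frame with slots and fillers, and as a conceptual-graph concept node. All names and sample values are hypothetical and chosen only for illustration; nothing here is taken from the ATS patents.

```python
# Illustrative only: hypothetical names and sample data.

# A LISP-style property list: a flat sequence of alternating types and values.
plist = ["color", "red", "shape", "round", "size", "small"]


def plist_get(plist, key):
    """Return the value that follows `key` in a property list, or None."""
    for i in range(0, len(plist) - 1, 2):
        if plist[i] == key:
            return plist[i + 1]
    return None


# A frame: a named unit whose slots (types) hold fillers (values).
frame = {"name": "apple", "slots": dict(zip(plist[0::2], plist[1::2]))}

# A concept node in the style of conceptual graphs: one (type, referent) pair.
concept = ("Fruit", "apple")

print(plist_get(plist, "shape"))
print(frame["slots"]["size"])
print(concept)
```

In each case the information is localized in the pair itself; none of these structures says anything about how many pairs are organized into a scoped or situational whole.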
In summary: the (type:value) pair has been essential to the development of computer science based on cognitive models and logics. Sowa’s observation, above, allows an important insight into the new uses of computer science that develop ONLY when one adopts a semiotics and/or differential ontology framework. The key to these approaches is stratification, in the sense that we will define. Sowa’s observation sets up a surprise that is disclosed in reading the two ATS patents.
The issue of paradigmatic context: We anticipate a rigorous scientific debate over the issues that we are raising, and are prepared for this debate.
For us, the (type:value) pair has properties
that are NOT anticipated by classical computer science / cognitive models. It is these properties that we judge to be
the foundation of the knowledge technologies.
Scientists who have revealed an alternative
to classical computer science / cognitive models bring specific knowledge
about the structure of human cognition.
This alternative is extensive and deep, in terms of interdisciplinary
contributions, and rich in history.
We see a stratification of biological
processes acting at different time scales.
This stratification is not merely a property of living systems, but is
in fact a property of any physical system.
The natural sciences inform us about the correctness of this new
paradigm. However, this science is not
classical, reductionist science.
An extensive literature exists. From this knowledge of human cognitive processes,
and the underlying metabolic and matter-energy reactions, we have derived a new
view about what is and is not possible with computation and clever data
structures. The surprise of the ATS
patents, and related patents, is seen in this light. The justification of the long-term viability of this work is made
within the natural science paradigm we are referring to.
On the innovation, its disclosure and its
adoption within the marketplace: A quick reading of the two
(1994, 1996) ATS patents explains why there is a surprise in the specific
way in which the patent applicants disclosed this invention. Clearly the patent officers felt that the
CCM (Contiguous Connection Model) construction, and how it was used, was NOT
anticipated by the enormous amount of work to which John Sowa makes scholarly
reference.
These different impressions, by the patent
officers and John Sowa, deserve an explanation.
The (type:value) pair is in fact a good way
to localize information about type and value.
In this way, the CCM constructions follow XML and ontologies developed
from objects and classes (like OWL).
But there is something in addition to the (type:value) pairing, and this
has to do with information organization and inference.
Even though the IT markets are now consumed with the adoption of XML and OWL-type ontologies, there is the certain knowledge that these are small improvements. In both XML and the Cycorp technologies, a serious problem exists in finding the proper scope, namespaces and situational context. This “reification” process is not the only problem that one finds with XML and Cycorp constructions. To be clear, we agree that real value exists in the mainstream work, but the full promise that is seen in marketing material is missing something.
One imagines that if there is something missing, as we conjecture there is, one might begin to see this something in many patents, and in research papers. We point to connectionism as one indication of the nature of a missing component in existing information technologies. The claim is not that connectionism supplies all of the answers, but that connectionism exists because more is required than a localization of information into (type:value) pairs.
One may also understand, or believe, that the structure of any natural “system’s” expression is so constrained in the real world that the types, and the relationships between types, are small in number and yet open to change. This is partially what the literature on semiotics is focused on. Seen in this way, one finds data regularity in context as a matter of human observation. This regularity can be seen, in part, with various techniques that are being integrated into the Knowledge Sharing Core.
Localization of information has only a relative value, as has been shown by experience with systems like Protégé and Cycorp ontologies.
Localization and holonomic processes: Given the current methods, individual localizations can be massive in number. The current architectures develop problems in completeness and consistency (the micro-theory problem in Cycorp and the scope problem in Topic Maps). One has to be able to organize a reasonable number of elements, each having the (type:value) pair nature, into situational and scoped constructions.
Our science provides insights and methods for organizing coherent viewpoints that are specific to an inquiry or inference. Specifically, we look to several innovations that have been adopted as part of SchemaServer, developed by SchemaLogic Inc. SchemaServer provides both a data schema integration process and a community-based reconciliation process that works on expressing the structural ambiguities necessary to human dialog and interaction. But the reconciliation of controlled vocabularies and database schemas is only the very beginning of the capabilities we expect to deliver within a few months.
Schema resolution is seen both as a discrete process, involving logics over schema, and as a continuous process, involving techniques like latent semantic indexing and associative memories. Differential ontology is a formal mapping methodology between the discrete (and explicit) ontology and the implicit ontology (expressed in continuum mathematics).
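The two sides of schema resolution can be illustrated with a small Python sketch. It is not the SchemaServer method, and it does not implement latent semantic indexing; as a simpler stand-in for the continuous side it assumes cosine similarity over character trigrams. The function names, the synonym table and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def trigrams(name):
    """Character trigram counts of a lowercased, padded field name."""
    s = f"  {name.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve(field, target_fields, synonyms, threshold=0.6):
    """Map `field` onto one of `target_fields` (assumed non-empty).

    Discrete step: exact match, then an explicit synonym table.
    Continuous step: best trigram similarity, accepted above `threshold`.
    """
    if field in target_fields:
        return field, "exact"
    if synonyms.get(field) in target_fields:
        return synonyms[field], "synonym"
    scores = {t: cosine(trigrams(field), trigrams(t)) for t in target_fields}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, "similarity"
    return None, "unresolved"


targets = ["surname", "zip"]
print(resolve("surname", targets, {}))
print(resolve("lastname", targets, {"lastname": "surname"}))
print(resolve("surnames", targets, {}))
print(resolve("qty", targets, {}))
```

The discrete step gives an answer that can be stated and audited as a rule; the continuous step gives a graded answer that a community reconciliation process would still need to confirm.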
More has to be said on differential and formative ontology, but for now we should return to the discussion of the ATS patents. Are these patents a reduction to practice of both the (type:value) pair AND connectionist theory? The answer is “yes”.
Specifically, a “global” organizational process is illustrated by the “inversion” technique disclosed in the ATS patent.
Applied Technical Systems (ATS) is a small company that has been in existence for almost 20 years. It has been developing the CCM-Powered referential system in the hope that CCM-Powered systems could become a ubiquitous information and knowledge sharing technology, sitting at the heart of a cultural / economic knowledge revolution. One of the early goals of Phase 1 is to demonstrate why this is a reasonable hope. We also expect to build certain other technologies, based on other patents, on the fundamental data structures that now exist in the CCM-Powered NdCore ontology development system.
A single innovation will not ignite the Semantic Web, as we see with the limited use of OWL and RDF. However, the development of a method that finds and discloses innovations that can be built on the CCM constructions will fundamentally change what can be expected in the near term. The time is right for this revolution to occur.
The open questions can be identified by
science and lead to additional inventions and disclosures. A number of these open questions are being
addressed in the near term. One may
address the question of nearness, similarity and complexity. Deductive inference, using first-order
predicate logics, makes little sense in domains with high measures of
irregularity and novelty. So one can,
and should, make a distinction between deductive logics, which can be computed
by computers, and inductive inference, which is a cognitive process that is not
yet completely understood by natural science.
Automation of the construction of recursive
grammars: The SAIC/OntologyStream team also includes a
small company, Text Analysis International Corporation (TAIC). A patent-pending Integrated Development
Environment (IDE) for developing text analyzers has been evaluated in
preliminary work by OntologyStream scientists.
The TAIC patent application allows knowledgeable users to develop a
flexible multi-pass construction process that produces a highly situational set
of parsing rules. Passes are involved
in tokenizing, morphological analysis, spelling correction, parts-of-speech
tagging, entity recognition, simple extraction (names, titles, locations,
dates, quantities), and constituent recognition (noun phrases, passages,
themes).
In the IDE, these passes are not black boxes,
as is typical of deployed NLP, or ontology constructor, systems, but are open
to rapid modification by a knowledgeable user.
The modifications are expressed in the open construction of atoms in a
situational logic and can be rendered as taxonomy or ontology. The atoms themselves are “recognized” by the
IDE and users are allowed to instantiate those atoms that are deemed
important. Moreover, an additional
invention (not as yet disclosed) convolves the ATS patents with the TAIC patent
application to produce a general-purpose ontology constructor.
Given such a flexible arrangement, one can
organize an NLP, or ontology constructor, system in the best possible way for
any given application. Furthermore, the
ability to insert passes into an existing set of passes enables a system to
grow, or be reduced, in a flexible and modular fashion. For example, some passes can be devoted
entirely to syntax, others to lexical process such as segmenting text into
lines, or a complex subsystem such as a recursive grammar for handling
lists. These flexible arrangements may
be applied to web harvesting to produce a competitor to the current J-39
Harvester now deployed at INSCOM. A
large number of existing patents can be studied by implementing any specific
patent using the IDE and then adjusting other processes so that limitations now
seen in these patent implementations can be addressed.
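The idea of inserting and removing passes can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the TAIC IDE or its rule language: the `Pipeline` class, the pass names and the year-matching rule are all hypothetical.

```python
import re


class Pipeline:
    """An ordered list of named passes; each pass takes and returns a document dict."""

    def __init__(self):
        self.passes = []  # list of (name, function) pairs, in execution order

    def add(self, name, fn, after=None):
        """Append a pass, or insert it directly after the pass named `after`."""
        if after is None:
            self.passes.append((name, fn))
            return
        names = [n for n, _ in self.passes]
        self.passes.insert(names.index(after) + 1, (name, fn))

    def remove(self, name):
        """Drop the pass with the given name."""
        self.passes = [(n, f) for n, f in self.passes if n != name]

    def run(self, text):
        doc = {"text": text}
        for _, fn in self.passes:
            doc = fn(doc)
        return doc


def split_lines(doc):
    # Lexical pass: segment the text into lines.
    doc["lines"] = doc["text"].splitlines()
    return doc


def tokenize(doc):
    # Lexical pass: words and single punctuation marks.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc


def find_dates(doc):
    # Toy extraction pass: four-digit years only; relies on the tokenize pass.
    doc["dates"] = [t for t in doc["tokens"] if re.fullmatch(r"(19|20)\d\d", t)]
    return doc


pipeline = Pipeline()
pipeline.add("tokens", tokenize)
pipeline.add("dates", find_dates)
pipeline.add("lines", split_lines, after="tokens")  # inserted between existing passes

result = pipeline.run("Filed in 1994.\nGranted in 1996.")
print([name for name, _ in pipeline.passes])
print(result["dates"])
```

Because each pass only reads and extends a shared document structure, a pass can be added or removed without rewriting the others, provided the passes it depends on still run before it.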
One must recognize that the invention of a
new computer-based algorithm and its implementation in computer code are two
very different processes. Moreover,
each invention has to be set in the context of other inventions if all of the
issues that scholars are fully aware of are to be addressed in a single
unified capability.
The Semio patents, now owned by Entrieva Inc.,
will be examined and extended (within a side agreement with Entrieva), so that
the already “best in market” results of the Entrieva conceptual maps
application will be improved and made domain specific. A test collection using a small number of
short fables has been studied as part of preliminary, privately funded research
(conducted 1997 to 2003). In the case of
the Semio patents, we feel that a discovery of fundamental importance was
disclosed in the patents of Semio founder Claude Vogel. The specific discovery assists in the definition of concept
expression and the extraction of passage categories having similar
meanings.
ClearForest tools are being widely deployed
(as of July 2003) as rule-based entity extraction systems looking for themes in
web-published text. ClearForest is part
of the SAIC/OntologyStream team because their ClearResearch toolset is
compatible with and complementary to the TAIC IDE.
ClearForest tools are to be used as a measurement of social
discourse, and then other tools are used to develop a weather map-type
representation of the thematic structure of social discourse being expressed on
public web sites by various social units, including those social units that
represent possible asymmetric threats to the public well-being. Both of these tools will exist in a usable
form within the Knowledge Sharing Core.
Our science advisors include Drs. Karl Pribram
(cognitive neuroscience), Raymond Bradley (theory of social systems), Peter
Kugler (perceptual measurement), Robert Shaw (ecological psychology), Daniel
Levine (connectionism), John Sowa (conceptual graphs), and Steven Newcomb (topic
maps). These scholars’ participation
forms the core resource for the OntologyStream science advisory committee. Our activity is funded by several
sources.
ClearForest and TAIC IDE environments allow
the expression of patents that have been awarded by the US Patent and Trademark
Office. These environments also allow
the expression of inventions that are being considered as patent
applications. Our scholars are in a
position to assist in the proper expression of these inventions, both in the
form of a common language and in the form of patents filed on behalf of the inventor.
The common patent expression language that we
have chosen is called Cubicon. A more
complete discussion of Cubicon, and its history, will have to be developed
during Phase 1. Our Phase 1 budget
includes an expenditure of 180 K for interactions with Sandy Klausner, founder
of CoreTalk Inc and inventor of the Cubicon language, and for conferences with
scholars on this issue of a common description/deployment language for complex
programming. Macromedia presentations
that demonstrate the principles of Cubicon are available from
CoreTalk Inc.
Mapping Intellectual Property: OntologyStream is working on mapping emerging intellectual property and on helping those that we select to bring innovation into a common expressive environment. Our evaluation of the two ATS patents, and the implementation work, has led us to place the CCM constructions and processes at the ground level of the Knowledge Sharing Core.
Those associated with OntologyStream propose a specific project that starts in January 2004 and lasts until June 2006 (18 months) as Phase 1. At the end of Phase 1, we will propose to shift our attention to the use of a deployed and tested system to demonstrate high-fidelity, general-purpose analysis of social discourse occurring in real time in several languages, one of these being Arabic. We are aggressive in developing a powerful new utility for use in concluding the War on Terrorism. We understand that the use of this utility is targeted at providing safety for the American public, while being consistent with Constitutional processes.
Preliminary work (6 months ending in October) has been conducted as part of an internal ATS/OntologyStream R&D project. The governing SOW is specific in targeting the development of an experimental system to explore CCM constructions and processes that might occur on these constructions. Our previous work has been on a design to improve on the CCM results as expressed in the NdCore system, by adding ontology and linguistic services to the current process. However, this work is consistent with the notions of the Knowledge Sharing Core. The NdCore is a system of thematic analysis being developed by ATS. The NdCore creates an emerging ontology that depends on the text analyzed and the variation of inputs by the users. Several other previous projects, mostly private efforts, go back to the early 1990s.
Knowledge Sharing Core concept: The Knowledge Sharing Core (KSC) concept is fully expressed at:
http://www.bcngroup.org/area2/KSF/KSFArchitecture.htm
KSC addresses the need to make a transition
in how information technology innovation is being evaluated and procured by the
federal government. The KSC does an end
run around the existing technology evaluation and procurement process. For SAIC this is no problem, as the
management understands that the need for this transition is recognized by
military and intelligence clients.
SAIC management will not advise what to select into the KSC, as this will be a process that is governed by scientists who become part of an advisory board to OntologyStream Inc. and/or the BCNGroup.org. It will always be clear that it is the scientists and not the business people who make these selections.
The team proposal to DARPA (Phase 1)
The 18-month budget is 1720 K (12 months) plus 960 K (6 months), to stay within a total of 4 M over five years.

Organization                   12 months    6 months
Applied Technical Systems          425 K        75 K
OntologyStream                     400 K       200 K
Text Analysis International        225 K        25 K
SchemaLogic                        125 K        25 K
CoreTalk                           120 K        60 K
Entrieva                            75 K        25 K
ClearForest                         50 K        25 K
Groove                              10 K         0 K
SAIC                                25 K         0 K
Subtotal                          1455 K       460 K
Project management (SAIC)          265 K       135 K