NIMA Proposal:  Technical  Volume

(Synthetic Perception System for Detecting Novel Intelligence)

 

Section 1:  Innovative claims for the research 

 

1.1:  Hierarchical Taxonomy.  Our event database consists of levels of taxonomical organization.  In the simplest case, distributed instrumentation is placed in a computational space, such as an image library or a full-text database.  Perception then acts on this outside stimulus and encodes results into the event database. 

 

One application of this process is to detect intrusions by parsing log files.  A log file is sampled and transformed to a table having two or more columns.  An algorithm processes the table to produce atoms and relationship types.  Emergent computing (a type of feature extraction and categorization process) produces event compounds.  These compounds are then annotated and transferred to a knowledge base of cognitive graphs.  One can then form compounds of compounds and trend event occurrences into models of cause and motivation.
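The steps in this paragraph can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions, not the prototype's code: the log sample, the choice of two columns (source, action), and the use of connected components as a stand-in for the emergent-computing step are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log sample; the real log format is not specified in this proposal.
LOG = """\
10.0.0.5 login_failed
10.0.0.5 login_failed
10.0.0.5 port_scan
10.0.0.9 login_ok
10.0.0.7 port_scan
"""

def to_table(text):
    """Sample the log and transform it into a two-column table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((fields[0], fields[1]))
    return rows

def atoms_and_links(table):
    """Atoms are the distinct column values; links are the distinct co-occurring pairs."""
    atoms = {value for row in table for value in row}
    links = set(table)
    return atoms, links

def event_compounds(atoms, links):
    """Group atoms joined by links (connected components).

    Assumption: this simple grouping stands in for the emergent-computing step.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, compounds = set(), []
    for atom in sorted(atoms):
        if atom in seen:
            continue
        stack, group = [atom], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        compounds.append(frozenset(group))
    return compounds

table = to_table(LOG)
atoms, links = atoms_and_links(table)
compounds = event_compounds(atoms, links)
```

In this sample, the five log lines reduce to six atoms, four links and two compounds; annotation and transfer to a knowledge base would follow as separate steps.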

 

The bottom layer of the event database is developed autonomously.  The data stream is “perceived” by the structured variation of conjectural convolutions and a cognitive graph type knowledge base helps to constrain the perception into meaningful elements – using frames with slots and fillers as an intermediate control mechanism.  Humans will be able to modify the perceptual cues easily, using voice or textual commands, and can make interpretations based on records of past experiences.  These features have been accounted for in our existing prototype software. 

 

Figure 1: Four levels of knowledge representation and the KOS

 

The synthetic perceptional system is informed by cognitive influences, in a way that is parallel to what is known about the human perceptional system.  Specifically, the architecture directly reflects the organizational stratification in human perception, and the use of convolution (integration over space and time) as a means to move categorical information from one level of organization to a higher level of organization.  Experimental work in human memory reveals a great deal of detail about how memory invariance becomes convolved into direct perception.  The school of ecological psychology was founded, in the late 1950s, when J. J. Gibson referred to this constraint as a convolution by environmental affordance.  Several hundred PhDs now work in the area of environmental design of space and on the human interaction with environments.  Much of this work is centered at the University of Connecticut (where advisors Drs. Peter Kugler and Robert Shaw work).

 

In both human and synthetic perception, there is an inner perception and an outer perception, both having elements of cognition.  The substance of synthetic perception is composed of categorical abstraction produced by conjectural-forms acting on data.  A substructural process occurs over time and location and is processed by specific mechanisms.  In humans, perception occurs as an emergent process that directly depends on this un-perceived process.  The mechanisms we use have a strong analogy to the phenomenon involved when a massive number of individual photons pass into the retina.  The individual photons cause quantum mechanical events that then add to a structured potential that is sampled by dendrites.  A series of additional steps culminates in a physical electro-magnetically guided convolution over the cortical layers of the visual cortex (Pribram, 1991).  The structure of the signal and the structure of responses from brain regions are mixed.  A range of adequate computational models of this neuro-physical model exists.  We have used a “tri-level” architecture where substructural elements (atoms and links) are aggregated into eventCompounds under global constraint (utility functions).  Several papers on the notion for this tri-level architecture have been published (Prueitt, 1997) and, more importantly, the architecture has been implemented in several different prototypes (for delivery of distance learning materials).

 

Inner synthetic perception can be modified, in real time, by human variation (direct or indirect) of sensor parameters and conjectural forms.  What remains constant is that which is being looked at.   The effect, when achieved, will be real time “looking at” behavior.  It is this behavior that can be properly studied by human factors scientists.  Outer perception comes from a cognitive interaction with remembrance of past events.  This occurs in both the machine and in the human individual and community. 

 

From category data, event compounds are rendered visually as simple graphs.  A top down expectancy, from the knowledge base, can be used in rendering the graph.  We are well acquainted with Adaptive Critic neural networks (Paul Werbos), Adaptive Resonance Theory (Stephen Grossberg), machine expectancy algorithms, and feature extraction and pattern completion technologies, and have taken them into account in our initial design.

 

1.2:  Knowledge Operating System (KOS)

 

·        Synthetic Perceptual System (SPS): The main innovative claim is that a “synthetic perceptual system” is able to extract relevant data from massive sets of data utilizing a new approach that closely mimics the human perceptual-linguistic system.  The architecture of the Synthetic Perceptual System is fully prototyped and can be demonstrated as (1) a computer information warfare system, (2) an operational system for generalized Latent Semantic Indexing for conceptual segmentation of full text, (3) a system that traces the behavior of humans who access complex databases, and (4) a general event detection machine that can be applied to electro-magnetic spectrum analysis.  A research-based comparison has been made (by Prueitt) to modern academic cognitive science theories (Gerald Edelman, Karl Pribram, Daniel Schacter, Donald Hoffman, Robert Shaw) on human memory, awareness and anticipation.  Foundational elements from set theory are used.

 

Figure 2: Functional components of the KOS

 

·        Synthetic Cognitive System (SCS):  As part of our development, over a period of a decade, we have reviewed most COTS knowledge base systems.  We need only to build an API to one of these (cognitive graph) systems and use the commercial system as a repository for small situated ontology, in the form of a graph with nodes and links and metadata, developed by the synthetic perceptual system and annotated by human interaction.  We will present a computer architecture (with algorithms) for knowledge artifact retrieval from a cognitive graph type knowledge repository based on hierarchical structure and reciprocal processing: opponent processing, as in most neural network architectures (Paul Werbos, Stephen Grossberg), and re-entrant processing (Gerald Edelman).  We envision this hierarchical structure and reciprocal processing as part of the synthetic perceptional system, noting that in the human perceptional system a great deal of what might be called cognition occurs.  The cognitive graph based Synthetic Cognitive System (SCS) is considered to be a separate system, whose primary function is to be a common repository of community knowledge structures.

 

·        Semiotic String Processor:   A powerful, and yet simple, string processor was developed by Don Mitchell and Dr. Prueitt (2001-2002).  We consider this string processor to be the kernel of the OntologyStream Knowledge Operating System.  Don Mitchell is the Director of Software Development at OSI.  He is committed full time to the development of all aspects of the work that OntologyStream does, including work performed by two additional software engineers.  Dr. Prueitt has adequate program management to coordinate this effort.  Internally the Semiotic String Processor (SSP) receives string commands over time, emits over time a series of change-events that occur in internal data models, and enables binding of perceived categorical abstractions directly to a data model.  The SSP provides a convolution of an inquiry as string-output of patterns of human behavior (during analytic work using the KOS).  This string is a chorography of the sign system found to characterize the behavioral actions of a human in response to perception of categorical invariance in the data set.  The string processor is the interface between humans and our synthetic perceptual and cognitive systems.  The core SSP engine models human behavioral habits in interactions with the SPS and the SCS.  The focus is on allowing the (human/synthetic) perceptual system to be the primary interface for users.  One purpose of the string processor is to alter instrumentation and sensor parameters for input of log files into the categorical abstraction engine.  The second purpose is to control the interactions between the knowledge base and the cA-based perceptual system.  The result of these actions is a convolution over the data (log files) that produces categories of invariance that are rendered visually, as sound, or as control engines.
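The command-in, change-event-out behavior attributed to the SSP in the last bullet can be illustrated with a short Python sketch.  It is only a sketch under assumptions: the class names, the command syntax ("set <key> <value>") and the parameter names are hypothetical and are not taken from the OntologyStream implementation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ChangeEvent:
    """One change observed in the internal data model."""
    key: str
    old: object
    new: object

@dataclass
class StringProcessor:
    """Hypothetical stand-in for the SSP: string commands in, change-events out."""
    model: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def receive(self, command: str) -> list:
        """Apply one command of the assumed form 'set <key> <value>'."""
        parts = command.split(maxsplit=2)
        if len(parts) != 3 or parts[0] != "set":
            return []  # unrecognized commands change nothing
        _, key, value = parts
        old = self.model.get(key)
        if old == value:
            return []  # no change in the model, so no event is emitted
        self.model[key] = value
        event = ChangeEvent(key, old, value)
        self.history.append(event)
        return [event]

ssp = StringProcessor()
events = []
for cmd in ["set sample_rate 100", "set sample_rate 100", "set window 5m", "hello"]:
    events.extend(ssp.receive(cmd))
```

Four commands arrive over time, but only the two that alter the data model produce change-events; the recorded history is the kind of string-like trace of operator behavior that the bullet describes.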

 

1.3:  Innovations in Algorithms and Processing Architecture

 

·        Fast In-memory processing with no third party dependencies: Our team can supply further computer science innovation regarding the computational production of categorical Abstraction (cA) and an In-Memory Referential Information Base (I-RIB) to support the KOS system.  The In-Memory Referential Information Base allows our systems to access, aggregate, and manipulate massive data sources more efficiently and more quickly than any conventional system to date.  This work has already been completed by OntologyStream Inc and can be demonstrated.

 

·        Data reduction:  Categorical abstraction is an absolutely critical innovation to address massive data structuring/organization.  The synthetic perceptual system renders categories of invariance rather than individual occurrences of data events.  If a data event occurs just once, it is rendered as a category.  If the data event occurs a billion times, it is rendered as a single category.  This use of cA has been found to “see” the event types, and variations on event types, in the data and to produce visual icons (small graph structures) that are the data, seen as organizational categories.  The relevant data is retrieved with 100% precision and recall, and the icons can be used to retrieve from data sets not involved in the event definition. (See screen shots in Section 5.)

 

·        Regularity in data structure:  Data structure sufficient to the required computational processes has regularity and predictiveness within context.  However, in most cases this regularity is not discovered and used to express control flow.  Following certain new trends in interoperability and standards (XML, RDF, KIF, Topic Maps) processes, we account for structural regularity and predictiveness within context as part of the Knowledge Operating System internal data formatting.  The work by two software engineers will address the enterprise-level software-standards essential to out of the box COTS software.  Software engineers will develop this aspect of the architecture, under the direction of Drs. Meyers and Prueitt.
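The data reduction bullet above (one category per distinct event, however often it occurs) can be made concrete with a small Python sketch.  The event tuples, the function names and the exact-match retrieval rule are assumptions made for illustration; they are not the cA algorithm itself.

```python
from collections import defaultdict

def categorize(events):
    """Map each distinct event to one category, remembering where it occurred."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        index[event].append(position)
    return dict(index)

def retrieve(events, category):
    """Return the positions in any event list that match a category exactly."""
    return [i for i, event in enumerate(events) if event == category]

# Hypothetical stream: one event seen once, another seen 100,000 times.
events = [("10.0.0.9", "login_ok")] + [("10.0.0.5", "port_scan")] * 100_000
categories = categorize(events)

# A category defined on one data set can be used to query a different one.
other = [("10.0.0.5", "port_scan"), ("10.0.0.1", "login_ok")]
hits = retrieve(other, ("10.0.0.5", "port_scan"))
```

The 100,001 events reduce to two categories, and each category keeps an index back to its instances so that the underlying rows remain retrievable.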

 

1.4:  Notes on categoricalAbstraction and action perception cycles

 

The human perception system brings real time proprieties and priorities and contextualizes an action space.  The synthetic perception system extends human perception into the virtual computational spaces, including image and text data sets.

 

Using categoricalAbstraction (cA), “synthetic perception” concerns categorical invariance in data (structured or natural language data) and the categorical relationships that exist due to co-occurrence and pre-established frames (with slots and fillers).  A measure of structural similarity was worked out during previous work on image understanding (1996-98).

 

The prior art for cA, and for the architecture in which we have embedded cA, is extensive but largely exists in literatures outside of Artificial Intelligence and Information Technology.  This work includes long-term research efforts in the areas of reflective control, applied semiotics, and second order cybernetics.  Much of the documentation of this research traces to Former Soviet Union science organized under special governmental organization (prior to 1991), and has only slowly been adopted into the intelligence control and machine intelligence literatures in the United States.  Significant research exists in the ecological psychology literature in the United States and Europe.  Dr. Kugler is a specialist on this work. 

 

Ours would be the first tool based on emergent abstractions, of the type proposed, that is intended for analysis of large data sources.

 

 

The bottom layer of the layered taxonomy is an open symbol system depending on invariance types (atom and link categories) produced from the aggregation (convolution) of computer data at selected points within an instrumented system.  An example of an instrumented system is a system for processing Intrusion Detection System audit logs.  A second example is data base access log files.  A third example is a sub-event log file produced while a text (conceptual search) engine is finding correlation between textual elements (this is generalized LSI).  In each case, the log file has the same simple (editable) format. 

 

The bottom layer of the layered taxonomy is also a structure of sub-types related to the invariance that are composite events of interest to the human community.  This structure of sub-types can be compared, using grounded metaphor, with the “memory” of texture, color and form in a human perceptional system.  Formal notations, on voting procedure (Prueitt) and quasi-axiomatic theory (Alex Citkin, Victor Finn and Dmitri Pospelov), exist for autonomous aggregation of memory (categorically linked invariance) of this type into cognitive graphs.  These formal notations support advanced methods that will be developed as part of scholarship on foundations of logics and computer science.  Used in this way, cA “memory” of invariance is aggregated into situated ontology, that is, ontology with emergent contextual scope that will appear as an instant retrieval of information to the human.

 

The second layer of the layered taxonomy is an event layer that is responsive to a deployed infrastructure of human annotated event types and to the visualization of pre-existing event patterns at the third level.  This second layer can, in practical ways, have algorithmic interactions with a cognitive graph type knowledge base.

 

Top down expectancies are thus possible using any of a number of algorithmic methods (evolutionary programming, adaptive resonance theory, or adaptive critics).  The class of these expectancies is compared with the connectionist scholarly work (by Werbos, Grossberg, Holland and others) in automating recall using formal models of knowledge associative memory.  Our initial architecture, already implemented in code, is a bit simpler than these classical algorithms, but a more sophisticated associative memory will be implemented about midway into our 20-month commitment.

 

The more sophisticated associative memory will be developed under the direction of, and in consultation with, Professor Daniel Levine.  Daniel Levine is one of the leaders in the cognitive science / biological and artificial neural network community.  Dr. Levine is a professor of psychology at the University of Texas at Arlington and has had several decades of scientific interaction with Professor Pribram and Drs. Kugler, Murray and Prueitt (specifically as advisor to his dissertation).  Professor Karl Pribram is Stanford Professor Emeritus (now at Georgetown University) and world renowned as one of the founders of the field of cognitive neuroscience.

 

The third level of the layered taxonomy is a knowledge management system having knowledge propagation and a knowledge base system developed based on Peircean logics (small cognitive graphs) that have a formative and thus situational aspect.

 

The fourth level of the layered taxonomy is a machine representation of the compliance models produced by policy makers.

 

(Figure 1 is a graphical representation of the four layers)

 

 

1.5:  Notes on conjecture performance of the KOS 

 

This subsection provides a sense of how the Knowledge Operating System is designed and may work within the Glass Box.  In the world within the KOS we may ask, how should a process "know" a goal? 

 

A process may ascertain a degree of goal attainment through a conjectural analysis of parameters affected during goal attainment, and reckon a scalar quantity along a known range of responses.  Knowing is therefore, in the simplest case, a convolution of state onto a linear scale.  In a less simple case, knowing is a state in a cross product of linear scales, where the several (small and finite) scales have an understood Peircean "ground" involving both agile determination of which scales are relevant and oppositional poles to each scale: good-bad, up-down, inside-outside, etc.
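A minimal sketch of this idea follows, assuming that projection onto a scale is a clamped linear mapping and that relevance is decided simply by which scales are supplied.  The Scale class, the parameter names (coverage, latency_ms) and their ranges are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scale:
    """A linear scale between two oppositional poles, for example bad..good."""
    low_pole: str
    high_pole: str
    low: float
    high: float

    def position(self, value: float) -> float:
        """Project a raw parameter onto [0, 1], clamped to the known range."""
        fraction = (value - self.low) / (self.high - self.low)
        return min(1.0, max(0.0, fraction))

def attainment(state: dict, scales: dict) -> dict:
    """Locate a state in the cross product of the scales judged relevant."""
    return {name: scale.position(state[name])
            for name, scale in scales.items() if name in state}

# Hypothetical scales: 0.0 lies at the low pole, 1.0 at the high pole.
scales = {
    "coverage": Scale("bad", "good", 0.0, 100.0),
    "latency_ms": Scale("good", "bad", 0.0, 2000.0),
}
state = {"coverage": 75.0, "latency_ms": 500.0, "unrelated": 3.0}
point = attainment(state, scales)
```

With a single scale the result is one scalar; with several scales the result is a point in their cross product, and parameters with no relevant scale (here "unrelated") are ignored.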

 

In the context of the KOS, a human gesture (voice, text, sound) is received as a command.  The KOS alters states in the topology of memory locations, and produces internal change in the KOS process model.  What the human user sees after the gesture is an interface that depicts internal state as altered by the gesture.  In complex information displays, the state of the response can be illustrated on part of the computer screen using Chernoff faces, for example.  In many instances the informational display need not be this complex.  However, the critical issue is that control parameters vary in response to the user’s gestures, in a way that can be modified and controlled.  Control has to be natural and simple.

 

The goal of the KOS design is to remain centered within action/perception principles of ecological psychology (project advisors Robert Shaw and Peter Kugler are leading experts in this field). The depiction of internal state in the interface may reflect the success of goal attainment to the user in a fashion immediately recognizable through visual inspection, and/or may produce other display events over time, such as the frivolous example of a "scale of feeling" correlating to sounds such as "Ya!, Ah Ha!, Oh, Ahhh, Huh?, Oh-oh, Yikes!".  The sounds may be background music with variations that go along with the Chernoff faces.  Such an example would have tremendous value in a situation of information-immersion during episodes involving large amounts of information processing. 

 

The KOS contains two interfaces to syntactic structures in the form of ordered triples < a, r, b>, where a and b are subjects and r is a relationship operator.  One interface is to the cA world being developed from a direct rendering of categorical invariance into visual (and auditory) form.  The interaction of humans with cA structures supplies to the cognitive graph type knowledge base triples with a rich metadata (likely encoded as RDF and HyTime).  The human information interaction is a process that involves an assembly of a model of a knowledge structure placed into the synthetic cognitive system. 
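The ordered triple < a, r, b > with attached metadata can be illustrated as follows.  This is a sketch only: the TripleStore class, its methods and the sample annotations are hypothetical, and the metadata is held in a plain dictionary rather than in RDF or HyTime.

```python
from collections import namedtuple

# An ordered triple <a, r, b>: two subjects joined by a relationship operator.
Triple = namedtuple("Triple", ["a", "r", "b"])

class TripleStore:
    """Hypothetical in-memory store of triples, each with a metadata dictionary."""

    def __init__(self):
        self._triples = {}  # Triple -> metadata dict

    def add(self, a, r, b, **metadata):
        """Insert a triple, merging metadata if the triple already exists."""
        self._triples.setdefault(Triple(a, r, b), {}).update(metadata)

    def match(self, a=None, r=None, b=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        return sorted(t for t in self._triples
                      if (a is None or t.a == a)
                      and (r is None or t.r == r)
                      and (b is None or t.b == b))

    def metadata(self, a, r, b):
        """Return the metadata recorded for one triple."""
        return self._triples[Triple(a, r, b)]

store = TripleStore()
store.add("10.0.0.5", "attempted", "port_scan", annotated_by="analyst-1")
store.add("10.0.0.5", "attempted", "login", annotated_by="analyst-2")
store.add("10.0.0.9", "completed", "login")
hits = store.match(a="10.0.0.5")
```

Pattern matching with wildcards is one simple way a knowledge base of this kind could answer questions such as "what has this source attempted?"; the human annotation travels with each triple as metadata.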

 

The syntactic entailment of the KOS is divided into a section of WHAT and a section of HOW. There is no syntactic entailment for WHEN in memory, the KOS being in essence an event driven reactive engine that produces a more useful WHAT by method of the stored HOWs only WHEN commanded to do so.  This separation is consistent with other current generation question driven knowledge bases such as the Mark 3 from Knowledge Foundations Inc (project advisor, Richard Ballard’s system).  It is also consistent with our project domain experts, Nathan Einwechter and Dean Rich’s, approach to detailing hacker behavior using information warfare techniques. 

 

The entailed HOW in the KOS is called a habit.  The representation of habits is not limited to a linear list; habits are themselves arranged into a taxonomy of names, stored in a memory structure as a simple tree.  The habit tree structure captures the human meaning, or semantics, of the intended use of the habit.  However, during the execution of a habit, its meaning in the pragmatic context of goal attainment may be measured from the "information of structure", that is, from the location WHERE the habit is stored within the tree structure.
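A habit tree of this kind might be sketched as nested dictionaries in which the leaves are callables.  The HabitTree class, the slash-separated naming and the sample habits are assumptions made for illustration; the sketch reports the storage location together with the result so that the location is available when the outcome is assessed.

```python
class HabitTree:
    """Hypothetical habit store: a simple tree keyed by slash-separated names."""

    def __init__(self):
        self._root = {}

    def store(self, path, habit):
        """Place a callable habit at the named location in the tree."""
        node = self._root
        *branches, leaf = path.split("/")
        for name in branches:
            node = node.setdefault(name, {})
        node[leaf] = habit

    def invoke(self, path, *args):
        """Run a habit and report WHERE it is stored alongside its result."""
        location = path.split("/")
        node = self._root
        for name in location:
            node = node[name]
        return {"where": location, "depth": len(location), "result": node(*args)}

tree = HabitTree()
tree.store("logs/filter/by_source", lambda rows, src: [r for r in rows if r[0] == src])
tree.store("logs/count", len)

rows = [("10.0.0.5", "port_scan"), ("10.0.0.9", "login_ok")]
outcome = tree.invoke("logs/filter/by_source", rows, "10.0.0.5")
```

The names along the path carry the human-provided semantics, while the path itself is the structural information available at execution time.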

 

Again, it is the structure of habits in a hierarchy that captures the parameters enabling a measurement of goal attainment, for the very goals that the habits are themselves designed to achieve.  The structure of the habit tree is human provided, and stores a semantic purpose.  Upon use of a habit, the human user will anticipate a response within a range.  An emotional binding between the human and the KOS is likely to occur.

 

In accord with principles of human gesture and anticipated response, the KOS may perform a complex conjectural analysis of habit parameters and parameters of syntactical context involved in the sequence of events that move a moment from start to completion, to produce an overall assessment of outcome as a "feeling" concurrent with the outcome.  This feeling is only perceived as a feeling proper by the human agency of mind; however, the silicon state response is a real measure of outcome and partial outcomes over time.  The KOS state provides a mechanism for a lock on context of intent between human biology and the computer.

 

In terms of modeling, the KOS WHATs and HOWs have a close parallel to an XML document that contains XML data as a hierarchical model, together with a section of XSL that encodes transforms that act upon the data to produce useful output.  Note that though the KOS may model and produce XML, no XML is used at all internally in the KOS, and internal hierarchy is represented only by very fast arrays as linked structures called objectComposites.  The KOS has zero external technology dependencies beyond the core runtime binaries of the programming language within which it is implemented.

 

Insomuch as human knowledge is complex, so too is a knowledge base built from human knowledge as represented in the KOS.  Complex does not mean complicated, only that some identities have underconstrained meanings.  Habits, which afford the HOW by which one may produce a useful outcome, produce a dynamic stratified process during execution within the process model of the KOS.  The stratification appears over time during execution of the habits, as habits call habits that can call habits, etc., to the resolution of the solution anticipated by the meaning known by the human that invoked the initial habit.  The complex conjectural analysis is therefore accomplished over time as habits weave a complex task.

 

Section 2:  Plan for accomplishment

 

Roles and interaction.  SAIC will handle contract management, interaction with NIMA, programming support, and contributions in human factors and social design for the collaborative layer of the product.  There will be two subcontracts: one to OntologyStream and one to TelArt.

 

TelArt will work with Karl Pribram’s office in a) developing an interface to an extended research community (human factors, human information interaction science, computer science) and b) hosting the scientific conference on HII in early 2004. Forty invited scholars will attend this 3-day conference on Georgetown University campus.  Professor Daniel Levine will be the Chairman of the conference, and a consultant to OntologyStream.  The scientific conference will be partially funded by NIMA, and we will approach the National Science Foundation for co-funding. 

 

TelArt will take responsibility for knowledge representation issues, including the co-supervision of the software engineering produced at SAIC and primary interactions with Dr. Richard Ballard (founder of Knowledge Foundations Inc. and developer of the Mark 3 knowledge base system).  TelArt will develop training materials and provide training as needed.

 

SAIC will provide the primary office space and computer infrastructure, and will support the interaction between NIMA and SAIC.  Project reviews will be held at SAIC offices in McLean.  Two SAIC software engineers will code and test and benchmark software based on prototypes from OntologyStream.  OntologyStream will be responsible for the software design and for managing the technical work at SAIC, TelArt, and Georgetown. 

 

 

The Program Review Workshops will be focused on the technical  and social evaluation of our work in the context of intelligence analysis in general and the Glass Box in particular.  However, the value we bring to the intelligence community depends on our maintaining simultaneous close contact with outside scientific communities. 

For example, we expect that third parties, not funded by this project, will develop primary research on the use of categoricalAbstraction and related formalism and technology.  One PhD thesis on cA has already been proposed.

 

Additional detail on the responsibilities of individuals is shown in the Resumes Volume. 

 

Work plan.  The tasks and manner of completion are adequately described, for now, in the statement of work (Section 6), the deliverables (Section 7), and the milestones (Section 10).  A government services proposal will generally provide additional “how to” narrative in response to the statement of work, but we have not added this, judging that the space and effort were best spent on elaborating the innovative concepts.  A detailed work plan will be presented a few days after award.

 

Evaluation plan.  We offer the following performance measures as a way of tracking our progress over the contract.  We agree with NIMD program managers that user experience as well as technological dimensions need to be assessed. Additional meaningful indicators of progress will be devised as cycles of development occur and as new variables are proposed.

 

This first group of indicators applies to the computer intrusion application (SOW task 6), but in order to achieve these we will have made progress with many preliminaries.

·        the number of categories that are annotated and linked to instance data by the test operator

·        the saturation level of the category set for designated test data sets (average ratio of newly sampled bytes per new category)

·        intrusions identified by this means that were identified by other means (corroborations)

·        intrusions identified by this means that were not identified by other means (novel)

·        after reaching category saturation in any test data set, intrusions that were not identified by this means that were identified by other means (false negatives)

·        facility of operator, in terms of time and perceived effort to process a new, comparably sized test data set

·        Subjectively reported instances of Orienting Reflex  -- the ‘head turning’ experience when presented with apparent novelty.  (Requires a belief that the information generated by the system is often meaningful, and requires that the information be presented with appropriate salience.)
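Several of the indicators listed above reduce to simple counts and set comparisons.  The sketch below assumes one possible reading of the saturation level (total sampled bytes divided by the number of distinct categories found) and treats intrusions as plain identifiers; the function names and sample values are hypothetical.

```python
def saturation(samples):
    """Average sampled bytes per distinct category found (an assumed reading).

    `samples` is a list of (bytes_sampled, categories_seen) pairs, one per
    sampling step. A ratio that keeps rising as sampling continues suggests
    that the category set is approaching saturation.
    """
    total_bytes = sum(size for size, _ in samples)
    known = set()
    for _, categories in samples:
        known |= set(categories)
    return total_bytes / len(known) if known else float("inf")

def compare(ours, others):
    """Split intrusion identifiers into the three comparison groups."""
    ours, others = set(ours), set(others)
    return {
        "corroborations": ours & others,   # found by both means
        "novel": ours - others,            # found only by this means
        "false_negatives": others - ours,  # found only by other means
    }

samples = [(4096, {"c1", "c2"}), (4096, {"c2", "c3"}), (4096, {"c3"})]
ratio = saturation(samples)
groups = compare({"i1", "i2", "i3"}, {"i2", "i3", "i4"})
```

The operator-facility and Orienting Reflex indicators depend on observation and self-report, so they are not modeled here.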

 

The above indicators are repeated for the second domain application.

 

For the whole project, we propose the following indicators:

 

·        Number of substantive responses from team members to research memos produced by team members.  (A gross indicator of both reflection and interaction which, within a team of productive individuals, correlates with useful results.)

·        Number of tests that tend either to corroborate or refute working hypotheses.  (Gross indication that many research questions are being operationalized and pressed toward conclusions.  Another SAIC team is proposing an HRinG hypothesis tracking tool, which the project would like to use for this management task.)

 

Risks.   This is ambitious research, which is inherently risky.  We have sound theory and working prototypes, but the application of the method to intelligence work is unproven.  We have high confidence but do not rely on that to pull us through.

 

Our primary approach to risk mitigation is in the formation and practices of the team.  Many members have worked together and will seed their culture within this team.  Members are not shy about proposing ideas, half-baked or not, and are also not shy about finding the weaknesses in such ideas and either filling them out or proposing alternatives.  There is no component of this work that is ‘owned’ by an individual or that is sacrosanct.  Further, the part-time advisers are world class thinkers who would not have agreed to join the project if it were not aimed at something vital, likely to achieve something vital, and open to their input.  A team such as this senses risk early, in all dimensions, and acts effectively on risk though not, in the end, by ‘playing safe’.  Because we are many and varied, there is the possibility that this will cause confusion and lack of convergence.  We have guarded against that by selecting only those who have agreed on some principles, agreed to disagree on others, and have shown an ability to focus as a group in similar efforts.

 

One of our highest risks would have been to find a suitable extraction algorithm, but our prototype works and will not be a concern except at the margins as we modify it.   The rest of the software development and integration that we need, while hard work, presents no unusual demands. 

 

The largest remaining risk is the whole matter of matching the operator to synthetic sensory input.  The way to do this is to rapidly adjust based on rich feedback.  We have a very dedicated and reflective operator who will give us good feedback, which combines with our comprehensive and sensitive observation, interviewing, and interpretation techniques.  We will use flexible tools and initially simple routines and displays that will facilitate rapid change.

 

Section 3: Expectations for Glass Box Interaction

There are several objects that we will submit to or exchange with the GB (or to similar environments), and several GB services that we can take advantage of.  Some of these interfaces will be necessary and straightforward in order to have an impact on overall intelligence products, while others are optional or may require transformations to be effective. The direction of flow may be one way or two way. Finally, some services that we are providing independently could be instead performed by other tools in the GB. All of these will be investigated for their worth and feasibility and a design and specification proposed in relation to the standards and interfaces that are offered by the GB.

We would expect the GB to hold raw data in a warehouse, or, in appropriate situations, to deliver streaming data that our algorithms can be trained on. For data that needs to be transformed into processable form, such as image data, we would expect the warehouse to handle that service. Our software currently transforms data into tables, but this routine could also be allocated to warehouse processing.

The next several objects and steps, discussed elsewhere in this proposal, must be processed by the code we develop.  None of the interim products would be sensible to share outside of our method, except possibly the “conjectures” that orient our algorithm to search for categories.  These conjectures are a kind of hypothesis that, while syntactically oriented, may nevertheless have some value to track within a more general function provided by GB for hypothesis generation, evaluation, maintenance, and linkage.

The resulting conceptual graphs that are used to represent categories require complex functions from a knowledge base application.  Development of these must occur in the kbase environment that we are providing. In principle, however, programming at this level could be rehosted in a shared (GB) kbase environment, and we expect to be able to produce graphs and ontologies using XML-based schemas that other GB components can process.  Yet it is possible that there will be differences between the two environments that are crucial to the quality of the product and that will be difficult to resolve in a way that allows easy porting from the development kbase to any of the GB kbases. This will be an important issue, and as early as possible we will study what kbase services are available and specify what we will need as a target. A fallback position is to continue to use our kbase within the GB and use negotiated XML-based interfacing to exchange products out of our repository.

We are counting on a significant contribution from the GB in terms of visualization routines. Advanced packages are available that we would like to use, but it would be a distraction from our research, and a misuse of our skills and funds, to purchase such software and to engineer the fine points of visual expression.  During our development effort we will use relatively simple and easily modified visualization routines, but we expect these to be discarded once we arrive at the final visual semantics, cognitively appropriate 'look,' and full specifications. Those products will facilitate rapid rehosting and rendering in an advanced visualizer.

Communication and alert functions, serving collaborating groups of analysts, are needed to complete the 'top' level of our design and are necessary for gaining the full value of our work. This is perhaps more true of our approach, which emphasizes immersion of the analyst in the problem, than of situations where the analyst supervises semi-autonomous technology.  We will specify appropriate communication functions and will identify COTS environments that offer them. Again, it would be a distraction for us to complete this engineering; in addition, we do not have access to the large group of analysts that would be necessary for development and testing. The group that we do have access to, however, is the research team itself, which can adequately simulate a domain analyst community.  We will probably use Groove, a flexible collaborative environment, and may develop custom templates or tools within Groove to demonstrate selected top-level collaborative functions. Again, we expect the GB to have a similar platform for collaborative functions, and we expect that these functions can be modified to serve the needs that we will specify during the research. We do not expect that any of these needs will be exotic, with the exception that any discussion will need very easy ways to reference or link to hypotheses or other complex intelligence objects.  We are not ourselves developing this capability but are aware of others who are, and whose work we expect to employ.

Overall, the technological interfacing requirements for linking our project to the GB are conventional and not extensive.  The skills of our programmers are up to the task.  But if any unusual skills or troubleshooting are needed, we are able to reach back to SAIC’s pool of software integration experts.

Our technology may qualify as “disruptive or cumbersome” in one important respect, and may thus require explicit consent from analysts in the GB who use it.  There is no risk that the technology will harm the analyst, only that the analyst might fail to contribute appropriate mental discipline.  The technology only works (in the sense of generating highly valuable, novel intelligence) if the analyst spends enough time with it to become familiar with the categories and runs it long enough that the library of categories is well-stocked.  We will develop a ‘quick start’ orientation and training routine, but the analyst needs to recognize that a) the findings arrive via the operation of his own judgment and are not simply spit out by the machine and b) the analyst must want to and be able to sustain an immersive or flow state in order for the synthetic sensory system to operate correctly.  There is nothing difficult or painful about doing so.  In fact, it is pleasant.  But it does require relaxation of a critical, objective observer stance.  The system can of course be criticized, but not from within its operation.  The same may be said of any method – one has to contribute the inputs that it requires and not something else that may be easier to provide.

We have claimed innovations in areas 4 and 5 and have not discussed the other areas.  This is not an oversight.  Our strategy is to concentrate on developing our method during the first 20 months to make it worthy of inclusion.  If we succeed, work during an extended period of performance would concentrate much more on the other areas.  We feel that we have much to offer area 1 from a theoretical standpoint.  We will surely contribute data to the tools in area 2, since many of our findings will stem from common sense, be uncertain, or be tacit, and need to be taken up in broader control regimes.  We will, in the later period of our 20 months, demonstrate the ‘higher levels’ of our system, where we use virtual collaboration as a means to elicit and transfer tacit knowledge.  Finally, we definitely require support in area 3 for hypothesis tracking, and would prefer to join very early with another team that has a suitable tool.  The HRinG hypothesis system being prepared by another SAIC team, assuming modifications for corroboration recording and reporting, would be an excellent option.

 

Section 4: Scenario 

 

[This scenario is specific to the initial domain we are addressing.  The scenario in the executive summary is generic to all applications.]

 

A terrorist organization has assembled four independent groups of computer hackers across the world.  Each group is given a different task, and each task supports another group’s task.  The interweaving of tasks creates a well-coordinated and timed attack.  The first group’s task is to breach the internal networks of the U.S. Power Grid Command and Control networks.  The second group is tasked to electronically break into five major U.S. banks and the Federal Reserve.  Meanwhile, the third and fourth groups are “running interference” by launching false or partial attacks on various government agencies and facilities.  Groups one and two begin preparation for their attacks by doing basic Internet searches and reconnaissance probes into the targeted networks.  Given the distributed and well-coordinated nature of these probes (and later of the attacks), the computer intrusion analysts see only basic scans and attack attempts, of which they see thousands per day.  Because of the limited scope of each analyst’s sensor arrays and data, they have no reason to believe that anything particularly devious is in the works.  Each analyst at each facility allows the system to log the attempts automatically, but overlooks them because such attempts are routine.  Later, the attacks are carried out, and internal security is breached at 90% of the targets; the Federal Reserve is not breached, owing to existing security measures.  Once the full attacks are launched, the computer intrusion analysts are unable to completely repel them because of the obfuscation and interference created by groups three and four.  The U.S. economic infrastructure is totally devastated, hospitals are forced to run on emergency power, many die from the power outages, and mass chaos ensues.

 

Had a system such as our proposed synthetic perception system and categorical abstraction been employed throughout this attack, along with a well-made event knowledge base, the outcome would have been significantly different.  The system will have the ability to look at all the data from intrusion detection systems nationwide, and identify patterns that would indicate a coordinated attack.  This information would then be presented to the analyst for further review and corroboration.  Further, if the coordinated attack patterns had gone unchecked by the human analyst and the attack had still gone ahead, the system would have had the ability to detect which were the “real” attacks, and which were mere attempts at obfuscating the real attacks.  This would allow the nation as a whole to prioritize its reaction to attacks, not by what is being attacked, but by whether it is believed to be the actual target or not.  Without this, the analyst’s first instinct would be to protect major intelligence networks first, and leave the banking and power systems lower down in priority, due to false attacks launched on the intelligence networks.
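
The cross-site pattern identification described above can be illustrated with a deliberately simple stand-in.  The sketch below only counts the distinct sites at which the same probe signature appears; it is not the categorical abstraction method itself, and the event layout, signature labels, and function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each event: (site, probe_signature, hour). All values are made-up sample data.
Event = Tuple[str, str, int]

def cross_site_signatures(events: List[Event], min_sites: int) -> Dict[str, Set[str]]:
    """Return the probe signatures seen at `min_sites` or more distinct sites."""
    sites_by_signature: Dict[str, Set[str]] = defaultdict(set)
    for site, signature, _hour in events:
        sites_by_signature[signature].add(site)
    return {
        signature: sites
        for signature, sites in sites_by_signature.items()
        if len(sites) >= min_sites
    }

events = [
    ("power_grid", "scan_A", 1),
    ("bank_1", "scan_A", 2),
    ("bank_2", "scan_A", 2),
    ("agency_1", "scan_B", 3),
]
flagged = cross_site_signatures(events, min_sites=3)
```

No single site sees anything unusual in this sample; the pattern only appears when the logs are pooled, which is the point of the scenario.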

 

Section 5:  Integration of Synthetic Intelligence (SI) technology system with components developed by other NIMD participants.

 

In Sections 3 and 9 we discuss the Glass Box and technological integration.  Here, we discuss more fundamental aspects of integration. 

 

Our work is grounded in human sciences as well as computer science.  We expect to make a contribution to the entire NIMD program on that basis, rather than by offering tools for all five areas, or by providing a technological infrastructure for all five areas.

 

In taking on this responsibility, we expect to have some challenges.  For example, visualization of computer data is often an appropriate way to engage human perception.  But, in addition to making the human expert subservient to the computer programs, data visualization approaches break down as data sources get larger. There are other deeper perceptual/behavioral issues that inhibit the adoption of proper HII systems.  Our team is in a position to show what might be done if categories are visualized as the primary interface to human information interactions, with the exact manner of data visualization playing an important but secondary role.  By revealing our innovations in computer science, we lead other technologists.  It will not be the case that anyone will have an ability to hide behind the technical complication of work proposed or accomplished.  We seek a respectful relationship with the other NIMD participants, but only as we make progress together toward increased security of the nation.  We feel that this is what NIMA program managers are looking for in this BAA.

 

Successful human and social learning is the basis for the cultural value to be derived from a cyclic process of action followed by perception.  This involves humans being immersed in the experience and in the use of community language. We also need machine support to overcome well-understood behavioral characteristics that have a negative impact on truth seeking. 

 

Our team will continually ask the following types of questions:

 

1)      How can a science of human information interaction be grounded in such a way as to find acceptance within the science and technology communities? 

2)      Humans are enormously capable as perceptual-language 'machines', but our analytic thinking is often flawed and simplistic.   How can the resources of human perception and cognition be studied, in the context of critical intelligence gathering and analysis? 

3)      What about knowledge propagation within communities?  How can this be studied?

 

By looking to natural science we see beyond the present entrancement of computer science with first generation models of natural intelligence.  We are able to look at what we, and others, are calling the New Computer Science.  This means a number of things, but primarily it means that the techniques of computational emergence (stochastic engineering) are to be coupled with descriptive enumeration of the elements of finite natural type.  Computer Science becomes more pragmatic and less artificially precise when precision is not relevant. 

 

We can illustrate the limitation of the old computer science.  Tell us precisely “how much do you love your spouse” or “which of my three daughters do I love the most”, or “which of these terrorist cells will launch the next attack”.  These questions are actually categorically related to questions like “What is the largest integer?” or “What is the smallest real number?”  The questions are just not proper to answer.  Looking beyond the AI Dream, with the realization that “simple” machines are not now and never will be living, we can then (and perhaps only then) see how to accommodate the many formal limitations of the current software designs.  It really is necessary for human judgment and cognitive acuity to be involved if the system is properly to be called “intelligent.”

 

Methodology from the natural sciences can be applied to the study of Human Information Interaction if the fidelity of the information technology is of sufficient quality.  Cognitive graph (ontology) based decision aids are not of sufficient quality by themselves, because there is no perceptual aspect to system interaction.  In most instances the available machine ontology is not formed from a perceptual act. 

 

Within the cognitive neuroscience literature, images of achievement are said to direct human behavior (see Pribram’s chapter on this in “Brain and Perception”).  But the perception of an external reality is necessary to tightly coupled action in the world.  Unless the machine ontology is formed from an act of perception, the information technology has a radically different and often incompatible nature when compared with acting and perceiving as part of the human experience of sense.

 

In Figure 3a, we show some of the early work on creating categories and rendering these categories as visual abstraction.  Figure 3a shows the results of a feature extraction process (scatter-gather on the surface of a sphere) that has produced a compositional ontology having five layers.  Each of the elements in the compositional ontology can be viewed as an event compound composed of elementary patterns of invariance and the types of relationships that each pattern has with other patterns.  The compound has the nature of a chemical compound composed of elementary atoms (of invariance type) and valence (Figure 3b). 

 

  

Figure 3 (panels a, b, c, d):  Screens from the software prototype for formation and visual rendering of cA

 

Both elementary number theory and category theory are used in the underlying formalism (again, this formalism was developed during the past decade by Drs. Prueitt, Murray and Kugler, based on foundational literature.)

 

Great flexibility is provided for the fast assembly of atoms (of invariance type) and link-types into small colored icons (Figure 3c and 3d).  An in-memory data structure is expressed or rendered directly in one pass over the structure. This is considered a perception of the map.  Evidence has been acquired, using OSI software, that this in-memory map structure has the nature of a hologram/fractal, in that partial (random) retrieval will often look very similar to complete retrieval.  Human information interaction occurs during incomplete rendering, or when the event itself is only partially complete. 

 

As in a real event occurring in the natural world, an eventCompound is nested in other eventCompounds and has sub-eventCompounds.  In the natural world, the phenomenon of emergence provides a core open problem regarding the specification of the boundary and initial conditions of an event.  Work on this involves the physical science study of emergence and non-equilibrium physics.  This work introduces Complex Adaptive Systems (such as developed at the Santa Fe Institute) and Stratified Complexity (being developed by members of our science community).  This work in the natural sciences is absolutely necessary in the design of the OSI software.  However, the interface is simple to use.
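
The nesting of eventCompounds described above can be pictured with a small recursive data structure.  This is a sketch under stated assumptions: the class layout, field names, and sample values are hypothetical and are not taken from the OSI software.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EventCompound:
    """Hypothetical container: atoms, typed links between atoms, and nested sub-compounds."""
    name: str
    atoms: List[str] = field(default_factory=list)
    # Each link is (atom, link_type, atom).
    links: List[Tuple[str, str, str]] = field(default_factory=list)
    children: List["EventCompound"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of nesting levels, counting this compound as level 1."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def all_atoms(self) -> List[str]:
        """Atoms of this compound followed by those of its sub-compounds, depth first."""
        found = list(self.atoms)
        for child in self.children:
            found.extend(child.all_atoms())
        return found

# Made-up sample: a probe event nested inside a larger attack event.
probe = EventCompound(
    "probe",
    atoms=["10.0.0.5", "port_scan"],
    links=[("10.0.0.5", "performed", "port_scan")],
)
attack = EventCompound("attack", atoms=["bank_1"], children=[probe])
```

Because a compound holds other compounds of the same type, "compounds of compounds" need no special handling; attack.depth() simply walks one level further than probe.depth().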

 

An event series is a temporally ordered series of events with strong causal dependencies.  Event detection in the computational spaces has a similar modeling problem, except that in the computational spaces there is an absolute ground level -- all events in the computational spaces are reducible to patterns of 0s and 1s.  Levels of organization in the computational space are seen as expressions of habits in the regularity of human interaction with the computational spaces.

 

(The theory of stratified complexity when applied to computer science produces some startling results.  See: www.ontologystream.com/IRRTest/Evaluation/ARLReport.htm )

 

The integration of the categoricalAbstraction based synthetic perceptual system with industry standard metadata rich knowledge engineering systems is made via a translation into machine-readable ontology (such as XML with RDF, KIF, or Cognitive Graphs (CG)).  
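
As an illustration of such a translation, the sketch below writes one event compound to plain XML using Python's standard library.  The element and attribute names are hypothetical and do not follow the RDF, KIF, or CG syntaxes; a real interface would use whatever schema is negotiated with the receiving system.

```python
import xml.etree.ElementTree as ET

def compound_to_xml(name, atoms, links):
    """Serialize one event compound to an XML string.

    Element and attribute names here are hypothetical, not a published schema.
    """
    root = ET.Element("eventCompound", {"name": name})
    for atom in atoms:
        ET.SubElement(root, "atom").text = atom
    for source, link_type, target in links:
        ET.SubElement(root, "link", {"from": source, "type": link_type, "to": target})
    return ET.tostring(root, encoding="unicode")

# Made-up sample data, for illustration only.
xml_text = compound_to_xml(
    "probe",
    atoms=["10.0.0.5", "port_scan"],
    links=[("10.0.0.5", "performed", "port_scan")],
)
```

The receiving system can read the string back with ET.fromstring and recover the atoms and link types without knowing anything about the in-memory structure that produced them.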

 

Section 6:  Proposed Statement of Work

 

1)      Design process.  Describe our proposed work process and optional paths through it.   Employ event diagrams, object model, and other views that specify the elements and interactions within the system and with environments.  Indicate sources of variance and error.  Describe how system learns. 

2)      Design computing components of system.  Indicate software, hardware, and interfaces.  Describe the data reduction and analysis algorithms in pseudocode.

3)      Develop data sets.  Develop two test data sets.  The first is based on computer log files.  The second will be selected, with the advice of the NIMD program office,  in a domain central to the concerns of the Glass Box, composed either of text or image. 

4)      Design human components of system.  Describe sensory and cognitive process that the operator employs within the system.  Explain how capacities (i.e., memory, attention, etc.) are leveraged and how this differs from conventional analytic processes.  Indicate needs for selection and training of operators.  Describe experience of the individual operators and the community of operators and their interactions.  Indicate support roles, such as administrators and analysts of supporting analysts.

5)      Enable the KOS to model human information interaction.  Use the description of sensory and cognitive process that the operator employs (see #4) to encode gesture states for the KOS Interface.  Demonstrate sufficiency in this description as a choreography language for Glass Box interactions.

6)      Apply system to computer intrusion domain and second domain.  Design scientific protocol for case study observation and for performance assessment, including tracking of false positive and false negative intrusion detections and recovery from error. Test hypotheses regarding key claims and variables of the system, such as the range of data sampling rates that will simultaneously be fast enough to process massive data, detect rare and novel events, and present these detections at a level of salience that meets thresholds for human recognition.  Develop performance indicators and models that will guide the tuning and enhancement of the system both in this application domain and generally.   Apply to second domain.

7)      Develop software.  Beginning with currently prototyped algorithms and display techniques, and with open COTS knowledge base, improve functions based on requirements and specifications developed throughout the investigation.  Use rapid prototyping and testing techniques.  Document the code.  Develop a cyber defense prototype system as a proof of concept, suitable for Glass Box integration but also capable of stand-alone operation.  The architecture will be generic and remain applicable to other domains outside of cyber defense.

8)      Fit the system to the Glass Box environment.  Specify interfacing, functions that will be relied upon within the Glass Box, and functions that are included in the developed system but may be reallocated to shared Glass Box applications.  Indicate what this system contributes to the Glass Box that is unique and necessary.  Indicate use of standards or need for standards.  Describe an ideal technical and institutional setting that would fully take advantage of this system, both in its current implementation and in further application of its principles.

9)      Document findings.  Prepare articles for publication.  Prepare proceedings volume from HII scientific conference.  Prepare presentations and demonstrations. 

10)  Conduct research review meetings.  Invite interested parties and scientific advisors to the project and prepare them with briefing materials; facilitate and write up criticism, dialogue, and questions for continued research.  Conduct management reviews and workshops as required by sponsor.  Conduct HII scientific conference.

11)  Train and Deploy.  Provide training and deploy as requested.
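
The tracking of false positive and false negative intrusion detections called for in task 6 could be tallied along the following lines.  This is a minimal sketch: the function names and the sample data are assumptions, and the rates use the standard definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)).

```python
from typing import Dict, List

def detection_counts(predicted: List[bool], actual: List[bool]) -> Dict[str, int]:
    """Tally detections against ground truth for one evaluation run."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for flagged, intrusion in zip(predicted, actual):
        if flagged and intrusion:
            counts["true_pos"] += 1
        elif flagged and not intrusion:
            counts["false_pos"] += 1
        elif not flagged and intrusion:
            counts["false_neg"] += 1
        else:
            counts["true_neg"] += 1
    return counts

def error_rates(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute false positive and false negative rates; 0.0 when a class is empty."""
    negatives = counts["false_pos"] + counts["true_neg"]
    positives = counts["false_neg"] + counts["true_pos"]
    return {
        "false_positive_rate": counts["false_pos"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["false_neg"] / positives if positives else 0.0,
    }

# Made-up sample: what the system flagged vs. what was actually an intrusion.
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
counts = detection_counts(predicted, actual)
rates = error_rates(counts)
```

Recording the raw counts for each run, and not only the rates, keeps the record usable when the data sampling rate or salience threshold is tuned and the runs need to be compared.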

 

Section 7: Deliverables

 

Over the term of the 20-month contract, we will deliver:

1)      A Knowledge Operating System.

a.       Applications to two domains

b.      Compiled code, with source code provided when requested.

c.       Tutorials and technical documentation.

2)      Instructions for the integration and evaluation of the KOS in the context of national intelligence and the Glass Box.

a.       Interface to data sources

b.      Specifications of APIs to other NIMD participant software systems.

c.       Evaluation methodology, taking into account needs of other NIMD participants

3)      A collaborative system based on simple, open source, knowledge management technologies, suitable for participation by non-funded scientists in the evaluation of the KOS in the context of:

a.       A general purpose event detection system

b.      Human / computer interaction

c.       Formal logics and foundations of mathematics and computer science

4)      Conference proceedings on HII (with NSF as anticipated co-sponsor)

5)      NIMA-required workshops and management reports.

 

Proprietary claims to results.  All software developed under the contract will be open source and freely available within research communities, especially university groups who may aid in establishing mathematical foundations to the concepts of categoricalAbstraction and the Knowledge Operating System.

 

OntologyStream will be the owner of materials developed under this contract; SAIC and TelArt will have no ownership stake in the developed software. 

 

Several potential patents have already been identified by primary researchers on the KOS systems, and the team will identify additional patents as work proceeds.   Regarding these patents:

 

1)      Applications will be prepared using private money, not contract funds

2)      Use privileges to the United States Government will be unrestricted

3)      All relevant information about the patent and the use of the patent will be made public within two weeks of our first disclosure to PTO

 

Our purpose in disclosing patents is to make a claim that original work has occurred and will continue to occur, while also providing technical detail to the public concerning the foundational sciences and mathematics being developed.

 

We also anticipate the possible use of third party patents if applicable and if the patent use is not inhibitory.  An example might be a security encoding patent.

 

 

Section 8:  Unlimited Rights to the Government

 

All software will be furnished to the Government with unlimited rights in accordance with DFARS 252.227-7017. 

 

 

Section 9: Availability of deliverables for Glass Box demonstrations

 

Deliverables and documentation of the KOS systems will be provided to the Glass Box integration team, along with any installation and modification support as might be needed to establish operational capability in the Glass Box.  We will respond to any request for training in a timely fashion.

 

 

Section 10:  Period of Performance

 

The proposed period of performance is twenty (20) months. 

 

Milestones

 

Sept-Dec 2002 (four months):