
ORB Visualization





It takes a Community to Create a National Project


 Sunday, November 23, 2003




While the case is being made for awarding a new contract, the research group led by OntologyStream Inc will develop and publicly post a Subject Matter Indicator Taxonomy (SMIT) over the public FCC e-Doc collection.





A separate web site will be unveiled next week, where a fast full-text (inverted index) search engine will be placed over a test repository that is essentially a copy of the FCC repository.  This engine is the Instant Index Inc engine, which has been selected as a replacement for Verity-type indexing.  There will be no cost for this contribution.
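As a rough illustration of what an inverted index does, the sketch below maps each term to the set of documents that contain it and answers a query by intersecting those sets. It is not the Instant Index Inc engine, whose internals are not described here; the function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical stand-ins for documents in the test repository.
docs = {
    "doc-1": "Ruling on spectrum allocation for wireless carriers",
    "doc-2": "Notice of proposed rulemaking on broadcast ownership",
    "doc-3": "Order on spectrum auction procedures",
}
index = build_inverted_index(docs)
print(sorted(search(index, "spectrum")))
```

Because the index is built once and each lookup only touches the posting sets of the query terms, search time depends on how many documents match rather than on the size of the whole collection.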


No cost will be incurred for public use of the site as a research tool; however, all transactions will be recorded and made available (with private information removed) so as to understand the public’s need to access the FCC public rulings.  A notice to this effect will be posted.  Regular users who wish to create cookies that preserve information about past search results will be able to use this service free of charge.
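One simple way to publish transaction records with private information removed is to drop the identifying fields before release. The sketch below assumes a record is a flat dictionary; the field names are hypothetical, since no log format is specified here.

```python
import json

# Hypothetical field names treated as private (an assumption, not a spec).
PRIVATE_FIELDS = {"ip_address", "cookie_id", "user_agent"}

def scrub(record):
    """Return a copy of a transaction record without the private fields."""
    return {key: value for key, value in record.items()
            if key not in PRIVATE_FIELDS}

# Hypothetical transaction record for one search request.
record = {
    "timestamp": "2003-11-23T14:05:00",
    "query": "spectrum auction",
    "results_returned": 17,
    "ip_address": "192.0.2.10",
    "cookie_id": "abc123",
    "user_agent": "ExampleBrowser/1.0",
}
public_record = scrub(record)
print(json.dumps(public_record, sort_keys=True))
```

Dropping fields does not catch private details typed into the query text itself, so a real release would likely need an additional review step for that.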


For each of five or six different groups of algorithms, additional Subject Matter Indicator Taxonomy (SMIT) metadata will be generated for each document in the test repository.  Anyone who visits the web site can therefore evaluate the retrieval results produced by the automatically generated subject matter indicator metadata.  A committee of university-based researchers in full-text understanding will be formed to develop methods for measuring the fidelity of the SMIT results.  Because this is an exciting research opportunity for university professors who are already employed, this outcome measurement will incur no cost.  The committee will also generate distance-learning materials, to be published as textbooks on state-of-the-art metadata systems.



Note on Knowledge Management and Stratify.


In-Q-Tel is rumored to be looking for knowledge management capabilities that it has not yet been able to invest in.  We would like that investment to flow to the OntologyStream-led research group.


The key problem in semantic-web-based knowledge management is the reconciliation of terms from a controlled vocabulary where real differences in interpretation exist.  Stovepipe-type information flow cannot be overcome as long as these differences in opinion are maintained by the metadata used in workflow elements and in policy discussions.  A small scientific community concurs with this conclusion, and a small group of innovators has developed tools to address the problem.


The OntologyStream copy of the FCC e-Doc repository will not have a workflow element at first, because the harvested collection is a fixed set of documents.  However, fine distinctions between term interpretations can be discovered in this collection when the concepts being expressed are trended by the NdCore visualization tool.  This will be demonstrated later this week using the copy of the FCC e-Doc repository.  Text Analysis International Corporation is completing a conversion of the FCC e-Doc repository into the input format best suited for the CCM-Powered Ontology Referential Base and for the visualization of Subject Matter Indicator topology.




This conversion is being done using the Visual Text IDE, a tool for quickly developing high-quality multi-pass parsing and tagging.  Markup of subjects, verbs, subject phrases, and verb phrases will be produced and encoded into XML metadata placed at the beginning of each file.  Each text will also be examined linguistically and heuristically to identify exactly those sentences that are well formed and easy to recognize as sentences.  The sentences themselves, together with the header metadata, will be encoded into a SMIT text repository and used by ATS to demonstrate the commercial NdCore visualization of themes expressed as a function of the date published by the FCC.  The experimental CCM-Powered OntologyStream ORBs will produce topological maps of the Subject Matter Indicators, as discussed in our public research papers.  Entrieva Inc may choose to use the well-known Semio taggers and a new taxonomy tagger software system now being deployed in other agencies.  Several other companies will also be generating SMIT, as well as browser interfaces and control elements, targeting the copied FCC e-Doc repository.
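The two steps described above, an XML metadata header placed at the beginning of each file and a heuristic filter that keeps only clearly recognizable sentences, can be sketched roughly as follows. This is not the Visual Text IDE output format: the element names (`smit`, `phrase`), the sentence heuristic, and the sample text are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

def well_formed_sentences(text):
    """Keep only segments that look like plain, complete sentences.

    Heuristic (an assumption): the segment starts with a capital letter,
    ends with a period, and has at least four words.
    """
    candidates = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in candidates
            if re.match(r"[A-Z]", s) and s.endswith(".") and len(s.split()) >= 4]

def build_header(doc_id, published, phrases):
    """Encode phrase markup as an XML header string.

    `phrases` is a list of (kind, text) pairs such as ("subject", "The Commission").
    The element and attribute names are hypothetical.
    """
    root = ET.Element("smit", {"doc": doc_id, "published": published})
    for kind, text in phrases:
        ET.SubElement(root, "phrase", {"type": kind}).text = text
    return ET.tostring(root, encoding="unicode")

def with_header(doc_id, published, phrases, text):
    """Return the XML header on the first line, then one kept sentence per line."""
    body = "\n".join(well_formed_sentences(text))
    return build_header(doc_id, published, phrases) + "\n" + body

# Hypothetical sample document with one noisy fragment in the middle.
sample_text = ("The Commission adopts the proposed rule. "
               "page 12 of 40 Adopted? "
               "The order takes effect in thirty days.")
sample_phrases = [("subject", "The Commission"), ("verb", "adopts")]
encoded = with_header("fcc-0001", "2003-11-23", sample_phrases, sample_text)
print(encoded)
```

Keeping the publication date in the header is what would let a later tool group themes by date without re-parsing the body of each file.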


Visualization of these fine distinctions allows the management of terminology use to be automated, based on the organic evolution of user language as driven by community use practices.  We can also set up this type of inference about expressed opinions using the same architecture.




The public and the research community can then apply workflow to the process of incorporating new FCC rulings into the OntologyStream ORBs.  As new e-Doc entries appear, the ORBs will make the subject material available for public use.  We intend to create the most interesting research environment ever seen by the text analysis research community, and then facilitate the use of this environment by the American public.


The system is to be reproduced for other government regulatory public archives of rulings.


The OntologyStream Inc research group is seeking capitalization from In-Q-Tel or from private investors.  The FCC ORBs will be used to extend ORB technology within the federal agencies.


A very specific demonstration project has been selected as the most viable for the first year of operation.  The OntologyStream-led research group will use our copy of the FCC repository to demonstrate how Stratify Inc full-life-cycle taxonomy management can be coupled with SchemaLogic, workflow, and inference to provide a type of knowledge management system that is not now present at any of the agencies.


Licensing of SchemaLogic’s SchemaServer and various other supporting technologies will be sought.





