National Knowledge Project

Thursday, November 17, 2005

The BCNGroup Beadgames

An element of discussion about the development of ontology

Posted 11/17/05 by Gary Berg-Cross

http://colab.cim3.net/forum/ontac-forum/

I wanted to follow up on Eric P’s earlier message about a hub approach to building our common ontology. I think his questions and issues got sidetracked.

Editorial request: definition of a hub (in this context) ← send email

a) relationship between modular ontology and a hub

b) the relationship between the concept of a common or “upper” ontology and modular ontology

Eric was curious about “how pervasive those anti-hub feelings really are.”

I don’t hold a single view on this issue, since I think it is complex, and I would welcome discussion of some of its aspects. Eric had particular ideas on Dr. Sowa’s subsumption lattice idea, but I haven’t seen any responses to them; perhaps others can respond.

Editorial request: outline of Eric’s ideas on the lattice ← send email

For myself, I could imagine either a hub or a modular approach, depending on the quality of the hub. I’d have to be convinced that it was doable, and would want to know the “seed” for it and the process of development. I don’t see how this could be done by merging some existing ontologies.

Various people have talked about using UMLS, DOLCE/BFO, SUMO, OpenCyc, ISO 15926, FEA-RMO and the DoD Core Taxonomy.

The FEA-RMO is an ontology of a reference model, not of an actual domain such as health. It seems quite hard to connect it to the others.

Also, in my opinion the DoD taxonomy has the kinds of problems that Barry and John pointed out in the "general ontology", so it may not be easy to assimilate. We might start without trying to merge these in, and might also start with the best two or three as candidates to seed an effort.

Another point or question concerns leveraging the experience of past efforts. Seven or more years ago, the ANSI Ad Hoc Group made an effort to construct a standard called the Reference Ontology. They had the following five-step approach:

1. Upper levels (approx. 100,000 terms): Bring into correspondence (to align) the terms of a small number of selected large-scale ontologies (eventual size approx. 100,000 items). Do so inclusively; that is, create a result in which users can choose which of the component ontologies' terms they wish to see and use.

2. Domain models (under 2,000 terms each): Link into this Ontology selected domain-specific ontologies, developed to support reasoning about time, space, physics, geography, etc. Do so inclusively; allow the linkage of various different models of time, space, etc.

3. Access tools: Create easy-to-use tools for Ontology access and extension.

4. Dissemination: Place the resulting Reference Ontology on the Web, freely available.

5. Theoretical basis: In ongoing work, have a team of highly qualified individuals comb through the Ontology to find powerful generalizations, to weed out unnecessary and inconsistent items, and to create a maximal factoring of the upper levels of the Ontology.
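Step 1's “inclusive” alignment can be pictured as a data structure in which each aligned concept keeps every source ontology's term for it, plus a view that filters those terms by the sources a user chooses. The following is a minimal Python sketch under that assumption; `AlignedConcept`, `view`, the concept IDs and the term strings are hypothetical placeholders, not taken from the ANSI group's work and not checked against the actual ontologies.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedConcept:
    """One entry in a hypothetical merged reference ontology.

    `sources` maps a source-ontology name to the term that source uses for
    this concept, so no source's vocabulary is discarded ("inclusive" alignment).
    """
    concept_id: str
    sources: dict[str, str] = field(default_factory=dict)


def view(concepts: list[AlignedConcept], chosen: set[str]) -> dict[str, list[str]]:
    """Return, per concept, only the terms from the sources the user chose.

    Terms are listed in source-name order; concepts with no term in any
    chosen source are left out of the view.
    """
    result: dict[str, list[str]] = {}
    for c in concepts:
        terms = [term for src, term in sorted(c.sources.items()) if src in chosen]
        if terms:
            result[c.concept_id] = terms
    return result


# Hypothetical, already-aligned data for illustration only.
reference = [
    AlignedConcept("C1", {"WordNet": "person", "CYC": "Person", "SENSUS": "PERSON"}),
    AlignedConcept("C2", {"WordNet": "river", "EDR": "river"}),
    AlignedConcept("C3", {"CYC": "TemporalThing"}),
]

print(view(reference, {"WordNet", "CYC"}))
# {'C1': ['Person', 'person'], 'C2': ['river'], 'C3': ['TemporalThing']}
```

Keeping per-source terms instead of picking one canonical label is what would let users “choose which of the component ontologies' terms they wish to see and use.” The hard part, deciding which source terms denote the same concept, is assumed to be already done in this sketch.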

The ANSI plan seems quite similar to what we are talking about. Whatever happened to it? Did it fail because it didn’t have an upper ontology?

They listed the following as candidate sources for terms to be included in their “merged Reference Ontology”, and a few of these (UMLS, CYC) have been mentioned as a base for us too:

· USC/ISI: Pangloss Ontology SENSUS approx. 70,000 terms, general coverage, little detail, taxonomization supports Natural Language applications.

· Princeton: WordNet approx. 70,000 terms, general coverage, little detail, taxonomized on Naive Semantics / Cognitive Science principles.

· CYCorp: Upper portion of CYC ontology approx. 2,500 terms, general coverage, little detail, taxonomized on Naive Semantics / AI principles. Later additions may include more of the 40,000-odd terms currently in CYC.

· EDR: Upper portion of EDR concept ontology approx. 1,000 terms, general coverage, medium detail, taxonomized for Natural Language applications. Later additions may include more of the approx. 400,000 terms in the EDR concept lexicon.

· New Mexico State University: MIKROKOSMOS approx. 4,000 terms, general coverage, detailed, taxonomized for Natural Language applications.

· European Union: EuroWordNet, under construction; probably approx. 50,000 terms, little detail, taxonomized on Naive Semantics / Cognitive Science principles.

· LXT Inc.: UMLS medical ontology exceeds 50,000 terms, medium detail, taxonomized for medical reasoning applications.

Perhaps some of these should also be on our list if they have “matured”.

A last point or issue concerns alignment between our starting sources and how to begin. Martin Doerr and others did some work, reported in “Towards a Core Ontology for Information Integration”, that described the comparison and convergence of two ontologies using the OntoClean approach (Guarino, N. and Welty, C., “Evaluating ontological decisions with OntoClean,” Communications of the ACM, 45(2), pp. 61-65, 2002).

This uses analyses of top-level ontological distinctions related to:

1. instantiation versus membership

2. part-of and mereological axioms

3. extensionality

4. connection

5. location and extension

6. co-extension, co-connection

7. unity, singularity and plurality

8. dependence/independence

The claim is that the OntoClean approach “enables: the detection of concept definitions that are lacking in clarity or rigidity; the justification of valid sub-sumption relations; and the detection of invalid sub-sumption declarations.”
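To make “detection of invalid subsumption declarations” concrete: OntoClean tags each property with meta-properties such as rigidity (+R rigid, -R non-rigid, ~R anti-rigid) and then checks is-a links against constraints on those tags. One such constraint is that an anti-rigid property may only subsume anti-rigid properties, so Student (~R) cannot subsume Person (+R). The following is a minimal Python sketch of that single rigidity check; `Rigidity`, `invalid_subsumptions` and the example properties and tags are hypothetical illustrations, not output from any OntoClean tool.

```python
from enum import Enum


class Rigidity(Enum):
    RIGID = "+R"       # essential to every instance
    NON_RIGID = "-R"   # essential to some instances but not all
    ANTI_RIGID = "~R"  # essential to no instance


def invalid_subsumptions(rigidity, is_a):
    """Flag (sub, sup) pairs that break the rigidity constraint.

    An anti-rigid property may only subsume anti-rigid properties, so a
    declaration is reported when sup is ~R and sub is +R or -R.
    """
    return [
        (sub, sup)
        for sub, sup in is_a
        if rigidity[sup] is Rigidity.ANTI_RIGID
        and rigidity[sub] is not Rigidity.ANTI_RIGID
    ]


# Hypothetical meta-property tags and is-a links (sub, sup), for illustration only.
rigidity = {
    "Person": Rigidity.RIGID,
    "Animal": Rigidity.RIGID,
    "Student": Rigidity.ANTI_RIGID,
    "GraduateStudent": Rigidity.ANTI_RIGID,
}
is_a = [
    ("Person", "Animal"),           # rigid under rigid: allowed
    ("GraduateStudent", "Student"), # anti-rigid under anti-rigid: allowed
    ("Student", "Person"),          # anti-rigid under rigid: allowed
    ("Person", "Student"),          # rigid under anti-rigid: invalid
]

print(invalid_subsumptions(rigidity, is_a))  # [('Person', 'Student')]
```

Assigning the +R/-R/~R tags is the manual analytic work that OntoClean asks for. Once properties are tagged, checks like this one are mechanical, and the identity, unity and dependence constraints could be added in the same style.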

Would it be useful to start examining how some of our “seed” ontologies match up, using the OntoClean approach?

Regards,

Gary Berg-Cross