Projects          People          Publications          Presentations          Videos          Contact
Semantics and Integration
   Probabilistic ontologies

   Annotated RDF



Ontology systems

   RDF views



Related projects and labs


The relational algebra and calculus do not take the semantics of terms into account when answering queries. As a consequence, not all tuples that should be returned in response to a query are always returned, leading to low recall. In this paper, we propose the novel notion of a constrained probabilistic ontology (CPO), building on the notion of a Probabilistic Object Base by Eiter et. al. The only prior known work on probabilistic ontologies is due to Giugno and Lukasiewicz who proposed it as a stand-alone data model rather than associating it with a database. We developed the concept of a CPO-enhanced relation in which each attribute of a relation has an associated CPO.

These CPOs describe relationships between terms occurring in the domain of that attribute. We show that the relational algebra can be extended to handle CPO-enhanced relations. This allows queries to yield sets of tuples, each of which has a probability of being correct. Using the well known Internet Movie Database (IMDB) as a testbed, we show that such probabilistic answers yield a far higher ``quality of answer'' than ordinary relational DBs can provide today. We further develop methods to automatically infer probabilistic ontologies from relational data, and develop experimental results showing that the use of probabilistic ontologies is highly scalable.

The specific contributions of this project are:

  1. We extend the relational algebra to accomodate CPOs and formally define notions of interpretation semi-consistency, consistency and coherence and show how consistency checking is generally NP-hard, but can performed in polynomial time for a certain class of CPOs. Furthermore, we extend relational algebra to accomodate CPOs.
  2. In situations were we need to perform joins, ontology merging is an important issue. Our CPO integration algorithm (CPOmerge) makes use of interoperation constraints, a way to specify how terms in the two ontologies are related. Furthermore, we develop the ICI algorithm that can be used to automatically infer interoperation constraints.
  3. We develop several algorithms and methods from inferring CPOs from existing ontologies and refning the probabilities in the CPO. In particular, the CPI algorithm uses a ground truth dataset to refine the probabilities in a CPO when starting off with an arbitrary distribution.
  4. We give experimental results for consistency checking, for the precision and recall, as well as the quality of answer (according to two different metrics) and on the efficiency and quality of CPO inference. The experiments show that the CPO enhanced algebra yields far better results than the classical relational algebra.


Octavian Udrea, Yu Deng, Edward Hung, Nazif Cihan Tas, V.S. Subrahmanian


  1. Octavian Udrea, Yu Deng, Edward Hung and V.S. Subrahmanian. Probabilistic Ontologies and Relational Databases. To appear in Proceedings of ODBASE 2005. [PDF]

Presentations and Demos

  1. A video demonstration of an application of PARQ to the Enron email collection can be found here (WMV, ~7.5 MB).
  2. Probabilistic Answers to Relational Queries presented at ODBASE'05 [PPT].
Last updated: March 26 2008
Contact us with questions or comments