The
relational algebra and calculus do
not take the semantics of
terms into account when answering queries.
As a consequence, not
all tuples that should be returned
in response to a query are
always returned, leading to low recall.
In this paper, we propose
the novel notion of a constrained
probabilistic ontology (CPO),
building on the notion of a Probabilistic
Object Base by Eiter et. al. The only
prior known work on probabilistic ontologies is
due to Giugno and Lukasiewicz who proposed it as
a standalone data model rather than associating
it with a database. We developed the concept of
a CPOenhanced relation in which each attribute
of a relation has an associated CPO.
These CPOs describe relationships between
terms occurring in the domain
of that attribute. We show that the
relational algebra can be
extended to handle CPOenhanced relations.
This allows queries to
yield sets of tuples, each of which
has a probability of being
correct. Using the well known Internet
Movie Database (IMDB) as a
testbed, we show that such probabilistic
answers yield a
far higher ``quality of answer'' than
ordinary relational DBs can
provide today. We further develop methods
to automatically infer
probabilistic ontologies from relational
data, and develop
experimental results showing that the
use of probabilistic
ontologies is highly scalable.
The specific contributions of this project are:
 We extend the relational algebra to accomodate
CPOs and formally define notions
of interpretation semiconsistency,
consistency and coherence and show
how consistency checking is generally
NPhard, but can performed in polynomial
time for a certain class of CPOs.
Furthermore, we extend relational
algebra to accomodate CPOs.
 In situations were we need to perform
joins, ontology merging is an important
issue. Our CPO integration algorithm
(CPOmerge) makes
use of interoperation
constraints, a way
to specify how terms in the two
ontologies are related. Furthermore,
we develop the ICI algorithm
that can be used to automatically
infer interoperation constraints.
 We develop several algorithms and methods
from inferring CPOs from existing
ontologies and refning the probabilities
in the CPO. In particular, the CPI algorithm
uses a ground truth dataset to
refine the probabilities in a CPO
when starting off with an arbitrary
distribution.
 We give experimental results for consistency
checking, for the precision and
recall, as well as the quality
of answer (according to two different
metrics) and on the efficiency
and quality of CPO inference. The
experiments show that the CPO enhanced
algebra yields far better results
than the classical relational algebra.
People
Octavian
Udrea, Yu
Deng, Edward
Hung, Nazif
Cihan Tas, V.S. Subrahmanian
Publications
 Octavian Udrea, Yu Deng, Edward Hung and
V.S. Subrahmanian. Probabilistic
Ontologies and Relational Databases.
To appear in Proceedings of ODBASE
2005. [PDF]
Presentations and Demos
 A
video demonstration of an application
of PARQ to the Enron email collection
can be found here (WMV, ~7.5 MB).
 Probabilistic Answers to Relational Queries presented at ODBASE'05 [PPT].
