JTE: Distributed Shared Memory

Integration of heterogeneous, distributed data sources with Disco
Patrick VALDURIEZ INRIA Rocquencourt

Abstract: The number of heterogeneous data sources that can be accessed from a Web browser is enormous. But providing integrated access to multiple, distributed data sources on the Web raises several new issues. One is the very wide heterogeneity of the data sources and their computing capabilities, ranging from highly structured databases to unstructured files. Another is the need to scale up to large numbers of distributed data sources that can be dynamically added or dropped. A final issue is access performance.

A solution to this problem is to capitalize on distributed database technology with an architecture of information mediators and data source wrappers. Mediators provide a uniform interface to query data sources using a data dictionary, while wrappers map the mediator queries into the data source interface. To experiment with such an architecture, we have developed the Distributed Information Search COmponents (Disco) project at Inria. Disco is implemented in Java and provides uniform access to heterogeneous data sources via JDBC from a Web browser.
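
The mediator/wrapper split described above can be illustrated with a minimal Java sketch. This is not Disco's actual API: the names Wrapper, TableWrapper, LineFileWrapper and Mediator, the row representation, and the "name,city" line format are all assumptions made for illustration. The mediator's data dictionary is reduced here to a map from collection names to the wrappers that serve them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical wrapper contract: expose one data source as rows of named fields.
interface Wrapper {
    List<Map<String, Object>> scan(String collection);
}

// Stands in for a highly structured source such as a relational table.
class TableWrapper implements Wrapper {
    private final Map<String, List<Map<String, Object>>> tables;

    TableWrapper(Map<String, List<Map<String, Object>>> tables) {
        this.tables = tables;
    }

    @Override
    public List<Map<String, Object>> scan(String collection) {
        return tables.getOrDefault(collection, new ArrayList<>());
    }
}

// Stands in for an unstructured source: text lines of the assumed form "name,city".
class LineFileWrapper implements Wrapper {
    private final String collection;
    private final List<String> lines;

    LineFileWrapper(String collection, List<String> lines) {
        this.collection = collection;
        this.lines = lines;
    }

    @Override
    public List<Map<String, Object>> scan(String requested) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (!collection.equals(requested)) {
            return rows;
        }
        for (String line : lines) {
            String[] parts = line.split(",");
            Map<String, Object> row = new HashMap<>();
            row.put("name", parts[0].trim());
            row.put("city", parts[1].trim());
            rows.add(row);
        }
        return rows;
    }
}

// The mediator offers one query interface over every registered wrapper.
class Mediator {
    // Simplified data dictionary: collection name -> wrappers that can serve it.
    private final Map<String, List<Wrapper>> dictionary = new HashMap<>();

    void register(String collection, Wrapper wrapper) {
        dictionary.computeIfAbsent(collection, k -> new ArrayList<>()).add(wrapper);
    }

    List<Map<String, Object>> query(String collection, Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Wrapper wrapper : dictionary.getOrDefault(collection, new ArrayList<>())) {
            for (Map<String, Object> row : wrapper.scan(collection)) {
                if (filter.test(row)) {
                    result.add(row);
                }
            }
        }
        return result;
    }
}

class MediatorDemo {
    public static void main(String[] args) {
        Map<String, Object> ana = new HashMap<>();
        ana.put("name", "Ana");
        ana.put("city", "Paris");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(ana);
        Map<String, List<Map<String, Object>>> tables = new HashMap<>();
        tables.put("person", rows);

        Mediator mediator = new Mediator();
        mediator.register("person", new TableWrapper(tables));
        mediator.register("person", new LineFileWrapper("person", List.of("Bob, Lyon", "Cleo, Paris")));

        // One query, two very different sources behind it.
        System.out.println(mediator.query("person", row -> "Paris".equals(row.get("city"))));
    }
}
```

In this sketch the mediator applies the filter itself after a full scan of each source. That is the simplest possible division of labour; it ignores what each wrapper could evaluate on its own.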

Disco has several novel features. Each Disco wrapper exports its capabilities using a grammar-like description of the operations that the wrapper supports. Disco mediators automatically adapt to the capabilities of wrappers by distinguishing between the preliminary execution plan, which does not consider wrapper capabilities at all, and the final execution plan, which accounts for them. In addition, wrappers may export cost information that describes the size of the data in the underlying sources and the cost of accessing the sources. Disco mediators use this cost information to perform sophisticated cost-based query optimization.
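
The distinction between preliminary and final plans, and the use of exported cost information, can be sketched as follows. This is a deliberately simplified illustration, not Disco's algorithm: the grammar-like capability description is reduced to a set of supported operations, the cost model is a single per-row shipping cost, and the names Op, WrapperInfo, FinalPlan, Planner and the fixed SELECTIVITY constant are all hypothetical. The sketch also assumes every wrapper supports at least SCAN.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

enum Op { SCAN, SELECT, PROJECT }

// What a hypothetical wrapper exports: its capabilities and its cost statistics.
class WrapperInfo {
    final String name;
    final Set<Op> supported;   // simplified stand-in for a grammar-like description
    final long rowCount;       // exported size of the underlying data
    final double costPerRow;   // exported cost of shipping one row to the mediator

    WrapperInfo(String name, Set<Op> supported, long rowCount, double costPerRow) {
        this.name = name;
        this.supported = supported;
        this.rowCount = rowCount;
        this.costPerRow = costPerRow;
    }
}

// A final plan records where each operation of the preliminary plan runs.
class FinalPlan {
    final String wrapper;
    final List<Op> atWrapper;
    final List<Op> atMediator;
    final double estimatedCost;

    FinalPlan(String wrapper, List<Op> atWrapper, List<Op> atMediator, double estimatedCost) {
        this.wrapper = wrapper;
        this.atWrapper = atWrapper;
        this.atMediator = atMediator;
        this.estimatedCost = estimatedCost;
    }
}

class Planner {
    // Assumed fraction of rows surviving a SELECT; a real optimizer would estimate it.
    static final double SELECTIVITY = 0.1;

    // Turn a capability-agnostic preliminary plan into a final plan for one wrapper.
    static FinalPlan refine(List<Op> preliminary, WrapperInfo w) {
        List<Op> atWrapper = new ArrayList<>();
        List<Op> atMediator = new ArrayList<>();
        for (Op op : preliminary) {
            // Push down only a contiguous prefix: once one operation stays at the
            // mediator, every later operation must run there too.
            if (atMediator.isEmpty() && w.supported.contains(op)) {
                atWrapper.add(op);
            } else {
                atMediator.add(op);
            }
        }
        double shippedRows = w.rowCount * (atWrapper.contains(Op.SELECT) ? SELECTIVITY : 1.0);
        return new FinalPlan(w.name, atWrapper, atMediator, shippedRows * w.costPerRow);
    }

    // Cost-based choice among wrappers that can all answer the same query.
    static FinalPlan cheapest(List<Op> preliminary, List<WrapperInfo> candidates) {
        FinalPlan best = null;
        for (WrapperInfo w : candidates) {
            FinalPlan plan = refine(preliminary, w);
            if (best == null || plan.estimatedCost < best.estimatedCost) {
                best = plan;
            }
        }
        return best;
    }
}

class PlannerDemo {
    public static void main(String[] args) {
        List<Op> preliminary = List.of(Op.SCAN, Op.SELECT, Op.PROJECT);
        WrapperInfo db = new WrapperInfo("db", EnumSet.allOf(Op.class), 1000, 1.0);
        WrapperInfo file = new WrapperInfo("file", EnumSet.of(Op.SCAN), 1000, 0.5);

        FinalPlan chosen = Planner.cheapest(preliminary, List.of(db, file));
        System.out.println(chosen.wrapper + " runs " + chosen.atWrapper
                + ", mediator runs " + chosen.atMediator);
    }
}
```

With these made-up numbers, the wrapper that can evaluate the selection itself ships fewer rows and is preferred even though its per-row cost is higher, which is the kind of trade-off exported cost information makes visible to the mediator.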

In this talk, I will describe Disco's architecture and implementation, and recent results on distributed query processing.

Last updated: April 18, 2005, at 10:50 AM