In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• The database is accessed through a single interface as if it is a single database.
There are two types of homogeneous distributed database −
• Autonomous − Each database is independent and functions on its own. The databases are integrated by a controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS co-ordinates data updates across the sites.
In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data models.
Its properties are −
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs, such as relational, network, hierarchical or object-oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in processing user requests.
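To make the point about dissimilar schemas concrete, the following is a minimal sketch of how rows and filters might be translated between local schemas and a shared global schema. The site names, column names, and the `to_global` and `to_local` helpers are all hypothetical and chosen only for illustration.

```python
# Hypothetical per-site mappings: local column name -> global column name.
SITE_MAPPINGS = {
    "site_a": {"cust_no": "customer_id", "full_name": "name", "town": "city"},
    "site_b": {"id": "customer_id", "nm": "name", "city": "city"},
}


def to_global(site, local_row):
    """Rename the columns of a local row to the global schema."""
    mapping = SITE_MAPPINGS[site]
    return {mapping[col]: value for col, value in local_row.items() if col in mapping}


def to_local(site, global_predicate):
    """Rewrite an equality filter on global columns into the site's own columns."""
    reverse = {g: l for l, g in SITE_MAPPINGS[site].items()}
    return {reverse[col]: value for col, value in global_predicate.items()}


if __name__ == "__main__":
    print(to_global("site_a", {"cust_no": 7, "full_name": "Ann", "town": "Oslo"}))
    print(to_local("site_b", {"name": "Ann"}))
```

Real systems also have to reconcile data types, units, and structural differences. Renaming columns is only the simplest case.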
There are two types of heterogeneous distributed database −
• Federated − The heterogeneous database systems are independent in nature and integrated together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through which the databases are accessed.
Architecture of Heterogeneous Database
The underlying DBEs of a heterogeneous system are different in nature and may provide different interfaces to the outside world. One of the first challenges in integrating heterogeneous DBEs is to hide the difference in the interfaces these systems expose.
Wrappers are widely used in the industry for this purpose. A wrapper is a software module that uses the open (or the proprietary) interface of an underlying DBE and provides a uniform interface to the outside world based on the capabilities that the DBE provides.
Since the de facto standard for query processing in any heterogeneous database system is SQL, a wrapper exposes a relational model and SQL as the interface for the system it wraps. Depending on the capabilities of the underlying component DBEs, wrappers provide different sets of functionality.
For instance, a relational DBMS wrapper can support all of the capabilities that a relational DBMS provides. An Excel wrapper can provide the ability to enumerate the rows in an Excel worksheet as tuples in a relation. This ability can be utilized to perform a select on the contents of a worksheet very much like performing a select operation on a relation.
As a result, we can use an Excel wrapper to join the rows in an Excel worksheet with the rows of a table exposed by a relational DBMS wrapper. Because of limitations of the BigBook database, a wrapper for the BigBook database can only support selection on limited information, such as the business category and the city where the business is located. The BigBook wrapper cannot enumerate rows and, therefore, cannot support joins. The accompanying figure illustrates a wrapper-based architecture for a heterogeneous database system.
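The difference in wrapper capabilities described above can be sketched as follows. This is an illustrative model only, not the interface of any real wrapper product: the class names, the `can_enumerate` flag, and the `hash_join` helper are assumptions, and plain in-memory lists stand in for the worksheet, the relational table, and the BigBook-style source.

```python
from typing import Dict, Iterator, List

Row = Dict[str, object]


class Wrapper:
    """Uniform interface; each subclass advertises what it can actually do."""
    can_enumerate = False

    def select(self, **equals) -> List[Row]:
        raise NotImplementedError

    def rows(self) -> Iterator[Row]:
        raise NotImplementedError(f"{type(self).__name__} cannot enumerate rows")


class RowSourceWrapper(Wrapper):
    """Stands in for a worksheet or relational table: rows are exposed as tuples."""
    can_enumerate = True

    def __init__(self, rows: List[Row]):
        self._rows = list(rows)

    def rows(self) -> Iterator[Row]:
        return iter(self._rows)

    def select(self, **equals) -> List[Row]:
        return [r for r in self._rows
                if all(r.get(k) == v for k, v in equals.items())]


class LookupOnlyWrapper(Wrapper):
    """Stands in for a BigBook-style source: selection on a few fields, no enumeration."""
    searchable = ("category", "city")

    def __init__(self, rows: List[Row]):
        self._rows = list(rows)

    def select(self, **equals) -> List[Row]:
        unsupported = set(equals) - set(self.searchable)
        if not equals or unsupported:
            raise ValueError("only non-empty filters on category/city are supported")
        return [r for r in self._rows
                if all(r.get(k) == v for k, v in equals.items())]


def hash_join(left: Wrapper, right: Wrapper, key: str) -> List[Row]:
    """Equi-join two wrapped sources; both sides must support row enumeration."""
    if not (left.can_enumerate and right.can_enumerate):
        raise TypeError("join needs row enumeration on both sides")
    index: Dict[object, List[Row]] = {}
    for r in right.rows():
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left.rows() for r in index.get(l[key], [])]
```

With this model, a worksheet-like source and a table-like source can be joined because both enumerate their rows, whereas passing a `LookupOnlyWrapper` to `hash_join` raises `TypeError`.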
- As illustrated in the figure, each data source in the system is wrapped by a specific wrapper. Depending on the underlying DBE, a wrapper may be able to provide either tuple-level or block-level (a set of tuples that are grouped together) access to the information that the database controls.
- The wrapper may also be able to cache information outside the database for faster access. According to Wiederhold, wrappers are not directly used by the client of the heterogeneous database system, but interact with a layer of software called the mediator.
- The mediator does not have the capability to interface directly to the underlying DBE. The mediator can access the global data dictionary to find out the schema of the local DBEs and the functionality they provide.
- The mediator processes the queries posted by global users; determines the location of each piece of required data by looking it up in the global data dictionary (GDD); exploits the functionality that each local DBE provides through its wrapper; and optimizes the queries at the global level.
Issues for Query Processing in a Heterogeneous Database
There are several issues for query processing in a heterogeneous database −
• Schema translation − Write a wrapper for each data source to translate data to a global schema. Wrappers must also translate updates on the global schema into updates on the local schema.
• Limited query capabilities − Some data sources, such as web forms and flat files, allow only restricted forms of selection. Queries have to be broken up and processed partly at the source and partly at a different site.
• Removal of duplicate information when sites have overlapping information − Decide which sites should execute the query.
• Global query optimization
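The following sketch shows how a mediator might combine the ideas above: it looks up sources in a GDD, pushes down only the filters a source can evaluate, applies the remaining filters itself, and drops duplicate rows from overlapping sites. The `Source` and `Mediator` classes, the `pushable` attribute, and the dictionary-based GDD are hypothetical simplifications. Global optimization is not modelled.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


class Source:
    """Hypothetical wrapped source that can filter only on `pushable` columns."""

    def __init__(self, rows: List[Row], pushable: Set[str]):
        self.rows = rows
        self.pushable = pushable

    def select(self, predicate: Dict[str, object]) -> List[Row]:
        if not set(predicate) <= self.pushable:
            raise ValueError("source cannot evaluate this filter")
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in predicate.items())]


class Mediator:
    def __init__(self, gdd: Dict[str, List[Source]]):
        # GDD: global relation name -> wrapped sources that hold part of it.
        self.gdd = gdd

    def query(self, relation: str, predicate: Dict[str, object]) -> List[Row]:
        seen = set()
        result: List[Row] = []
        for source in self.gdd[relation]:
            # Part of the filter runs at the source ...
            pushed = {k: v for k, v in predicate.items() if k in source.pushable}
            # ... and the rest is evaluated here, at the mediator.
            residual = {k: v for k, v in predicate.items() if k not in source.pushable}
            for row in source.select(pushed):
                if all(row.get(k) == v for k, v in residual.items()):
                    key = tuple(sorted(row.items()))  # identical rows count as duplicates
                    if key not in seen:
                        seen.add(key)
                        result.append(row)
        return result
```

Treating two rows as duplicates only when every column matches is a deliberate simplification. Overlapping sites often describe the same entity with slightly different values, which requires a more careful matching rule.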