Explain the phases of query processing in distributed database.

## Layers of Query Processing

Query processing has 4 layers:

• Query Decomposition

• Data Localization

• Global Query Optimization

• Distributed Query Execution

## Query Decomposition

The first layer decomposes the calculus query into an algebraic query on global relations. The information needed for this transformation is found in the global conceptual schema describing the global relations.

• Query decomposition can be viewed as four successive steps:

• Normalization

• Analysis

• Simplification

• Restructuring

• First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Normalization of a query generally involves the manipulation of the query quantifiers and of the query qualification by applying logical operator priority.

• Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Techniques to detect incorrect queries exist only for a subset of relational calculus. Typically, they use some sort of graph that captures the semantics of the query.

• Third, the correct query (still expressed in relational calculus) is simplified. One way to simplify a query is to eliminate redundant predicates. Note that redundant predicates are likely to arise when a query is the result of system transformations applied to the user query. Such transformations are used for performing semantic data control (views, protection, and semantic integrity control).

• Fourth, the calculus query is restructured as an algebraic query. The traditional way to do this transformation toward a "better" algebraic specification is to start with an initial algebraic query and transform it in order to find a "good" one.

• The algebraic query generated by this layer is good in the sense that the worse executions are typically avoided.
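The simplification step can be illustrated with a minimal Python sketch. It assumes a qualification that is a pure conjunction of `(attribute, operator, value)` tuples and applies only two rules: idempotency (`p AND p` becomes `p`) and keeping the tightest `<` bound per attribute. The function name `simplify_conjunction` and the `dept_name` attribute are hypothetical, chosen for illustration only; a real decomposer handles far more cases.

```python
def simplify_conjunction(predicates):
    """Drop redundant conjuncts from a list of (attribute, op, value) tuples.

    Only two rules are applied: p AND p => p (idempotency), and for
    "<" bounds on the same attribute, only the tightest bound is kept.
    """
    tightest_upper = {}   # attribute -> smallest "<" bound seen so far
    others = []           # every other predicate, de-duplicated in order
    for attr, op, value in predicates:
        if op == "<":
            if attr not in tightest_upper or value < tightest_upper[attr]:
                tightest_upper[attr] = value
        elif (attr, op, value) not in others:
            others.append((attr, op, value))
    upper = [(attr, "<", bound) for attr, bound in tightest_upper.items()]
    return others + upper


# A qualification as it might look after a view definition has been
# merged into the user query (hypothetical example).
query = [
    ("dept_name", "=", "Physics"),
    ("salary", "<", 90000),          # e.g. contributed by a view
    ("salary", "<", 75000),          # from the user query
    ("dept_name", "=", "Physics"),   # duplicate conjunct
]
simplified = simplify_conjunction(query)
print(simplified)
```

The duplicate equality predicate and the looser salary bound are dropped, leaving two conjuncts instead of four.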

Query Processing Example

Query:

select salary
from instructor
where salary < 75000;


This query can be translated into either of the following relational-algebra expressions:

• $\sigma_{\text{salary} < 75000}\left(\Pi_{\text{salary}}(\text{instructor})\right)$
• $\Pi_{\text{salary}}\left(\sigma_{\text{salary} < 75000}(\text{instructor})\right)$
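To see that the two expressions are equivalent, the following Python sketch evaluates both on a small in-memory relation. The sample rows and the helper names `select` and `project` are assumptions made for illustration; projection uses set semantics (duplicates removed), as in relational algebra.

```python
# Hypothetical sample data for the instructor relation.
instructor = [
    {"name": "Kim", "salary": 65000},
    {"name": "Roy", "salary": 90000},
    {"name": "Lee", "salary": 40000},
    {"name": "Ana", "salary": 65000},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Relational projection with duplicate elimination (set semantics)."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def cheap(row):
    return row["salary"] < 75000


# sigma(pi(instructor)): project first, then select.
plan_a = select(project(instructor, ["salary"]), cheap)
# pi(sigma(instructor)): select first, then project.
plan_b = project(select(instructor, cheap), ["salary"])

print(plan_a == plan_b)
```

Both plans return the same rows. The choice between them matters only for cost, which is why later layers compare alternative algebraic forms.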

## Data Localization

• The input to the second layer is an algebraic query on global relations. The main role of the second layer is to localize the query's data using data distribution information in the fragment schema.

• This layer determines which fragments are involved in the query and transforms the distributed query into a query on fragments.

• A global relation can be reconstructed by applying the fragmentation rules and deriving a program of relational algebra operators, called a localization program, that acts on fragments.

Generating a query on fragments is done in two steps:

• First, the query is mapped into a fragment query by substituting each relation with its reconstruction program (also called a materialization program).

• Second, the fragment query is simplified and restructured to produce another "good" query.
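The two steps can be sketched in Python for the earlier `salary < 75000` selection. The sketch assumes a hypothetical horizontal fragmentation of `instructor` by salary range; the fragment names, ranges, and rows are invented for illustration. Step one replaces the relation by the union of its fragments; step two drops fragments whose range contradicts the predicate.

```python
# Hypothetical horizontal fragmentation of instructor by salary range.
# Each fragment stores the rows with low <= salary < high.
fragments = {
    "instructor_1": {"range": (0, 50000),
                     "rows": [{"salary": 40000}, {"salary": 45000}]},
    "instructor_2": {"range": (50000, 100000),
                     "rows": [{"salary": 65000}, {"salary": 90000}]},
    "instructor_3": {"range": (100000, float("inf")),
                     "rows": [{"salary": 120000}]},
}


def reconstruct(fragments):
    """Reconstruction program for horizontal fragments: union of all fragments."""
    rows = []
    for fragment in fragments.values():
        rows.extend(fragment["rows"])
    return rows


def relevant_fragments(fragments, upper_bound):
    """Keep only fragments whose range can contain a row with salary < upper_bound."""
    return [name for name, fragment in fragments.items()
            if fragment["range"][0] < upper_bound]


def fragment_query(fragments, upper_bound):
    """Run the selection on the relevant fragments only."""
    result = []
    for name in relevant_fragments(fragments, upper_bound):
        for row in fragments[name]["rows"]:
            if row["salary"] < upper_bound:
                result.append(row["salary"])
    return sorted(result)


# Step 1: query over the reconstructed global relation.
global_answer = sorted(r["salary"] for r in reconstruct(fragments)
                       if r["salary"] < 75000)
# Step 2: simplified fragment query that skips instructor_3.
pruned = relevant_fragments(fragments, 75000)
local_answer = fragment_query(fragments, 75000)

print(pruned, local_answer == global_answer)
```

The simplified fragment query returns the same answer while never touching `instructor_3`, whose salary range cannot satisfy the predicate.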

## Global Query Optimization

• The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query which is close to optimal.

• The previous layers have already optimized the query, for example, by eliminating redundant expressions. However, this optimization is independent of fragment characteristics such as fragment allocation and cardinalities.

• Query optimization consists of finding the "best" ordering of operators in the query, including communication operators, that minimizes a cost function.

• The output of the query optimization layer is an optimized algebraic query on fragments with communication operators included. It is typically represented and saved (for future executions) as a distributed query execution plan.
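As a rough illustration of minimizing a cost function that includes communication, the Python sketch below picks the site at which to run a join of two fragments. Everything in it is an assumption: the cost model (a fixed per-message cost plus a per-kilobyte cost), the site names, the `department_1` fragment, and all sizes. Real optimizers also account for local processing cost and use estimated cardinalities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    size_kb: int


def transfer_cost(size_kb, per_message=10, per_kb=1):
    """Assumed communication cost: fixed message overhead plus per-KB cost."""
    return per_message + per_kb * size_kb


def cheapest_join_site(left, right, result_site, join_result_kb):
    """Try every candidate site for the join and return (best_site, all_costs)."""
    costs = {}
    for site in sorted({left.site, right.site, result_site}):
        cost = 0
        # Ship each operand that is not already stored at the join site.
        for fragment in (left, right):
            if fragment.site != site:
                cost += transfer_cost(fragment.size_kb)
        # Ship the join result if it is not produced where it is needed.
        if site != result_site:
            cost += transfer_cost(join_result_kb)
        costs[site] = cost
    best_site = min(costs, key=costs.get)
    return best_site, costs


# Hypothetical fragments and sizes.
left = Fragment("instructor_1", "site_A", 2000)
right = Fragment("department_1", "site_B", 50)
best_site, costs = cheapest_join_site(left, right, "site_C", join_result_kb=100)

print(best_site, costs)
```

With these numbers the cheapest strategy ships the small fragment to `site_A`, joins there, and forwards the result. Each candidate corresponds to a different placement of communication operators in the plan.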

## Distributed Query Execution

• The last layer is performed by all the sites having fragments involved in the query.

• Each subquery executing at one site, called a local query, is then optimized using the local schema of the site and executed.
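A minimal Python sketch of this last layer is shown below. It simulates two sites with in-memory lists and uses a thread pool in place of real network calls; the site names, rows, and function names are hypothetical. Each site runs its local query for the `salary < 75000` example, and a coordinator merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of instructor stored at two sites.
site_fragments = {
    "site_A": [{"name": "Lee", "salary": 40000}],
    "site_B": [{"name": "Kim", "salary": 65000},
               {"name": "Roy", "salary": 90000}],
}


def run_local_query(site):
    """Local query at one site: select salary < 75000, then project salary."""
    return [row["salary"] for row in site_fragments[site]
            if row["salary"] < 75000]


def execute_distributed(sites):
    """Run the local queries concurrently and union their results."""
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(run_local_query, sites))
    merged = set()
    for part in partial_results:
        merged.update(part)
    return sorted(merged)


result = execute_distributed(["site_A", "site_B"])
print(result)
```

In a real system each local query would additionally be optimized against the site's local schema before running; the sketch skips that step.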