**1 Answer**

written 2.3 years ago by |

**Data Allocation**

- Deciding where to locate data
- Allocation strategies:
- Centralized data allocation
- Entire database is stored at one site

- Partitioned data allocation
- Database is divided into several disjointed parts (fragments) and stored at several sites

- Centralized data allocation
- Replicated data allocation
- Copies of one or more database fragments are stored at several sites

- Data distribution over a computer network is achieved through data partition, data replication, oracombination of both

**Fragment Allocation/1**

Fragment allocation problem

Given are:

- fragmentsF=(F1, F2,.. Fl
- network sitesS(S1, S2,. Sm)
- and applicationsQ=(q.2..a)
Find: the "optimal" distribution ofFtoS

Optimality

- Minimal cost
- Communication+storage+processing (read and update)
- Cost in terms of time (usually)

- Performance
- Response time and/or throughput
- Constraints

- Per site constraints (storage and processing)

- Response time and/or throughput

**Fragment Allocation/2**

- Required information
- Database Information
- selectivity of fragments
- size ofafragment

- Database Information
Application Information

- RR: number of read accesses ofaquery q, toafragmentF
- UR: number of update accesses of query q, toafragment F,
- uj:amatrix indicating which queries updates which fragments,
- rj:a similar matrix for retrievals
- originating site of each query

- Site Information
- USC: unit cost of storing data atasite S.
- LPC: cost of processing one unit of data atasiteS

- Network Information
- communication cost/frame between two sites
- frame size

**Fragment Allocation/3**

We discuss an allocation model which attempts to

minimize the total cost of processing and storage

meet certain response time restrictions

General Form: $\min$ (Total Cost)

subject to

response time constraint

storage constraint

processing constraint

Decision variable $x_{\bar{j}}$ $$ x_{i j}= \begin{cases}1 & \text { if fragment } F_{i} \text { is stored at site } S_{j} \\ 0 & \text { otherwise }\end{cases} $$

**Fragment Allocation/4**

The total cost function has two components: storage and query processing. $$ T O C=\sum_{S_{i} \in S} \sum_{F_{j} \in F} S T C_{j k}+\sum_{q \in Q} Q P C_{i} $$

Storage cost of fragment $F_{j}$ at site $S_{k}$ : $$ \operatorname{STC}_{j k}=U S C_{k}=\operatorname{size}\left(F_{j}\right)=x_{i j} $$ where $U S C_{k}$ is the unit storage cost at site $k$ - Query processing cost for a query $q_{i}$ is composed of two components: - composed of processing cost (PC) and transmission cost (TC) $Q P C_{i}=P C_{i}+T C_{i}$ **Fragment Allocation/5** - Processing cost is a sum of three components: - access cost (AC), integrity contraint cost (IE), concurency control cost (CC) $$ P C_{i}=A C_{i}+\mathbb{E}_{i}+C C_{i} $$

Access cost: $$ A C_{i}=\sum_{s \in S} \sum_{F_{i} F}\left(U R_{i j}+R R_{i j}\right)=x_{i j}=L P C_{k} $$ where $L P C_{k}$ is the unit process cost at site $k$ - Integrity and concurrency costs: - Can be similarly computed, though depends on the specific constraints **Fragment Allocation/6** - The transmission cost is composed of two components: - Cost of processing updates (TCU) and cost of processing retrievals (TCR) $$ T C_{i}=T C U_{i}+T C R_{i} $$

Cost of updates:

Inform all the sites that have replicas + a short confirmation message back $T C U_{i}=\sum_{S_{u} \in S} \sum_{F_{i} \in F} u_{i j} *$ (update message cost $+$ acknowledgment cost) - Retrieval cost: - Send retrieval request to all sites that have a copy of fragments that are needed + sending back the results from these sites to the originating **Fragment Allocation/7** - Modeling the constraints - Response time constraint for a query $q_{i}$ execution time of $q_{i} \leq \max$. allowable response time for $q_{i}$ - Storage constraints for a site $S_{k}$ $\sum_{F \in F}$ storage requirement of $F_{j}$ at $S_{k} \leq$ storage capacity of $S_{k}$

Processing constraints for a site $S_{k}$ $\sum_{q \in O}$ processing load of $q$ at site $S_{k} \leq$ processing capacity of $S_{k}$

**Fragment Allocation/8**

Solution Methods

- The complexity of this allocation model/problem is NP-complete
- Correspondence between the allocation problem and similar
problems in other areas
- Plant location problem in operations research
- Knapsack problem
- Network flow problem

- Plant location problem in operations research

Hence, solutions from these areas can be re-used

Use different heuristics to reduce the search space

- Assume that all candidate partitionings have been determined together with their associated costs and benefits in terms of query processing.
- The problem is then reduced to find the optimal partitioning and placement for each relation
- Ignore replication at the first step and find an optimal non-replicated solution
- Replication is then handeled inasecond step on top of the previous non-replicated solution.