Discuss Allocation of fragments in detail.
1 Answer

Data Allocation

  • Deciding where to locate data
  • Allocation strategies:
    • Centralized data allocation
      • Entire database is stored at one site
    • Partitioned data allocation
      • Database is divided into several disjointed parts (fragments) and stored at several sites
  • Replicated data allocation
    • Copies of one or more database fragments are stored at several sites
  • Data distribution over a computer network is achieved through data partition, data replication, oracombination of both

Fragment Allocation/1

  • Fragment allocation problem

    • Given are:

      • fragmentsF=(F1, F2,.. Fl
      • network sitesS(S1, S2,. Sm)
      • and applicationsQ=(q.2..a)
      • Find: the "optimal" distribution ofFtoS

      • Optimality

    • Minimal cost
      • Communication+storage+processing (read and update)
      • Cost in terms of time (usually)
    • Performance
      • Response time and/or throughput
        • Constraints
      • Per site constraints (storage and processing)

Fragment Allocation/2

  • Required information
    • Database Information
      • selectivity of fragments
      • size ofafragment
  • Application Information

    • RR: number of read accesses ofaquery q, toafragmentF
    • UR: number of update accesses of query q, toafragment F,
    • uj:amatrix indicating which queries updates which fragments,
    • rj:a similar matrix for retrievals
    • originating site of each query
  • Site Information
    • USC: unit cost of storing data atasite S.
    • LPC: cost of processing one unit of data atasiteS
  • Network Information
    • communication cost/frame between two sites
    • frame size

Fragment Allocation/3

  • We discuss an allocation model which attempts to

  • minimize the total cost of processing and storage

  • meet certain response time restrictions

  • General Form: $\min$ (Total Cost)

  • subject to

  • response time constraint

  • storage constraint

  • processing constraint

  • Decision variable $x_{\bar{j}}$ $$ x_{i j}= \begin{cases}1 & \text { if fragment } F_{i} \text { is stored at site } S_{j} \\ 0 & \text { otherwise }\end{cases} $$

Fragment Allocation/4

  • The total cost function has two components: storage and query processing. $$ T O C=\sum_{S_{i} \in S} \sum_{F_{j} \in F} S T C_{j k}+\sum_{q \in Q} Q P C_{i} $$

  • Storage cost of fragment $F_{j}$ at site $S_{k}$ : $$ \operatorname{STC}_{j k}=U S C_{k}=\operatorname{size}\left(F_{j}\right)=x_{i j} $$ where $U S C_{k}$ is the unit storage cost at site $k$ - Query processing cost for a query $q_{i}$ is composed of two components: - composed of processing cost (PC) and transmission cost (TC) $Q P C_{i}=P C_{i}+T C_{i}$ **Fragment Allocation/5** - Processing cost is a sum of three components: - access cost (AC), integrity contraint cost (IE), concurency control cost (CC) $$ P C_{i}=A C_{i}+\mathbb{E}_{i}+C C_{i} $$

  • Access cost: $$ A C_{i}=\sum_{s \in S} \sum_{F_{i} F}\left(U R_{i j}+R R_{i j}\right)=x_{i j}=L P C_{k} $$ where $L P C_{k}$ is the unit process cost at site $k$ - Integrity and concurrency costs: - Can be similarly computed, though depends on the specific constraints **Fragment Allocation/6** - The transmission cost is composed of two components: - Cost of processing updates (TCU) and cost of processing retrievals (TCR) $$ T C_{i}=T C U_{i}+T C R_{i} $$

  • Cost of updates:

  • Inform all the sites that have replicas + a short confirmation message back $T C U_{i}=\sum_{S_{u} \in S} \sum_{F_{i} \in F} u_{i j} *$ (update message cost $+$ acknowledgment cost) - Retrieval cost: - Send retrieval request to all sites that have a copy of fragments that are needed + sending back the results from these sites to the originating **Fragment Allocation/7** - Modeling the constraints - Response time constraint for a query $q_{i}$ execution time of $q_{i} \leq \max$. allowable response time for $q_{i}$ - Storage constraints for a site $S_{k}$ $\sum_{F \in F}$ storage requirement of $F_{j}$ at $S_{k} \leq$ storage capacity of $S_{k}$

  • Processing constraints for a site $S_{k}$ $\sum_{q \in O}$ processing load of $q$ at site $S_{k} \leq$ processing capacity of $S_{k}$

Fragment Allocation/8

  • Solution Methods

    • The complexity of this allocation model/problem is NP-complete
    • Correspondence between the allocation problem and similar problems in other areas
      • Plant location problem in operations research
        • Knapsack problem
        • Network flow problem
  • Hence, solutions from these areas can be re-used

  • Use different heuristics to reduce the search space

    • Assume that all candidate partitionings have been determined together with their associated costs and benefits in terms of query processing.
    • The problem is then reduced to find the optimal partitioning and placement for each relation
    • Ignore replication at the first step and find an optimal non-replicated solution
    • Replication is then handeled inasecond step on top of the previous non-replicated solution.
Please log in to add an answer.