Illustrate the parallel algorithm for matrix multiplication and compare the performance of this algorithm with sequential matrix multiplication algorithm.

52views

written 5.0 years ago by

• modified 5.0 years ago

Parallel algorithm for matrix multiplication-

A = $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$

B = $\begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \\ b_{41} & b_{42} & b_{43} & b_{44} \end{bmatrix}$

A*B = C

C = $\begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \\ c_{41} & c_{42} & c_{43} & c_{44} \end{bmatrix}$

Step 1: Initialization:

Distribute $i^{th}$ row and $i^{th}$ column of matrix A & B to $i^{th}$ PE.

Step 2: Do N times

a. Take inner product of two vector

b. Rotate vectors of B columns by one PE position

Each inner product generated represents an element of each row of resultant matrix.

Each PE computes inner product of two vectors available to it.

Example: Step 2(1) a: inner product

$PE_{1} $	$PE_{2} $	$PE_{3} $	$PE_{4} $
$a_{11}b_{11} \\ a_{12}b_{21} \\ a_{13}b_{31} \\ a_{14}b_{41} \\ c_{11} $	$a_{21}b_{12} \\ a_{22}b_{22} \\ a_{23}b_{32} \\ a_{24}b_{42} \\ c_{22} $	$a_{31}b_{13} \\ a_{32}b_{23} \\ a_{33}b_{33} \\ a_{34}b_{43} \\ c_{33} $	$a_{41}b_{14} \\ a_{42}b_{24} \\ a_{43}b_{34} \\ a_{44}b_{44} \\ c_{44} $

Step 2(1)b : Rotate column of B left by 1 position.

Step 2(2)a : Inner product

$PE_{1} $	$PE_{2} $	$PE_{3} $	$PE_{4} $
$a_{11}b_{12} \\ a_{12}b_{22} \\ a_{13}b_{32} \\ a_{14}b_{42} $	$a_{21}b_{13} \\ a_{22}b_{23} \\ a_{23}b_{33} \\ a_{24}b_{43} $	$a_{31}b_{14} \\ a_{32}b_{24} \\ a_{33}b_{34} \\ a_{34}b_{44} $	$a_{41}b_{11} \\ a_{42}b_{22} \\ a_{43}b_{33} \\ a_{44}b_{44} $

Results (including previously calculation):

$\begin{bmatrix} c_{11} & c_{22} & c_{33} & c_{44} \\ c_{12} & c_{23} & c_{34} & c_{41} \end{bmatrix}$

Step 2(2)b: Rotate column of B lef by 1 position.

Step 2(3)a: Inner product

$PE_{1} $	$PE_{2} $	$PE_{3} $	$PE_{4} $
$a_{11}b_{14} \\ a_{12}b_{24} \\ a_{13}b_{34} \\ a_{14}b_{44} $	$a_{21}b_{11} \\ a_{22}b_{21} \\ a_{23}b_{31} \\ a_{24}b_{41} $	$a_{31}b_{12} \\ a_{32}b_{22} \\ a_{33}b_{32} \\ a_{34}b_{42} $	$a_{41}b_{13} \\ a_{42}b_{23} \\ a_{43}b_{33} \\ a_{44}b_{43} $

Results:(including previously calculation):

$\begin{bmatrix} c_{11} & c_{22} & c_{33} & c_{44} \\ c_{12} & c_{23} & c_{34} & c_{41} \\ c_{13} & c_{24} & c_{31} & c_{42} \end{bmatrix}$

Step 2(3)b: Rotate column of B left by 1 position.

Step 2(4)a: Inner Product

$PE_{1} $	$PE_{2} $	$PE_{3} $	$PE_{4} $
$a_{11}b_{14} \\ a_{12}b_{24} \\ a_{13}b_{34} \\ a_{14}b_{44} $	$a_{21}b_{11} \\ a_{22}b_{21} \\ a_{23}b_{31} \\ a_{24}b_{41} $	$a_{31}b_{12} \\ a_{32}b_{22} \\ a_{33}b_{32} \\ a_{34}b_{42} $	$a_{41}b_{13} \\ a_{42}b_{23} \\ a_{43}b_{33} \\ a_{44}b_{43} $

Results:(including previously calculation):

$\begin{bmatrix} c_{11} & c_{22} & c_{33} & c_{44} \\ c_{12} & c_{23} & c_{34} & c_{41} \\ c_{13} & c_{24} & c_{31} & c_{42} \\ c_{14} & c_{21} & c_{32} & c_{43} \\ \end{bmatrix}$

Final solution: Step 2(4)b: Rotate column of B left by 1 position

Step 2(5)c: Inner product

$PE_{1} $	$PE_{2} $	$PE_{3} $	$PE_{4} $
$a_{11}b_{11} \\ a_{12}b_{21} \\ a_{13}b_{31} \\ a_{14}b_{41} \\ $	$a_{21}b_{12} \\ a_{22}b_{22} \\ a_{23}b_{32} \\ a_{24}b_{42} \\ $	$a_{31}b_{13} \\ a_{32}b_{23} \\ a_{33}b_{33} \\ a_{34}b_{43} \\ $	$a_{41}b_{14} \\ a_{42}b_{24} \\ a_{43}b_{34} \\ a_{44}b_{44} \\ $

The last routing step is not necessary but if performed shall have the maintain in the original configuration.

The time complexity is-

parallel algorithm is $O(n^2)$

Sequential algorithm is $O(n^2)$

ADD COMMENT EDIT