Basic Parallel Processing Aspects

The parallel processing machines that we deal with in these pages have some common characteristics that play an important role in the selection of the proper algorithm as well as in the efficiency of the final code. These characteristics are presented in this page:

- Each Processing Element (PE) operates under local (and no global) control. This means that each PE executes its local program independently of the other PEs and so each PE has its program stored locally. So each operation performed by each PE depends only on the local data and the data received from the "neighbour" PEs.

- The PEs operate in word level (and not on bit level). Each word can be 8 or 16 bits (or more through software). In most cases we do not relate the word's size with the number of PEs. In some cases though, we may need to take into consideration the number of PEs, as in the case of summing N integer numbers when we need log₂N additional bits. Alternatively we can reduce the number of PEs based on the size of the word!

- During the development of algorithms for parallel machines, each processing unit may or may not have knowledge of the network topology, the network's size, and the position of the PE into the network. The knowledge of the network's topology and size affects the nature of the algorithm. In all the examples that are described into these pages we assume that each program has been developed for a specific network topology and size. When the knowledge of the address (ID) of each Processing Element is considered, there are algorithms that assume that each PE has a discrete and known address, and other algorithms that do not assume such a knowledge. It must be noted that the knowledge of the address requires special treatment when the programs are downloaded to the PEs. The tools that have been developed so far do not support such operation. It is however possible to develop such programs with an increase in the effort for compiling and downloading the code to the processing units.

- The Processing Elements (PE) communicate with each other through a fixed interconnection network. This means that each PE can exchange data with a constant and known number of other processing units (these units are the neighbours of each PE). Note: the reconfigurable networks are a future target.

Basic interconnection networks are the linear array and the binary tree.

- linear array

The following figure gives an example of a linear array:

Linear array with 5 processing elements.

Each of the inner PEs is connected through bi-directional connections with its left and right neighbour. The two outer PEs have only one connection each. The outer PEs can become the Input-Output points for the complete network.

- Binary tree

The following figure gives an example of a binary tree:

Each node is connected to three other nodes: two downwards and one upwards. The upward node is called "parent" and the downward nodes are called "children".

The node that is above all others is called "root" and is connected only with two other nodes (its children) as there is no other node upwards (it has no parent). The nodes that are at the bottom are called "leaves". Each leaf is connected only with one other node (its parent) as there are no other nodes below it (it has no children).

The tree shown in the figure has 7 nodes (N=7) that are placed in 3 levels (k=3). In every binary tree the relation between N and k is N=2^k-1.

The communication between the nodes can be one-way or two-way. The data input-output can be done at the root and/or the leaves of the tree.

Main page of Parallel Processing