Information about Parallel Computing

Parallel computing is the simultaneous execution of some combination of multiple instances of programmed instructions and data on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. The technique was first put to practical use by ILLIAC IV in 1976, fully a decade after it was conceived.

Definition

A parallel computing system is a computer with more than one processor for parallel processing. In the past, each processor of a multiprocessing system always came in its own processor packaging, but recently-introduced multicore processors contain multiple logical processors in a single package. There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors (known as "processing elements" or PEs) and memory. Flynn's taxonomy, one of the most accepted taxonomies of parallel architectures, classifies parallel (and serial) computers according to: whether all processors execute the same instructions at the same time (single instruction/multiple dataSIMD) or whether each processor executes different instructions (multiple instruction/multiple dataMIMD).

One major way to classify parallel computers is based on their memory architectures. Shared memory parallel computers have multiple processors accessing all available memory as global address space. They can be further divided into two main classes based on memory access times: Uniform Memory Access (UMA), in which access times to all parts of memory are equal, or Non-Uniform Memory Access (NUMA), in which they are not. Distributed memory parallel computers also have multiple processors, but each of the processors can only access its own local memory; no global memory address space exists across them. Parallel computing systems can also be categorized by the numbers of processors in them. Systems with thousands of such processors are known as massively parallel. Subsequently there are what are referred to as "large scale" vs. "small scale" parallel processors. This depends on the size of the processor, e.g. a PC based parallel system would generally be considered a small scale system. Parallel processor machines are also divided into symmetric and asymmetric multiprocessors, depending on whether all the processors are the same or not (for instance if only one is capable of running the operating system code and others are less privileged).

A variety of architectures have been developed for parallel processing. For example a Ring architecture has processors linked by a ring structure. Other architectures include hypercubes, fat trees, systolic arrays, and so on.

Theory and practice

Parallel computers can be modelled as Parallel Random Access Machines (PRAMs). The PRAM model ignores the cost of interconnection between the constituent computing units, but is nevertheless very useful in providing upper bounds on the parallel solvability of many problems. In reality the interconnection plays a significant role. The processors may communicate and cooperate in solving a problem or they may run independently, often under the control of another processor which distributes work to and collects results from them (a "processor farm").

Processors in a parallel computer may communicate with each other in a number of ways, including shared (either multiported or multiplexed) memory, a crossbar, a shared bus or an interconnect network of a myriad of topologies including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), an n-dimensional mesh, etc. Parallel computers based on interconnect network need to employ some kind of routing to enable passing of messages between nodes that are not directly connected. The communication medium used for communication between the processors is likely to be hierarchical in large multiprocessor machines. Similarly, memory may be either private to the processor, shared between a number of processors, or globally shared. Systolic array is an example of a multiprocessor with fixed function nodes, local-only memory and no message routing.

Approaches to parallel computers include multiprocessing, parallel supercomputers, NUMA vs. SMP vs. massively parallel computer systems, distributed computing (esp. computer clusters and grid computing). According to Amdahl's law, parallel processing is less efficient than one x-times-faster processor from a computational perspective. However, since power consumption is a super-linear function of the clock frequency on modern processors, we are reaching the point where from an energy cost perspective it can be cheaper to run many low speed processors in parallel than a single highly clocked processor.

Terminology

Some frequently used terms in parallel computing are:
Efficiency: is the execution time using a single processor divided by the quantity of the execution time using a multiprocessor and the number of processors.
Parallel Overhead: the extra work associated with parallel version compared to its sequential code, mostly the extra CPU time and memory space requirements from synchronization, data communications, parallel environment creation and cancellation, etc.
Synchronization: the coordination of simultaneous tasks to ensure correctness and avoid unexpected race conditions.
Speedup: also called parallel speedup, which is defined as wall-clock time of best serial execution divided by wall-clock time of parallel execution. Amdahl's law can be used to give a maximum speedup factor.
Scalability: a parallel system's ability to gain proportionate increase in parallel speedup with the addition of more processors. Also, see this Parallel Computing Glossary
Task: a logically high level, discrete, independent section of computational work. A task is typically executed by a processor as a program

Algorithms

Parallel algorithms can be constructed by redesigning serial algorithms to make effective use of parallel hardware. However, not all algorithms can be parallelized. This is summed up in a famous saying:
One woman can have a baby in nine months, but nine women can't have a baby in one month.
In practice, linear speedup (i.e., speedup proportional to the number of processors) is very difficult to achieve. This is because many algorithms are essentially sequential in nature; this is more formally stated in Amdahl's law. Certain workloads can benefit from pipeline parallelism when extra processors are added. This uses a factory assembly line approach to divide the work. If the work can be divided into n stages where a discrete deliverable is passed from stage to stage, then up to n processors can be used. However, the slowest stage will hold up the other stages so it is rare to be able to fully use n processors.

Parallel problems

Well known parallel software problem sets include embarrassingly parallel and Grand Challenge problems.

Parallel programming

Parallel programming is the design, implementation, and tuning of parallel computer programs which take advantage of parallel computing systems. It also refers to the application of parallel programming methods to existing serial programs (parallelization). Parallel programming focuses on partitioning the overall problem into separate tasks, allocating tasks to processors and synchronizing the tasks to get meaningful results. Parallel programming can only be applied to problems that are inherently parallelizable, mostly without data dependence. A problem can be partitioned based on domain decomposition or functional decomposition, or a combination.

There are two major approaches to parallel programming: implicit parallelism, where the system (the compiler or some other program) partitions the problem and allocates tasks to processors automatically (also called automatic parallelizing compilers); or explicit parallelism, where the programmer must annotate their program to show how it is to be partitioned. Many factors and techniques impact the performance of parallel programming, especially load balancing, which attempts to keep all processors busy by moving tasks from heavily loaded processors to less loaded ones.

Some people consider parallel programming to be synonymous with concurrent programming. Others draw a distinction between parallel programming, which uses well-defined and structured patterns of communications between processes and focuses on parallel execution of processes to enhance throughput, and concurrent programming, which typically involves defining new patterns of communication between processes that may have been made concurrent for reasons other than performance. In either case, communication between processes is performed either via shared memory or with message passing, either of which may be implemented in terms of the other.

Programs which work correctly in a single CPU system may not do so in a parallel environment. This is because multiple copies of the same program may interfere with each other, for instance by accessing the same memory location at the same time. Therefore, careful programming (synchronization) is required in a parallel system.

Parallel programming models



A parallel programming model is a computing architecture and language designed to express parallelism in software systems and applications. The software to support these models include compilers, libraries and other tools that enable the application to use parallel hardware.

Parallel models are implemented in several ways: as libraries invoked from traditional sequential languages, as language extensions, or complete new execution models. They are also roughly categorized for two kinds of systems: shared memory systems and distributed memory systems, though the lines between them are largely blurred nowadays.

References

Further reading

  • Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing (2003), ISBN 0-201-64865-2 (companion book site)
  • Timothy G. Mattson, Beverly A. Sanders, and Berna L Massingill, Patterns for Parallel Computing (2005), ISBN 0-321-22811-1 (companion book site)
  • Barry Wilkinson, Michael Allen, Parallel Programming (2005), ISBN 0-13-140563-2 (companion book site)

External links

Topics in parallel computing      [ e] 
GeneralHigh-performance computing
ParallelismData parallelismTask parallelism
TheorySpeedupAmdahl's lawFlynn's taxonomy (SISD, SIMD, MISD, MIMD) • Cost efficiencyGustafson's lawKarp-Flatt metric
ElementsProcessThreadFiberParallel Random Access Machine
CoordinationMultiprocessingMultithreadingMultitaskingMemory coherencyCache coherencyBarrierSynchronizationDistributed computingGrid computing
ProgrammingProgramming modelImplicit parallelismExplicit parallelism
HardwareComputer clusterBeowulfSymmetric multiprocessingNon-Uniform Memory AccessCache only memory architectureAsymmetric multiprocessingSimultaneous multithreadingShared memoryDistributed memoryMassively parallel processingSuperscalar processingVector processingSupercomputerStream processingGPGPU
SoftwareDistributed shared memoryApplication checkpointingWarewulf
APIsPOSIX threadsOpenMPMessage Passing Interface (MPI)
ProblemsEmbarrassingly parallelGrand ChallengeSoftware lockout


This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
central processing unit (CPU), or sometimes simply processor, is the component in a digital computer capable of executing a program.(Knott 1974) It interprets computer program instructions and processes data.
..... Click the link for more information.
The ILLIAC IV was one of the most infamous supercomputers ever, destined to be the last in a series of research machines from the University of Illinois. Key to the ILLIAC IV design was fairly high parallelism with up to 256 processors, used to allow the machine to work on large
..... Click the link for more information.
central processing unit (CPU), or sometimes simply processor, is the component in a digital computer capable of executing a program.(Knott 1974) It interprets computer program instructions and processes data.
..... Click the link for more information.
The term multicore can refer to:
  • Multi-core (computing)
  • Any sort of multicore cable
  • An audio multicore cable

..... Click the link for more information.
Flynn's Taxonomy
  Single
Instruction Multiple
Instruction
Single
Data SISD MISD
Multiple
Data SIMD MIMD

Flynn's taxonomy is a classification of computer architectures, proposed by Michael J. Flynn in 1966.
..... Click the link for more information.
Flynn's Taxonomy
  Single
Instruction Multiple
Instruction
Single
Data SISD MISD
Multiple
Data SIMD MIMD In computing, SIMD (Single Instruction, Multiple D
..... Click the link for more information.
Flynn's Taxonomy
  Single
Instruction Multiple
Instruction
Single
Data SISD MISD
Multiple
Data SIMD MIMD In computing, MIMD (Multiple Instruction stream, Multiple D
..... Click the link for more information.
shared memory is a memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them. Depending on context, programs may run on the same physical processor or on separate ones.
..... Click the link for more information.
Uniform Memory Access (UMA) is a computer memory architecture used in parallel computers having multiple processors and probably multiple memory chips.

All the processors in the UMA model share the physical memory uniformly. Peripherals are also shared.
..... Click the link for more information.
Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor.
..... Click the link for more information.
distributed memory refers to a multiple-processor computer system in which each processor has its own private memory. This requires computational tasks to be distributed to the different processors for processing. Afterward the data must be reassembled.
..... Click the link for more information.
Massive parallelism (MP) is a term used in computer architecture, reconfigurable computing, application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) design.
..... Click the link for more information.
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
..... Click the link for more information.
hypercube is an n-dimensional analogue of a square (n = 2) and a cube (n = 3). It is a closed, compact, convex figure whose 1-skeleton consists of groups of opposite parallel line segments aligned in each of the space's dimensions, at right angles to each
..... Click the link for more information.
The fat tree network, invented by Charles E. Leiserson of MIT, is a universal network for provably efficient communication. Unlike an ordinary computer scientist's notion of a tree, which has "skinny" links all over, the links in a fat-tree become "fatter
..... Click the link for more information.
In computer architecture, a systolic array is a pipe network arrangement of data processing units (DPUs (see figure, for instance, with 32 bit wide DPUs). DPUs are similar to central processing units (CPU)s, but do not have a program counter, since operation is transport-triggered,
..... Click the link for more information.
A Parallel Random Access Machine (PRAM) is an abstract machine for designing the algorithms applicable to parallel computers. It eliminates the focus on miscellaneous issues such as synchronization and communication, but lets designer think explicitly about the exploitation
..... Click the link for more information.
A server farm or server cluster is a collection of computer servers usually maintained by an enterprise to accomplish server needs far beyond the capability of one machine.
..... Click the link for more information.
Network topology is the study of the arrangement or mapping of the elements (links, nodes, etc.) of a network, especially the physical (real) and logical (virtual) interconnections between nodes [1] [2] [3].
..... Click the link for more information.
Routing (or routeing) is the process of selecting paths in a network along which to send data or physical traffic. Routing is performed for many kinds of networks, including the telephone network, the Internet, and transport networks.
..... Click the link for more information.
In computer architecture, a systolic array is a pipe network arrangement of data processing units (DPUs (see figure, for instance, with 32 bit wide DPUs). DPUs are similar to central processing units (CPU)s, but do not have a program counter, since operation is transport-triggered,
..... Click the link for more information.
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
..... Click the link for more information.
A supercomputer is a computer that led the world (or was close to doing so) in terms of processing capacity, particularly speed of calculation, at the time of its introduction.
..... Click the link for more information.
Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor.
..... Click the link for more information.
Symmetric multiprocessing, or SMP, is a multiprocessor computer architecture where two or more identical processors are connected to a single shared main memory. Most common multiprocessor systems today use an SMP architecture.
..... Click the link for more information.
Distributed computing is a method of computer processing in which different parts of a program run simultaneously on two or more computers that are communicating with each other over a network.
..... Click the link for more information.
A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area
..... Click the link for more information.
Grid computing is a phrase in distributed computing which can have several meanings:
  • A local computer cluster which is like a "grid" because it is composed of multiple nodes.

..... Click the link for more information.
Amdahl's law, named after computer architect Gene Amdahl, is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.
..... Click the link for more information.
In computer science, efficiency is used to describe several desirable properties of an algorithm or other construct, besides clean design, functionality, etc. Efficiency is generally contained in two properties: speed (the time it takes for an operation to complete), and space (the
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter