Information about Vector Processor
A vector processor, or array processor, is a CPU design that is able to run mathematical operations on multiple data elements simultaneously. This is in contrast to a scalar processor which handles one element at a time. The vast majority of CPUs are scalar (or close to it). Vector processors were common in the scientific computing area, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general increases in performance and processor design saw the near disappearance of the vector processor as a general-purpose CPU.
Today most commodity CPU designs include some vector processing instructions, typically known as SIMD (Single Instruction, Multiple Data); common examples include SSE and AltiVec. Modern video game consoles and consumer computer-graphics hardware rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3.
History
Vector processing was first worked on in the early 1960s at Westinghouse in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple math co-processors (or ALUs) under the control of a single master CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was restarted at the University of Illinois as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. The ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is often referred to under a separate category, massively parallel computing.
The first successful implementations of vector processing appear to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture which supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data-related tasks it could keep up while being much smaller and less expensive. However, the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up.
The vector technique was first fully exploited in the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight "vector registers" which held sixty-four 64-bit words each. The vector instructions were applied between registers, which is much faster than talking to main memory. In addition, the design had completely separate pipelines for different instructions; for example, addition and subtraction were implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique Cray called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS, a respectable number even today.
Other examples followed. CDC tried to re-enter the high-end market with its ETA-10 machine, but it sold poorly, and the company took that as an opportunity to leave the supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi and NEC) introduced register-based vector machines similar to the Cray-1, typically slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, and later built its own minisupercomputers. However, Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing than on better implementations of vector processors. However, recognizing the benefits of vector processing, IBM developed the Virtual Vector Architecture for use in supercomputers, coupling several scalar processors to act as a vector processor.
Today the average computer at home crunches as much data watching a short QuickTime video as did all of the supercomputers in the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector processor runs beside the main scalar CPU, and is fed data from programs that know it is there.
Description
In general terms, CPUs are able to manipulate one or two pieces of data at a time. For instance, many CPUs have an instruction that essentially says "add A to B and put the result in C," while others such as the MOS 6502 require two or three instructions to perform these types of operations.
The data for A, B and C could be, in theory at least, encoded directly into the instruction. However, things are rarely that simple. In general the data is rarely sent in raw form, and is instead "pointed to" by passing in an address to a memory location that holds the data. Decoding this address and getting the data out of the memory takes some time. As CPU speeds have increased, this memory latency has historically become a large impediment to performance.
In order to reduce the amount of time this takes, most modern CPUs use a technique known as instruction pipelining in which the instructions pass through several sub-units in turn. The first sub-unit reads the address and decodes it, the next "fetches" the values at those addresses, and the next does the math itself. With pipelining the "trick" is to start decoding the next instruction even before the first has left the CPU, in the fashion of an assembly line, so the address decoder is constantly in use. Any particular instruction takes the same amount of time to complete, a time known as the latency, but the CPU can process an entire batch of operations much faster than if it did so one at a time.
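The difference between latency and throughput described above can be put into numbers with a small sketch. It assumes an idealised pipeline in which every stage takes exactly one cycle and nothing ever stalls; the function names and the three-stage layout (decode, fetch, math) are illustrative choices, not a model of any particular CPU.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction passes through every stage
    before the next instruction is allowed to start."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle.

    The first instruction still takes n_stages cycles (the latency);
    each later instruction completes one cycle after the previous one.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    stages = 3  # assumed stages: decode address, fetch operands, do the math
    for n in (1, 10, 1000):
        print(n, sequential_cycles(n, stages), pipelined_cycles(n, stages))
```

A single instruction takes three cycles either way, but under these assumptions a batch of ten finishes in 12 cycles instead of 30.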
Vector processors take this concept one step further. Instead of pipelining just the instructions, they also pipeline the data itself. They are fed instructions that say not just to add A to B, but to add all of the numbers "from here to here" to all of the numbers "from there to there". Instead of constantly having to decode instructions and then fetch the data needed to complete them, the processor reads a single instruction from memory and "knows" that the next address will be one larger than the last. This allows for significant savings in decoding time.
To illustrate what a difference this can make, consider the simple task of adding two groups of 10 numbers together. In a normal programming language you would write a "loop" that picked up each of the pairs of numbers in turn, and then added them. To the CPU, this would look something like this:
    read the next instruction and decode it
    fetch this number
    fetch that number
    add them
    put the result here
    read the next instruction and decode it
    fetch this number
    fetch that number
    add them
    put the result there
and so on, repeating the base command 10 times over.
But to a vector processor, this task looks considerably different:
    read instruction and decode it
    fetch these 10 numbers
    fetch those 10 numbers
    add them
    put the results here
There are several savings inherent in this approach. For one, only two address translations are needed. Depending on the architecture, this can represent a significant saving in and of itself. Another saving is in fetching and decoding the instruction itself, which has to be done only once instead of ten times. The code itself is also smaller, which can lead to more efficient memory use.
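The saving in instruction decodes can be made concrete with a toy model. The sketch below is not how any real processor is programmed; `scalar_add` and `vector_add` are hypothetical helpers that compute the same sums while counting how many times an instruction would have to be fetched and decoded under each scheme.

```python
def scalar_add(a, b):
    """Add two lists element by element, one decoded instruction per pair."""
    decodes = 0
    result = []
    for x, y in zip(a, b):
        decodes += 1          # read the next instruction and decode it
        result.append(x + y)  # fetch both numbers, add them, store the result
    return result, decodes


def vector_add(a, b):
    """Add two lists under a single (hypothetical) vector instruction."""
    decodes = 1                             # read instruction and decode it, once
    result = [x + y for x, y in zip(a, b)]  # the hardware streams through the data
    return result, decodes


if __name__ == "__main__":
    first = list(range(10))
    second = list(range(10, 20))
    print(scalar_add(first, second))
    print(vector_add(first, second))
```

Both functions return the same ten sums; only the decode count differs, ten in the scalar case against one in the vector case.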
But more than that, the vector processor typically has some form of superscalar implementation, meaning there is not one part of the CPU adding up those 10 numbers, but perhaps two or four of them. Since the output of a vector command does not rely on the input from any other, those two (for instance) parts can each add five of the numbers, thereby completing the whole operation in half the time.
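As a rough illustration of why independent elements allow the work to be divided, the sketch below processes a fixed number of elements per step. The `lanes` parameter is an assumption standing in for the number of parallel adders; handing each of two adders five numbers, as in the text, takes the same number of steps as processing two elements per step.

```python
def lane_add(a, b, lanes):
    """Add two equal-length lists, handling `lanes` elements per step.

    Returns the sums and the number of steps taken. Each step models all
    of the parallel adders working at once on independent elements.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = []
    steps = 0
    for start in range(0, len(a), lanes):
        end = start + lanes
        # No element depends on any other, so this slice can be done in parallel.
        result.extend(x + y for x, y in zip(a[start:end], b[start:end]))
        steps += 1
    return result, steps


if __name__ == "__main__":
    first = list(range(10))
    second = [1] * 10
    for lanes in (1, 2, 4):
        print(lanes, lane_add(first, second, lanes)[1])
```

With ten numbers, one adder needs ten steps, two adders need five, and four adders need three, the last step being only partly filled.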
As mentioned earlier, the Cray implementations took this a step further, allowing several different types of operations to be carried out at the same time. Consider code that adds two numbers and then multiplies by a third; in the Cray these would all be fetched at once, and both added and multiplied in a single operation. Using the pseudocode above, the Cray essentially did:
    read instruction and decode it
    fetch these 10 numbers
    fetch those 10 numbers
    fetch another 10 numbers
    add and multiply them
    put the results here
The math operations thus completed much faster, the limiting factor being the memory accesses.
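Chaining can be sketched with generators, where the multiplier consumes each sum as soon as the adder produces it and no intermediate array is stored. This is only an analogy for the hardware behaviour; the function names and the pipeline depths used in the cycle estimates are assumptions chosen for illustration, not Cray-1 figures.

```python
def add_stream(a, b):
    """Adder pipeline: emits one sum at a time."""
    for x, y in zip(a, b):
        yield x + y


def multiply_stream(sums, c):
    """Multiplier pipeline: takes each sum as soon as it is emitted."""
    for s, z in zip(sums, c):
        yield s * z


def chained_add_multiply(a, b, c):
    """Compute (a[i] + b[i]) * c[i] without storing the intermediate sums."""
    return list(multiply_stream(add_stream(a, b), c))


def unchained_cycles(n, add_depth, mul_depth):
    """Idealised cycles when the multiply waits for the whole add to finish."""
    return (add_depth + n - 1) + (mul_depth + n - 1)


def chained_cycles(n, add_depth, mul_depth):
    """Idealised cycles when sums flow straight from adder to multiplier."""
    return add_depth + mul_depth + n - 1


if __name__ == "__main__":
    print(chained_add_multiply([1, 2, 3], [4, 5, 6], [2, 2, 2]))
    # Depths of 6 and 7 stages are made-up values for the comparison.
    print(unchained_cycles(64, 6, 7), chained_cycles(64, 6, 7))
```

In this simplified model the chained version approaches half the cycle count of the unchained one as the vectors get longer, since the two pipelines overlap almost completely.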
Not all problems can be attacked with this sort of solution. Adding these sorts of instructions adds complexity to the core CPU. That complexity typically makes other instructions slower, i.e., whenever the CPU is not adding up ten numbers in a row. The more complex instructions also add to the complexity of the decoders, which might slow down the decoding of more common instructions such as normal adding.
In fact, vector processors work best only when there are large amounts of data to work on. This is why these sorts of CPUs were found primarily in supercomputers, as the supercomputers themselves were found in places like weather prediction centers and physics labs, where huge amounts of data exactly like this are "crunched".
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.