Information about Grid Computing

Grid computing is a phrase in distributed computing which can have several meanings:
  • A local computer cluster which is like a "grid" because it is composed of multiple nodes.
  • Offering online computation or storage as a metered commercial service, known as utility computing, computing on demand, or cloud computing.
  • The creation of a "virtual supercomputer" by using spare computing resources within an organization.
  • The creation of a "virtual supercomputer" by using a network of geographically dispersed computers. Volunteer computing, which generally focuses on scientific, mathematical, and academic problems, is the most common application of this technology.
These varying definitions cover the spectrum of "distributed computing", and the two terms are sometimes used as synonyms. This article focuses on distributed computing technologies other than traditional dedicated clusters; for those, see computer cluster.

Functionally, one can also speak of several types of grids:
  • Computational grids (including CPU-scavenging grids), which focus primarily on computationally intensive operations.
  • Data grids, which provide controlled sharing and management of large amounts of distributed data.
  • Equipment grids, which have a primary piece of equipment (e.g. a telescope) and in which the surrounding grid is used to control the equipment remotely and to analyze the data produced.
Virtual Organizations accessing different and overlapping sets of resources

Grids versus conventional supercomputers

"Distributed" or "grid computing" in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many CPUs connected by a local high-speed computer bus.

The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce similar computing resources to a many-CPU supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various CPUs and local storage areas do not have high-speed connections. This arrangement is thus well-suited to applications where multiple parallel computations can take place independently, without the need to communicate intermediate results between CPUs.

The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public Internet. Conventional supercomputers also create physical challenges in supplying sufficient electricity and cooling capacity in a single location. Both supercomputers and grids can be used to run multiple parallel computations at the same time, which might be different simulations for the same project, or computations for completely different applications. The infrastructure and programming considerations needed to do this on each type of platform are different, however.

There are also differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can cause conventional, standalone programs to run on multiple machines (but each given a different part of the same problem). This makes it possible to write and debug programs on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
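The "thin layer" idea above can be illustrated with a minimal sketch. It assumes a problem (counting primes below 100) that splits cleanly into independent ranges; the names `split_range` and `count_primes` are hypothetical, and the list comprehension stands in for dispatching each work unit to a different machine.

```python
def split_range(start, stop, n_units):
    """Split [start, stop) into n_units contiguous, non-overlapping sub-ranges."""
    size, extra = divmod(stop - start, n_units)
    units = []
    lo = start
    for i in range(n_units):
        # The first `extra` units get one additional element each.
        hi = lo + size + (1 if i < extra else 0)
        units.append((lo, hi))
        lo = hi
    return units


def count_primes(unit):
    """Standalone computation: count primes in [lo, hi) using no shared state."""
    lo, hi = unit
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


# Each call below is independent, so a grid layer could run each one on a
# different node; here they simply run one after another (an assumption
# made to keep the sketch self-contained).
units = split_range(0, 100, 4)
partial_counts = [count_primes(u) for u in units]
total = sum(partial_counts)
```

Because `count_primes` needs nothing but its own work unit, it can be written and debugged on a single machine; only the splitting and the final sum belong to the grid layer.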

Design considerations and variations

One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple administrative domains). This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks.

One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit. Discrepancies would identify malfunctioning and malicious nodes.
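The redundancy check described above might look like the following sketch. It assumes results are hashable values that can be compared for exact equality; the function name `validate_work_unit` and the `quorum` parameter are hypothetical and not taken from any particular grid middleware.

```python
from collections import Counter


def validate_work_unit(reports, quorum=2):
    """Accept a result only if at least `quorum` nodes agree on it.

    reports: dict mapping node id -> reported result for one work unit.
    Returns (accepted_result_or_None, set_of_disagreeing_node_ids).
    """
    if not reports:
        return None, set()
    tally = Counter(reports.values())
    top_votes = max(tally.values())
    leaders = [result for result, votes in tally.items() if votes == top_votes]
    # No decision if agreement is below the quorum or the top vote is tied.
    if top_votes < quorum or len(leaders) > 1:
        return None, set()
    accepted = leaders[0]
    suspects = {node for node, result in reports.items() if result != accepted}
    return accepted, suspects
```

For example, `validate_work_unit({"a": 42, "b": 42, "c": 41})` accepts 42 and flags node `"c"`. A disagreeing node may be faulty rather than malicious, so a real system would likely track a history of discrepancies before excluding it.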

Due to the lack of central control over the hardware, there is no way to guarantee that nodes will not drop out of the network at random times. Some nodes (like laptops or dialup Internet customers) may also be available for computation but not network communications for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results as expected.
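A minimal sketch of reassigning overdue work units follows. The `WorkQueue` class, its method names, and the fixed deadline are illustrative assumptions; timestamps are passed in explicitly so the behavior is easy to follow and test.

```python
class WorkQueue:
    """Hands out work units and reclaims those whose node missed the deadline."""

    def __init__(self, unit_ids, deadline_seconds):
        self.deadline_seconds = deadline_seconds
        self.pending = list(unit_ids)  # units not currently assigned
        self.assigned = {}             # unit id -> (node id, time assigned)
        self.results = {}              # unit id -> accepted result

    def assign(self, node_id, now):
        """Give the next available unit to node_id, or None if none is free."""
        self._reclaim_expired(now)
        if not self.pending:
            return None
        unit_id = self.pending.pop(0)
        self.assigned[unit_id] = (node_id, now)
        return unit_id

    def report(self, node_id, unit_id, result):
        """Record a result; reports from nodes that lost the unit are ignored."""
        owner = self.assigned.get(unit_id)
        if owner is None or owner[0] != node_id:
            return False
        del self.assigned[unit_id]
        self.results[unit_id] = result
        return True

    def _reclaim_expired(self, now):
        """Return overdue units to the pending list so they can be reassigned."""
        for unit_id, (_node, started) in list(self.assigned.items()):
            if now - started > self.deadline_seconds:
                del self.assigned[unit_id]
                self.pending.append(unit_id)
```

In this sketch a late report from the original node is simply discarded; a real system might instead accept it if it arrives before the replacement node finishes.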

The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors.

In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system, such as placing applications in virtual machines.

Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures. With many languages, there is a tradeoff between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this tradeoff, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform).

Various middleware projects have created generic infrastructure that allows diverse scientific and commercial projects to harness a particular associated grid or to set up new grids. BOINC is a common one for academic projects seeking public volunteers; more are listed at the end of the article.

CPU scavenging

CPU-scavenging, cycle-scavenging, cycle stealing, or shared computing creates a "grid" from the unused resources in a network of participants (whether worldwide or internal to an organization). Typically this technique uses desktop computer instruction cycles that would otherwise be wasted at night, during lunch, or even in the scattered seconds throughout the day when the computer is waiting for user input or slow devices.

Volunteer computing projects use the CPU scavenging model almost exclusively.

In practice, participating computers also donate some supporting amount of disk storage space, RAM, and network bandwidth, in addition to raw CPU power. Since nodes are apt to go "offline" from time to time, as their owners use their resources for their primary purpose, this model must be designed to handle such contingencies.

History

The term grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid; the metaphor appears in Ian Foster and Carl Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure".

CPU scavenging and volunteer computing were popularized beginning in 1997 by distributed.net and later in 1999 by SETI@home to harness the power of networked PCs worldwide, in order to solve CPU-intensive research problems.

The ideas of the grid (including those from distributed computing, object-oriented programming, cluster computing, web services and others) were brought together by Ian Foster, Carl Kesselman and Steve Tuecke, widely regarded as the "fathers of the grid".[1] They led the effort to create the Globus Toolkit, which incorporates not just computation management but also storage management, security provisioning, data movement, and monitoring. It also includes a toolkit for developing additional services based on the same infrastructure, including agreement negotiation, notification mechanisms, trigger services and information aggregation. While the Globus Toolkit remains the de facto standard for building grid solutions, a number of other tools have been built that answer some subset of the services needed to create an enterprise or global grid.

Current projects and applications



Grids offer a way to solve Grand Challenge problems like protein folding, financial modeling, earthquake simulation, and climate/weather modeling. They also offer a way to use information technology resources optimally inside an organization, and they provide a means of offering information technology as a utility to commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.

Grid computing is presently being applied successfully by the National Science Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb Co., and American Express.

One of the most famous cycle-scavenging networks is SETI@home, which was using more than 3 million computers to achieve 23.37 sustained teraflops (979 lifetime teraflops) as of September 2001 [1].

As of May 2005, Folding@home had achieved peaks of 186 teraflops on over 160,000 machines.

Another well-known project is distributed.net, which was started in 1997 and has run a number of successful projects in its history.

The NASA Advanced Supercomputing facility (NAS) has run genetic algorithms using the Condor cycle scavenger running on about 350 Sun and SGI workstations.

Until April 27, 2007, United Devices operated the United Devices Cancer Research Project based on its Grid MP product, which scavenges cycles on volunteer PCs connected to the Internet. As of June 2005, Grid MP ran on about 3,100,000 machines [2].

The Enabling Grids for E-sciencE (EGEE) project, which is based in the European Union and includes sites in Asia and the United States, is a follow-up project to the European DataGrid (EDG) and is arguably the largest computing grid on the planet. It and the LHC Computing Grid [4] (LCG) have been developed to support the experiments using the CERN Large Hadron Collider. The LCG project is driven by CERN's need to handle huge amounts of data, where storage rates of several gigabytes per second (10 petabytes per year) are required. A list of active sites participating within LCG can be found online,[5] as can real-time monitoring of the EGEE infrastructure.[6] The relevant software and documentation are also publicly accessible.[7]

Definitions

Today there are many definitions of Grid computing:
  • In his article "What is the Grid? A Three Point Checklist"[8], Ian Foster lists these primary attributes:
      • Computing resources are not administered centrally.
      • Open standards are used.
      • Non-trivial quality of service is achieved.
  • Plaszczak/Wellner[9] define grid technology as "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations."
  • IBM defines grid computing as "the ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the Internet. A grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across 'multiple' administrative domains based on their (resources) availability, capacity, performance, cost and users' quality-of-service requirements" [10]
  • An earlier expression of the notion of computing as a utility came in 1965 from MIT's Fernando Corbató. Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company" (http://www.multicians.org/fjcc3.html).
  • Buyya defines a grid as "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements".[11]
  • CERN, one of the largest users of grid technology, talks of The Grid: "a service for sharing computer power and data storage capacity over the Internet." [12]
  • Pragmatically, grid computing is attractive to geographically distributed, non-profit collaborative research efforts like the NCSA Bioinformatics Grids, such as BIRN; these are external grids.
  • Grid computing is also attractive to large commercial enterprises with complex computation problems that aim to fully exploit their internal computing power; these are internal grids.
  • A recent survey (done by Heinz Stockinger in spring 2006; to be published in the Journal of Supercomputing in early 2007) presents a snapshot on the view in 2006.
  • Another survey (done by Miguel L. Bote-Lorenzo et al. in autumn 2002; published in the LNCS series of Springer-Verlag) presents a snapshot on the view in 2002.
Grids can be categorized with a three-stage model of departmental grids, enterprise grids and global grids. In the first stage, a firm utilises resources within a single group, e.g. an engineering department connecting desktop machines, clusters and equipment. This progresses to enterprise grids, where the computing resources of non-technical staff can also be used for cycle-stealing and storage. A global grid is a connection of enterprise and departmental grids which can be used in a commercial or collaborative manner.

See also

International Grid Projects

  • DataTAG (http://datatag.web.cern.ch/datatag/) - January 2001 -> January 2003
  • European DataGrid (EDG) (http://eu-datagrid.web.cern.ch/eu%2Ddatagrid/) - March 2001 -> March 2003
  • Enabling Grids for E-sciencE (EGEE) - March 2004 -> March 2006
  • Enabling Grids for E-sciencE II (EGEE II) - April 2006 -> April 2008
  • Open Middleware Infrastructure Institute Europe (OMII-Europe) - May 2006 -> May 2008

National Grid Projects

  • China Grid Project
  • D-Grid (German)
  • GARUDA (Indian)
  • grid computing project at VECC (Calcutta, India)
  • INFN Grid (Italian)
  • Malaysia National Grid Computing
  • NAREGI Project
  • Singapore National Grid Project
  • Thai National Grid Project

References

Notes

1. ^ Father of the Grid.
2. ^ [3], accessed 4 Jun 2007
3. ^ [4], accessed 23 Sept 2007
4. ^ Large Hadron Collider Computing Grid official homepage
5. ^ [5]
6. ^ [6]
7. ^ [7]
8. ^ What is the Grid? A Three Point Checklist (pdf).
9. ^ P Plaszczak, R Wellner, Grid computing, 2005, Elsevier/Morgan Kaufmann, San Francisco
10. ^ IBM Solutions Grid for Business Partners: Helping IBM Business Partners to Grid-enable applications for the next phase of e-business on demand.
11. ^ A Gentle Introduction to Grid Computing and Technologies (pdf). Retrieved on 2005-05-06.
12. ^ The Grid Café - What is Grid?. CERN. Retrieved on 2005-02-04.



This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.