Information about Distributed Database

A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.

Collections of data (eg. in a database) can be distributed across multiple physical locations. A distributed database is distributed into separate partitions/fragments. Each partition/fragment of a distributed database may be replicated (ie. redundant fail-overs, RAID like).

Besides distributed database replication and fragmentation, there are many other distributed database design technologies. For example, local autonomy, synchronous and asynchronous distributed database technologies. These technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, consistency and integrity.

Basic architecture

A database server is the software managing a database, and a client is an application that requests information from a server. Each computer in a system is a node. A node in a distributed database system acts as a client, a server, or both, depending on the situation.
Horizontal fragments
subsets of tuples (rows) from a relation (table).
Vertical fragments
subsets of attributes (columns) from a relation (table).
Mixed fragment
a fragment which is both horizontally and vertically fragmented, or a logical collection of objects in an ODBMS.
Homogeneous distributed database
uses one DBMS (eg: Objectivity/DB or Oracle).
Heterogeneous distributed database
uses multiple DBMS's (eg: Oracle and MS-SQL and PostgreSQL).


Users access the distributed database through:
Local applications
applications which do not require data from other sites.
Global applications
applications which do require data from other sites.

Important considerations

Care with a distributed database must be taken to ensure that:
  • The distribution is transparent — users must be able to interact with the system as if it were one logical system. This applies to the system's performance, and methods of access amongst other things.
  • Transactions are transparent — each transaction must maintain database integrity across multiple databases. Transactions must also be divided into subtransactions, each subtransaction affecting one database system..

Advantages of distributed databases

  • Reflects organizational structure — database fragments are located in the departments they relate to.
  • Local autonomy — a department can control the data about them (as they are the ones familiar with it.)
  • Improved availability — a fault in one database system will only affect one fragment, instead of the entire database.
  • Improved performance — data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
  • Economics — it costs less to create a network of smaller computers with the power of a single large computer.
  • Modularity — systems can be modified, added and removed from the distributed database without affecting other modules (systems).

Disadvantages of distributed databases

  • Complexity — extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database — for example, joins become prohibitively expensive when performed across multiple systems.
  • Economics — increased complexity and a more extensive infrastructure means extra labour costs.
  • Security — remote database fragments must be secured, and they are not centralized so the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
  • Difficult to maintain integrity — in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
  • Inexperience — distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.

See also

References

  • [Hussain AL-Mout]
  • M. T. Ozsu and P. Valduriez, Principles of Distributed Databases (2nd edition), Prentice-Hall, ISBN 0-13-659707-6
  • Federal Standard 1037C
  • Elmasri and Navathe, Fundamentals of database systems (3rd edition), Addison-Wesley Longman, ISBN 0-201-54263-3
database is a structured collection of records or data that is stored in a computer system so that a computer program or person using a query language can consult it to answer queries. The records retrieved in answer to queries are information that can be used to make decisions.
..... Click the link for more information.
A database management system (DBMS) is computer software designed for the purpose of managing databases. Typical examples of DBMSs include Oracle, DB2, Microsoft Access, Microsoft SQL Server, PostgreSQL, MySQL, FileMaker and Sybase Adaptive Server Enterprise.
..... Click the link for more information.
Computer data storage, computer memory, and often casually storage or memory refer to computer components, devices and recording media that retain digital data used for computing for some interval of time.
..... Click the link for more information.
central processing unit (CPU), or sometimes simply processor, is the component in a digital computer capable of executing a program.(Knott 1974) It interprets computer program instructions and processes data.
..... Click the link for more information.
as a college campus, industrial complex, or a military base. A CAN, may be considered a type of MAN (metropolitan area network), but is generally limited to an area that is smaller than a typical MAN.
..... Click the link for more information.
Database normalization is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.
..... Click the link for more information.
Raid or RAID may refer to:
  • Redundant Array of Independent/Inexpensive Disks, or RAID, a system of multiple hard drives for sharing or replicating data.

..... Click the link for more information.
Objectivity/DB is a commercial object oriented database management system produced by Objectivity, Inc. It allows applications to make standard C, C++, Java, Python or Smalltalk objects persistent without having to convert the data objects into the rows and columns used by a
..... Click the link for more information.
Oracle Database (commonly referred to as Oracle RDBMS or simply as Oracle), a relational database management system (RDBMS) software product released by Oracle Corporation, has become a major feature of database computing.
..... Click the link for more information.
Microsoft SQL Server is a relational database management system (RDBMS) produced by Microsoft. Its primary query language is Transact-SQL, an implementation of the ANSI/ISO standard Structured Query Language (SQL) used by both Microsoft and Sybase.
..... Click the link for more information.
PostgreSQL is an object-relational database management system (ORDBMS). It is released under a BSD-style license and is thus free software. As with many other open-source programs, PostgreSQL is not controlled by any single company, but relies on a global community of developers
..... Click the link for more information.
A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:
  • Recoverability - Creating and testing Backups
  • Integrity - Verifying or helping to verify data integrity

..... Click the link for more information.
A distributed database management system is a software system that permits the management of a distributed database and makes the distribution transparent to the users. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network.
..... Click the link for more information.
Federal Standard 1037C, entitled Telecommunications: Glossary of Telecommunication Terms is a United States Federal Standard, issued by the General Services Administration pursuant to the Federal Property and Administrative Services Act of 1949, as amended.
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter