- For information on Wikipedia's database, see Wikipedia:Database.
A
computer database is a
structured collection of records or
data that is stored in a computer system so that a
computer program or person using a
query language can consult it to answer queries. The records retrieved in answer to queries are
information that can be used to make decisions. The computer program used to manage and query a database is known as a
database management system (DBMS). The properties and design of database systems are included in the study of
information science.
A typical query could be to answer questions such as, "How many hamburgers with two or more beef patties were sold in the month of March in New Jersey?". To answer such a question, the database would have to store information about hamburgers sold, including number of patties, sales date, and the region
The term "database" originated within the computing discipline. Although its meaning has been broadened by popular use, even to include non-electronic databases, this article is about computer databases. Database-like collections of information existed well before the
Industrial Revolution in the form of ledgers, sales receipts and other business-related collections of data.
The central concept of a database is that of a collection of records, or pieces of information. Typically, for a given database, there is a structural description of the type of facts held in that database: this description is known as a
schema. The schema describes the objects that are represented in the database, and the relationships among them. There are a number of different ways of organizing a schema, that is, of modelling the database structure: these are known as
database models (or data models). The model in most common use today is the
relational model, which in layman's terms represents all information in the form of multiple related tables each consisting of rows and columns (the true definition uses mathematical terminology). This model represents relationships by the use of values common to more than one table. Other models such as the
hierarchical model and the
network model use a more explicit representation of relationships.
The term
database refers to the collection of related records, and the software should be referred to as the
database management system or DBMS. When the context is unambiguous, however, many database administrators and programmers use the term
database to cover both meanings.
Many professionals consider a collection of data to constitute a database only if it has certain properties: for example, if the data is managed to ensure its integrity and quality, if it allows shared access by a community of users, if it has a schema, or if it supports a
query language. However, there is no definition of these properties that is universally agreed upon.
Database management systems are usually categorized according to the data model that they support: relational, object-relational, network, and so on. The data model will tend to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
History
The earliest known use of the term
data base was in November 1963, when the
System Development Corporation sponsored a symposium under the title
Development and Management of a Computer-centered Data Base[1].
Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (
Databank, a comparable term, had been used in the
Washington Post newspaper as early as 1966.)
The first database management systems were developed in the 1960s. A pioneer in the field was
Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on
punched cards and
magnetic tape, so that serial processing was the dominant activity. Two key
data models arose at this time:
CODASYL developed the
network model based on Bachman's ideas, and (apparently independently) the
hierarchical model was used in a system developed by
North American Rockwell, later adopted by
IBM as the cornerstone of their
IMS product. While IMS along with the CODASYL
IDMS were the big, high visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today. Two worthy of mention are the
PICK and
MUMPS databases, with the former developed originally as an operating system with an embedded database and the latter as a programming language and database for the development of data-based software.
The
relational model was proposed by
E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL products (IDMS) and network model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were
Michael Stonebraker's
Ingres at
Berkeley, and the
System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products,
Oracle and
DB2, did not appear until around 1980. The first successful database product for microcomputers was
dBASE for the
CP/M and PC-DOS/
MS-DOS operating systems.
During the 1980s, research activity focused on
distributed database systems and
database machines, but these developments had little effect on the market. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software engineering
repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result. The 1990s also saw the spread of
Open Source databases, such as
PostgreSQL and
MySQL.
In the 2000s, the fashionable area for innovation is the
XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
Database models
Various techniques are used to model data structure.
Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one
logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example is the
relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known.
Flat model
The
flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
Hierarchical model
In a
hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
Network model
The
network model tends to store records with links to other records. Associations are tracked via "pointers". These pointers can be node numbers or disk addresses. Most network databases tend to also include some form of hierarchical model.
Relational model
Three key terms are used extensively in relational database models:
relations,
attributes, and
domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in columns and rows (also called
tuples). Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can't be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database model. One of the strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the same table or to different tables), implies a relationship among those two records. Yet, in order to enforce explicit integrity constraints,
relationships between records in tables can also be defined explicitly, by identifying or non-identifying parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes that can act as a "key", which can be used to uniquely identify each tuple in the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join or combine data from two or more tables. For example, an
Employee table may contain a column named
Location which contains a value that matches the key of a
Location table. Keys are also critical in the creation of indices, which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
Relational operations
Users (or programs) request data from a relational database by sending it a
query that is written in a special language, usually a dialect of
SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted. Often, data from multiple tables are combined into one, by doing a
join. There are a number of relational operations in addition to join.
Normal forms
Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form, and the database cannot meet the requirements for higher level normal forms without first having met all the requirements of the lesser normal form.
Object database models
In recent years, the
object-oriented paradigm has been applied to database technology, creating a new programming model known as
object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same
type system as the application program. This aims to avoid the overhead (sometimes referred to as the
impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as
encapsulation and
polymorphism, into the world of databases.
A variety of these ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program
persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
Post-relational database models
Several products have been identified as post-relational because the data model incorporates
relations but is not constrained by the
Information Principle, requiring that all information is represented by data values in relations. Products using a post-relational data model typically employ a model that actually pre-dates the
relational model. These might be identified as a directed graph with
trees on the
nodes.
Examples of models that could be classified as post-relational are
PICK aka
MultiValue, and
MUMPS.
Fuzzy databases
It is possible to develop fuzzy
relational databases. Basically, a fuzzy database is a database using
fuzzy logic, for example with fuzzy
attributes, which may be defined as attributes of a item, row or object in a database, which allow to store fuzzy information (imprecise or uncertain data). There are many forms of adding flexibility in fuzzy databases. The simplest technique is to add a fuzzy membership degree to each record, i.e. an attribute in the range [0,1]. However, there are other kinds of databases allowing fuzzy values to be stored in fuzzy attributes using fuzzy sets (including fuzzy spatial datatypes), possibility distributions or fuzzy degrees associated to some attributes and with different meanings (membership degree, importance degree, fulfillment degree...). Sometimes, the expression “fuzzy databases” is used for classical databases with fuzzy queries or with other fuzzy aspects, such as constraints.
The first fuzzy relational database, FRDB, appeared in
Maria Zemankova's dissertation. After, some other models arose like the Buckles-Petry model, the Prade-Testemale Model, the Umano-Fukami model or the GEFRED model by J.M. Medina, M.A. Vila et al. In the context of fuzzy databases, some fuzzy querying languages have been defined, highlighting the SQLf by P. Bosc et al. and the FSQL by J. Galindo et al. These languages define some structures in order to include fuzzy aspects in the
SQL statements, like fuzzy conditions, fuzzy comparators, fuzzy constants, fuzzy constraints, fuzzy thresholds, linguistic labels and so on.
Database internals
Storage and physical database design
Database tables/indexes are typically stored in memory or on hard disk in one of many forms, ordered/unordered
flat files,
ISAM,
heaps,
hash buckets or
B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.
Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, partitioning data by range or hash. As well memory management and storage topology can be important design choices for database designers. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, conversely denormalization is often used to reduce join complexity and reduce execution time for queries.
[2]
Indexing
All of these databases can take advantage of
indexing to increase their speed, and this technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as
B-trees,
hashes, and
linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.
Relational DBMSs have the advantage that indexes can be created or dropped without changing existing applications making use of it. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL command will run with or without indexes existing in the database.
Relational DBMSs utilize many different algorithms to compute the result of an
SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with
joins are
nested loop join,
sort-merge join and
hash join. Which of these is chosen depends on whether an index exists, what type it is, and its
cardinality.
Transactions and concurrency
In addition to their data model, most practical databases ("transactional databases") attempt to enforce a
database transaction . Ideally, the database software should enforce the
ACID rules, summarized here:
- Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
- Consistency: Every transaction must preserve the integrity constraints — the declared consistency rules — of the database. It cannot place the data in a contradictory state.
- Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
- Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes
In practice, many DBMS's allow most of these rules to be selectively relaxed for better performance.
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only
serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
Replication
Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time.
The duplicate can be used to improve performance or availability of the whole database system.
Common replication concepts include:
- Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves
- Quorum: The result of Read and Write requests are calculated by querying a "majority" of replicas.
- Multimaster: Two or more replicas sync each other via a transaction identifier.
Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.
Security
Database security denotes the system, processes, and procedures that protect a database from unintended activity.
In the United Kingdom legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom based organizations holding personal data in electronic format (databases for example) are required to register with the Data Commissioner. (reference:
[1])
Locking
Locking as the word it self describes is the act of putting a lock/access restriction on an instance of database which at a particular given instance under use/change. This helps in maintaining integrity of the data which is stored in the DB and also make sure that only one user at a particular instance can make change. Databases can be locked for other reasons also, like access restrictions (read only) for a given level of user.
Databases are also locked for routine database maintenance, again this will prevent someone from making any change during the maintenance and in turn helps to maintain integrity of data once the maintenance operations is over the lock is removed. after which it will be back to normal.
for more detail refer
IBM tutorial for one of there softwares but the concept can is applicable to all the databases in general
Applications of databases
Databases are used in many applications, spanning virtually the entire range of
computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that
application software can use a common
Application Programming Interface to retrieve the information stored in a database. Two commonly used database APIs are
JDBC and
ODBC.
Database development platforms
Notes
1.
^ Swanson, Kenneth (1963-11-08).
Development and Management of a Computer-Centered Database.
dtic.mil. Retrieved on 2007-07-20.
2.
^ S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
References
- S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0123693896
- T. Teorey, S. Lightstone, T. Nadeau, Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
- C. J. Date, An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
- J. Gray, A. Reuter, Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
- David M. Kroenke, Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144
- J. Shih, "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
- Galindo, J., Urrutia, A., Piattini, M., Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
- Galindo, J. (Editor). Handbook on Fuzzy Information Processing in Databases. Information Science Reference (an imprint of Idea Group Inc.), Hershey, USA, 2008.
See also
Database may refer to:
- Database, central article discussing the organised collection of data
- Database, a character on The Simpsons
- Bibliographic database, in libraries, a collection of bibliographic information on a particular topic
..... Click the link for more information. Computer science, or computing science, is the study of the theoretical foundations of information and computation and their implementation and application in computer systems.
..... Click the link for more information.
Structure is a fundamental and sometimes intangible notion covering the recognition, observation, nature, and stability of patterns and relationships of entities. From a child's verbal description of a snowflake, to the detailed scientific analysis of the properties of magnetic
..... Click the link for more information.
- For other uses, see Data (disambiguation).
Debt, AIDS, Trade in Africa (or
DATA) is a multinational non-government organization founded in January 2002 in London by U2's Bono along with Bobby Shriver and activists from the Jubilee 2000 Drop
..... Click the link for more information. A computer program is one or more instructions that are intended for execution by a computer. Specifically, it is a symbol or combination of symbols forming an algorithm that may or may not terminate, and that algorithm is written in a programming language.
..... Click the link for more information.
Query languages are computer languages used to make queries into databases and information systems.
Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages. Examples include:
..... Click the link for more information. Information is the result of processing, gathering, manipulating and organizing data in a way that adds to the knowledge of the receiver. In other words, it is the context in which data is taken.
..... Click the link for more information.
A database management system (DBMS) is computer software designed for the purpose of managing databases. Typical examples of DBMSs include Oracle, DB2, Microsoft Access, Microsoft SQL Server, PostgreSQL, MySQL, FileMaker and Sybase Adaptive Server Enterprise.
..... Click the link for more information.
Information science (also information studies) is an interdisciplinary science primarily concerned with the collection, classification, manipulation, storage, retrieval and dissemination of information.
..... Click the link for more information.
Industrial Revolution was a period in the late 18th and early 19th centuries when major changes in agriculture, manufacturing, and transportation had a profound effect on socioeconomic and cultural conditions in Britain and subsequently spread throughout the world, a process that
..... Click the link for more information.
A Logical Schema is a data model of a specific problem domain that is in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either (for example, in 2007) relational tables and columns,
..... Click the link for more information.
A
database model is a theory or specification describing how a database is structured and used. Several such models have been suggested.
Common models include:
- Hierarchical model
- Network model
- Relational model
- Entity-relationship
..... Click the link for more information. The relational model for database management is a database model based on predicate logic and set theory. It was first formulated and proposed in 1969 by Edgar Codd with aims that included avoiding, without loss of completeness, the need to write computer programs to
..... Click the link for more information.
In a hierarchical data model, data are organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children but each child only has one parent.
..... Click the link for more information.
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium.
..... Click the link for more information.
Query languages are computer languages used to make queries into databases and information systems.
Broadly, query languages can be classified according to whether they are database query languages or information retrieval query languages. Examples include:
..... Click the link for more information. System Development Corporation (SDC), based in Santa Monica, California, was arguably the world's first computer software company.
SDC started in 1955 as the systems engineering group for the SAGE air defense ground system at the RAND Corporation.
..... Click the link for more information.
Charles W. Bachman
Born November 11 1924 (1924--) (age 84)
Manhattan, Kansas
Nationality American
..... Click the link for more information.
punch card or punched card (or punchcard or Hollerith card or IBM card), is a piece of stiff paper that contains digital information represented by the presence or absence of holes in predefined positions.
..... Click the link for more information.
Magnetic tape is a medium for magnetic recording generally consisting of a thin magnetizable coating on a long and narrow strip of plastic. Nearly all recording tape is of this type, whether used for recording audio or video or for computer data storage.
..... Click the link for more information.
A
data model is an abstract model that describes how data is represented and used.
The term
data model has two generally accepted meanings:
- A data model theory i.e. a formal description of how data may be structured and used.
..... Click the link for more information. CODASYL (often spelt Codasyl) is an acronym for "Conference on Data Systems Languages". This was an IT industry consortium formed in 1959 to guide the development of a standard programming language that could be used on many computers.
..... Click the link for more information.
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium.
..... Click the link for more information.
In a hierarchical data model, data are organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children but each child only has one parent.
..... Click the link for more information.
Rockwell International was the ultimate incarnation of a series of companies under the sphere of influence of Willard Rockwell, who had made his fortune after the invention and successful launch of a new bearing system for truck axles in 1919.
..... Click the link for more information.
International Business Machines Corporation
Public (NYSE: IBM )
Founded 1889, incorporated 1911
Headquarters Armonk, New York, USA
Key people Samuel J.
..... Click the link for more information.
IBM Information Management System (IMS) is a joint hierarchical database and information management system with extensive transaction processing capability.
IBM designed IMS with Rockwell and Caterpillar starting in 1966 for the Apollo program.
..... Click the link for more information.
IDMS (Integrated Database Management System) is a (network) CODASYL database management system first developed at B.F. Goodrich and later marketed by Cullinane Database Systems (renamed Cullinet in 1983).
..... Click the link for more information.
The Pick operating system (often called just "the Pick system" or simply "Pick") is a demand-paged, multiuser, virtual memory, time-sharing operating system based around a unique "multivalued" database. It is used primarily for business data processing.
..... Click the link for more information.
Mumps virusFor other uses of the word MUMPS, see .
Mumps or
epidemic parotitis is a viral disease of humans.
..... Click the link for more information.