Information about Database Storage Structures
Database tables/indexes are typically stored in memory or on hard disk in one of many forms, ordered/unordered Flat files, ISAM, Heaps, Hash buckets or B+ Trees. These have various advantages and disadvantages discussed in this topic. The most commonly used are B+trees and ISAM.
Methods
Flat files
Unordered
Unordered storage typically stores the records in the order they are inserted, while having good insertion efficiency, it may seem that it would have inefficient retrieval times, but this is usually never the case as most databases use indexes on the primary keys, resulting in efficient retrieval times.Ordered
Ordered or Linked list storage typically stores the records in order and may have to rearrange or increase the file size in the case a record is inserted, this is very inefficient. However is better for retrieval as the records are pre-sorted (Complexity O(log(n))).Structured files
Heaps
- simplest and most basic method
- insert efficient, records added at end of file – ‘chronological’ order
- retrieval inefficient as searching has to be linear
- deletion – deleted records marked
- advantages
- good for bulk loading data
- good for relatively small relations as indexing overheads are avoided
- good when retrievals involve large proportion of records
- disadvantages
- not efficient for selective retrieval using key values, especially if large
- sorting may be time-consuming
- not suitable for ‘volatile’ tables
Hash buckets
- Hash functions calculate the address of the page in which the record is to be stored based on one or more fields in the record
- Hashing functions chosen to ensure that addresses are spread evenly across the address space
- ‘occupancy’ is generally 40% – 60% of total file size
- unique address not guaranteed so collision detection and collision resolution mechanisms are required
- open addressing
- chained/unchained overflow
- pros and cons
- efficient for exact matches on key field
- not suitable for:
- range retrieval; requires sequential storage
B+ trees
These are the most used in practice.- the time taken to access any tuple is the same because same number of nodes searched
- index is a full index so data file does not have to be ordered
- Pros and cons
- versatile data structure – sequential as well as random access
- access is fast
- supports exact, range, part key and pattern matches efficiently
- ‘volatile’ files are handled efficiently because index is dynamic – expands and contracts as table grows and shrinks
- less well suited to relatively stable files – in this case, ISAM is more efficient
ISAM
See also
Topics in database management systems (DBMS)
| |
|---|---|
|
Concepts Database Database models Database storage Relational model Distributed DBMS ACID Null Relational database Relational algebra Relational calculus Database normalization Referential integrity Relational DBMS Primary key, Foreign key, Surrogate key, Superkey, Candidate key | |
|
Objects Trigger View Table Cursor Log Transaction Index Stored procedure Partition |
Topics in SQL Select Insert Update Merge Delete Join Union Create Drop Begin work Commit Rollback Truncate Alter |
| Implementations of database management systems | |
|
Types of implementations Relational Flat file Deductive Dimensional Hierarchical Object oriented Object relational Temporal XML data stores | |
|
Database products Object-oriented (comparison) Relational (comparison) |
Components Query language Query optimizer Query plan ODBC JDBC |
flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file.
..... Click the link for more information.
Flat files
A flat file is a file that contains records, and in which each record is specified in a single line...... Click the link for more information.
ISAM stands for Indexed Sequential Access Method, a method for storing data for fast retrieval. ISAM was originally developed by IBM for mainframe computers and today forms the basic data store of almost all databases, both relational and otherwise.
..... Click the link for more information.
..... Click the link for more information.
heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B).
..... Click the link for more information.
..... Click the link for more information.
In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number).
..... Click the link for more information.
..... Click the link for more information.
B+ tree (also known as a Quarternary Tree) is a type of tree, which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key.
..... Click the link for more information.
..... Click the link for more information.
flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file.
..... Click the link for more information.
Flat files
A flat file is a file that contains records, and in which each record is specified in a single line...... Click the link for more information.
A data model is an abstract model that describes how data is represented and used.
The term data model has two generally accepted meanings:
..... Click the link for more information.
The term data model has two generally accepted meanings:
- A data model theory i.e. a formal description of how data may be structured and used.
..... Click the link for more information.
table is a set of data elements (values) that is organized using a model of horizontal rows and vertical columns. The columns are identified by name, and the rows are identified by the values appearing in a particular column subset which has been identified as a candidate key.
..... Click the link for more information.
..... Click the link for more information.
plain text is textual material in a computer file which is unformatted and without very much processing readable by simple computer tools such as line printing text commands, in Windows'es DOS window
..... Click the link for more information.
type, and in Unix terminal window cat...... Click the link for more information.
linked list is one of the fundamental data structures, and can be used to implement other data structures. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes.
..... Click the link for more information.
..... Click the link for more information.
ISAM stands for Indexed Sequential Access Method, a method for storing data for fast retrieval. ISAM was originally developed by IBM for mainframe computers and today forms the basic data store of almost all databases, both relational and otherwise.
..... Click the link for more information.
..... Click the link for more information.
B+ tree (also known as a Quarternary Tree) is a type of tree, which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key.
..... Click the link for more information.
..... Click the link for more information.
flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file.
..... Click the link for more information.
Flat files
A flat file is a file that contains records, and in which each record is specified in a single line...... Click the link for more information.
In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number).
..... Click the link for more information.
..... Click the link for more information.
heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B).
..... Click the link for more information.
..... Click the link for more information.
A database management system (DBMS) is computer software designed for the purpose of managing databases. Typical examples of DBMSs include Oracle, DB2, Microsoft Access, Microsoft SQL Server, PostgreSQL, MySQL, FileMaker and Sybase Adaptive Server Enterprise.
..... Click the link for more information.
..... Click the link for more information.
database is a structured collection of records or data that is stored in a computer system so that a computer program or person using a query language can consult it to answer queries. The records retrieved in answer to queries are information that can be used to make decisions.
..... Click the link for more information.
..... Click the link for more information.
A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select, project, and join.
..... Click the link for more information.
..... Click the link for more information.
The relational model for database management is a database model based on predicate logic and set theory. It was first formulated and proposed in 1969 by Edgar Codd with aims that included avoiding, without loss of completeness, the need to write computer programs to
..... Click the link for more information.
..... Click the link for more information.
A distributed database management system is a software system that permits the management of a distributed database and makes the distribution transparent to the users. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network.
..... Click the link for more information.
..... Click the link for more information.
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.
..... Click the link for more information.
..... Click the link for more information.
Null is a special marker used to indicate that a data value is unknown in the Structured Query Language (SQL). Introduced by the creator of the relational database model, Dr. E.F.
..... Click the link for more information.
..... Click the link for more information.
A relational database is a database that conforms to the relational model, and refers to a database's data and schema (the database's structure of how that data is arranged).
..... Click the link for more information.
..... Click the link for more information.
Relational algebra, an offshoot of first-order logic, is a set of relations closed under operators. Operators operate on one or more relations to yield a relation. Relational algebra is a part of computer science.
..... Click the link for more information.
..... Click the link for more information.
The relational calculus refers to the two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and that provide a declarative way to specify database queries.
..... Click the link for more information.
..... Click the link for more information.
Database normalization is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies.
..... Click the link for more information.
..... Click the link for more information.
referential integrity. In this example, there is a foreign key (artist_id) value in the album table that references a non-existent artist — in other words there is a foreign key value with no corresponding primary key value in the referenced table.
..... Click the link for more information.
..... Click the link for more information.
A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Relational databases are the most common kind of database in use today (assuming one does not count a file system as a database).
..... Click the link for more information.
..... Click the link for more information.
In relational database design, a unique key or primary key is a candidate key to uniquely identify each row in a table. A unique key or primary key comprises a single column or set of columns.
..... Click the link for more information.
..... Click the link for more information.
In the context of relational databases, a foreign key is a referential constraint between two tables.[1] The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table.
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus