Information about Relational Algebra
Relational algebra, an offshoot of first-order logic, is a set of relations closed under operators. Operators operate on one or more relations to yield a relation. Relational algebra is a part of computer science.
Relation algebra in pure mathematics is an algebraic structure, relevant to mathematical logic and set theory.
Because a relation is interpreted as the extension of some predicate, each operator of a relational algebra has a counterpart in predicate calculus. For example, the natural join is a counterpart of logical AND (
). If relations R and S represent the extensions of predicates p1 and p2, respectively, then the natural join of R and S (R
S) is a relation representing the extension of the predicate p1
p2.
The exact set of operators may differ per definition and also depends on whether the unlabeled relational model (that uses mathematical relations) or the labeled relational model (that uses the labeled generalization of mathematical relations) is used. We will assume the labeled case here as this was the kind that Codd proposed and is thought by some to have been his most important innovation, as it eliminates dependence on an ordering to the attributes of a relation. Under this model we assume that tuples are partial functions from attribute names to values. The attribute a of a tuple t is denoted in this article as t(a).
It is important to realise that Codd's algebra is not in fact complete with respect to first-order logic. Had it been so, certain insurmountable computational difficulties would have arisen for any implementation of it. To overcome these difficulties, he restricted the operands to finite relations only and also proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based computer languages. Codd defined the term relational completeness to refer to a language that is complete with respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the applicability of his relational algebra for database purposes.
The six primitive operators of Codd's algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, the set difference, and the rename. (Actually, Codd omitted the rename, but the compelling case for its inclusion was shown by the inventors of ISBL.) These six operators are fundamental in the sense that none of them can be omitted without losing expressive power. Many other operators have been defined in terms of these six. Among the most important are set intersection, division, and the natural join. In fact ISBL made a compelling case for replacing the Cartesian product by the natural join, of which the Cartesian product is a degenerate case.
Altogether, the operators of relational algebra have identical expressive power to that of domain relational calculus or tuple relational calculus. However, for the reasons given in the Introduction above, relational algebra has strictly less expressive power than that of first-order predicate calculus without function symbols. Relational algebra actually corresponds to a subset of first-order logic that is Horn clauses without recursion and negation.
The Cartesian product is defined differently from the one defined in set theory in the sense that tuples are considered to be 'shallow' for the purposes of the operation. That is, unlike in set theory, where the Cartesian product of a n-tuple by an m-tuple is a set of 2-tuples, the Cartesian product in relational algebra has the 2-tuple "flattened" into an n+m-tuple. More formally, R × S is defined as follows:
where
is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in
are restricted to the set
.
where
is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators
(and),
(or) and
(negation). This selection selects all those tuples in
for which
holds.
except that the
field in all tuples is renamed to an
field.This simply used to rename the attribute of a relation or the relation itself.
S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:
Join is another term for relation composition; in category theory, the join is precisely the fiber product. In Unicode, the bowtie symbol is ⋈ (U+22C8).
The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND. Note carefully that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value. In particular, natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. Note that this works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from Dept.manager to Emp.emp-number then we have to rename these columns before we take the natural join. Such a join is sometimes also referred to as an equijoin (see θ-join).
More formally the semantics of the natural join is defined as follows:
where fun(r) is a predicate that is true for a binary relation r iff r is a functional binary relation. It is usually required that R and S must have at least one common attribute, but if this constraint is omitted then in that special case the natural join becomes exactly the Cartesian product as defined above.
The natural join can be simulated with Codd's primitives as follows. Assume that b1,...,bm are the attribute names common to R, S, a1,...,an are the attribute names unique to R and c1,...,ck are the attribute unique to S. Furthermore assume that the attribute names d1,...,dm are neither in R nor in S. In a first step we can now rename the common attribute names in S: : S' := ρd1/b1(...ρdm/bm( S)...) Then we take the Cartesian product and select the tuples that are to be joined: : T := σb1=d1(...σbm=dm(R × S')...) Finally we take a projection to get rid of the renamed attributes: : U := πa1,...,an,b1,...,bm,c1,...,ck(T)
If we want to combine tuples from two relations where the combination condition is not simply the equality of shared attributes then it is convenient to have a more general form of join operator, which is the θ-join (or theta-join). The θ-join is a binary operator that is written as or where a and b are attribute names, θ is a binary relation in the set {<, ≤, =, >, ≥}, v is a value constant, and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy the relation θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute.
The simulation of this operation in the fundamental operations is therefore as follows:
In case the operator θ is the equality operator (=) then this join is also called an equijoin.
Note, however, that a computer language that supports the natural join and rename operators does not need θ-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes).
More formally the semantics of the semijoin is defined as follows:
where fun(r) is as in the definition of natural join.
The semijoin can be simulated using the natural join as follows. Assume that a1,...,an are the attribute names of R, then it holds that:
For an example consider the tables Employee and Dept and their antijoin:
The antijoin is formally defined as follows:
or
where fun(r) is as in the definition of natural join.
The antijoin can also be defined as the complement of the semijoin, as follows:
If DBProject contains all the tasks of the Database project then the result of the division above contains exactly all the students that have completed the Database project.
More formally the semantics of the division is defined as follows:
where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.
The simulation of the division with the basic operations is as follows. We assume that a1,...,an are the attribute names unique to R and b1,...,bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:
In the prior example, T would represent a table such that every Student (because Student is the unique key / attribute of the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows, Eugene -> Database1 and Eugene -> Database2 in T.
In the next step we subtract R from this relation:
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand.
The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for the fill values. It should not be assumed that this is the NULL defined for the database language SQL, nor should it be assumed that ω is a mark rather than a value, nor should it be assumed that the controversial three-valued logic is introduced by it.
Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes omitted.)
For an example consider the tables Employee and Dept and their left outer join:
In the resulting relation, tuples in S which have no common values in common attribute names with tuples in R take a null value, ω.
Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the resulting relation where tuples in DeptName have tuples of Finance or Executive.
R
(R
S) means T
(R
S)
where T = RxN
where N = "A relation with the sames attributes of S minus the attributes of R, with an only tuple with all its attributes set to NULL"
The left outer join can be simulated using the natural join and set union as follows:
The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.
For example consider the tables Employee and Dept and their right outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω.
Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name attribute of the resulting relation where tuples in DeptName had tuples of Production.
The right outer join can be simulated using the natural join and set union as follows:
The full outer join is written as R =X= S where R and S are relations. The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.
For an example consider the tables Employee and Dept and their full outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take a null value, ω.
The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:
It can be proven that there is no relational algebra expression E(R) taking R as a variable argument which produces R+. The proof is based on the fact that, given a relational expression E for which it is claimed that E(R) = R+, where R is a variable, we can always find an instance r of R (and a corresponding domain d) such that E(r) ≠ r+.[1]
Here we present a set of rules, that can be used in such transformations.
and
rows, the result will contain
rows. Therefore it is very important to do our best to decrease the size of both operands before applying the cross product operator.
This can be effectively done, if the cross product is followed by a selection operator, e.g. (
×
). Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules.
In the above case we break up condition
into conditions
,
and
using the split rules about complex selection conditions, so that
=
and
only contains attributes from
,
contains attributes only from
and
contains the part of
that contains attributes from both
and
. Note, that
,
or
are possibly empty. Then the following holds:
In mathematics, a tuple is a finite sequence (also known as an "ordered list") of objects, each of a specified type. A tuple containing n objects is known as an "n-tuple".
..... Click the link for more information.
Relation algebra in pure mathematics is an algebraic structure, relevant to mathematical logic and set theory.
Introduction
Relational algebras received little attention until the publication of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. The first query language to be based on Codd's algebra was ISBL, and this pioneering work has been acclaimed by many authorities as having shown the way to make Codd's idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example. In 1998 Chris Date and Hugh Darwen proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL's ideas. Rel is an implementation of Tutorial D. Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users).Because a relation is interpreted as the extension of some predicate, each operator of a relational algebra has a counterpart in predicate calculus. For example, the natural join is a counterpart of logical AND (
). If relations R and S represent the extensions of predicates p1 and p2, respectively, then the natural join of R and S (R
S) is a relation representing the extension of the predicate p1
p2.
The exact set of operators may differ per definition and also depends on whether the unlabeled relational model (that uses mathematical relations) or the labeled relational model (that uses the labeled generalization of mathematical relations) is used. We will assume the labeled case here as this was the kind that Codd proposed and is thought by some to have been his most important innovation, as it eliminates dependence on an ordering to the attributes of a relation. Under this model we assume that tuples are partial functions from attribute names to values. The attribute a of a tuple t is denoted in this article as t(a).
It is important to realise that Codd's algebra is not in fact complete with respect to first-order logic. Had it been so, certain insurmountable computational difficulties would have arisen for any implementation of it. To overcome these difficulties, he restricted the operands to finite relations only and also proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based computer languages. Codd defined the term relational completeness to refer to a language that is complete with respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the applicability of his relational algebra for database purposes.
Primitive operations
As in any algebra, some operators are primitive and the others, being definable in terms of the primitive ones, are derived. It is useful if the choice of primitive operators parallels the usual choice of primitive logical operators. Although it is well known that the usual choice in logic of AND, OR and NOT is somewhat arbitrary, Codd made a similar arbitrary choice for his algebraThe six primitive operators of Codd's algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, the set difference, and the rename. (Actually, Codd omitted the rename, but the compelling case for its inclusion was shown by the inventors of ISBL.) These six operators are fundamental in the sense that none of them can be omitted without losing expressive power. Many other operators have been defined in terms of these six. Among the most important are set intersection, division, and the natural join. In fact ISBL made a compelling case for replacing the Cartesian product by the natural join, of which the Cartesian product is a degenerate case.
Altogether, the operators of relational algebra have identical expressive power to that of domain relational calculus or tuple relational calculus. However, for the reasons given in the Introduction above, relational algebra has strictly less expressive power than that of first-order predicate calculus without function symbols. Relational algebra actually corresponds to a subset of first-order logic that is Horn clauses without recursion and negation.
Set operators
Although three of the six basic operators are taken from set theory, there are additional constraints that are present in their relational algebra counterparts: For set union and set difference, the two relations involved must be union-compatible—that is, the two relations must have the same set of attributes. As set intersection can be defined in terms of set difference, the two relations involved in set intersection must also be union-compatible.The Cartesian product is defined differently from the one defined in set theory in the sense that tuples are considered to be 'shallow' for the purposes of the operation. That is, unlike in set theory, where the Cartesian product of a n-tuple by an m-tuple is a set of 2-tuples, the Cartesian product in relational algebra has the 2-tuple "flattened" into an n+m-tuple. More formally, R × S is defined as follows:
- R
S = {r
s| r
R, s
S}
Projection
where
is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in
are restricted to the set
.
Selection
where
is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators
(and),
(or) and
(negation). This selection selects all those tuples in
for which
holds.
Rename
except that the
field in all tuples is renamed to an
field.This simply used to rename the attribute of a relation or the relation itself.
Joins and join-like operators
Natural join
Natural join is a binary operator that is written as R
S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join:
|
|
|
Join is another term for relation composition; in category theory, the join is precisely the fiber product. In Unicode, the bowtie symbol is ⋈ (U+22C8).
The natural join is arguably one of the most important operators since it is the relational counterpart of logical AND. Note carefully that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value. In particular, natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from Employee.DeptName to Dept.DeptName and then the natural join of Employee and Dept combines all employees with their departments. Note that this works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from Dept.manager to Emp.emp-number then we have to rename these columns before we take the natural join. Such a join is sometimes also referred to as an equijoin (see θ-join).
More formally the semantics of the natural join is defined as follows:
- R
S = { t
s : t
R, s
S, fun (t
s) }
where fun(r) is a predicate that is true for a binary relation r iff r is a functional binary relation. It is usually required that R and S must have at least one common attribute, but if this constraint is omitted then in that special case the natural join becomes exactly the Cartesian product as defined above.
The natural join can be simulated with Codd's primitives as follows. Assume that b1,...,bm are the attribute names common to R, S, a1,...,an are the attribute names unique to R and c1,...,ck are the attribute unique to S. Furthermore assume that the attribute names d1,...,dm are neither in R nor in S. In a first step we can now rename the common attribute names in S: : S' := ρd1/b1(...ρdm/bm( S)...) Then we take the Cartesian product and select the tuples that are to be joined: : T := σb1=d1(...σbm=dm(R × S')...) Finally we take a projection to get rid of the renamed attributes: : U := πa1,...,an,b1,...,bm,c1,...,ck(T)
θ-join and equijoin
Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she doesn't want to spend more money for the boat than for the car. The θ-join on the relation CarPrice ≥ BoatPrice produces a table with all the possible options.
|
|
|
If we want to combine tuples from two relations where the combination condition is not simply the equality of shared attributes then it is convenient to have a more general form of join operator, which is the θ-join (or theta-join). The θ-join is a binary operator that is written as or where a and b are attribute names, θ is a binary relation in the set {<, ≤, =, >, ≥}, v is a value constant, and R and S are relations. The result of this operation consists of all combinations of tuples in R and S that satisfy the relation θ. The result of the θ-join is defined only if the headers of S and R are disjoint, that is, do not contain a common attribute.
The simulation of this operation in the fundamental operations is therefore as follows:
- R
φ S = σφ(R × S)
In case the operator θ is the equality operator (=) then this join is also called an equijoin.
Note, however, that a computer language that supports the natural join and rename operators does not need θ-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes).
Semijoin
The semijoin is joining similar to the natural join and written as R S where R and S are relations. The result of the semijoin is only the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their semi join:
|
|
|
More formally the semantics of the semijoin is defined as follows:
- R S = { t : t
R, s
S, fun (t
s) }
where fun(r) is as in the definition of natural join.
The semijoin can be simulated using the natural join as follows. Assume that a1,...,an are the attribute names of R, then it holds that:
- R S =
a1,..,an(R
S)
Antijoin
The antijoin, written as R S where R and S are relations, is similar to the natural join, but the result of an antijoin is only those tuples in R for which there is NOT a tuple in S that is equal on their common attribute names.For an example consider the tables Employee and Dept and their antijoin:
|
|
|
The antijoin is formally defined as follows:
- R S = { t : t
R
s
S : fun (t
s) }
or
- R S = { t : t
R, there is no tuple s of S that satisfies fun (t
s) }
where fun(r) is as in the definition of natural join.
The antijoin can also be defined as the complement of the semijoin, as follows:
- R S = R - R S
Division
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables CompletedTask, DBProject and their division:
|
|
|
If DBProject contains all the tasks of the Database project then the result of the division above contains exactly all the students that have completed the Database project.
More formally the semantics of the division is defined as follows:
- R ÷ S = { t[a1,...,an] : t
R
s
S ( (t[a1,...,an]
s)
R) }
where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.
The simulation of the division with the basic operations is as follows. We assume that a1,...,an are the attribute names unique to R and b1,...,bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:
- T := πa1,...,an(R) × S
In the prior example, T would represent a table such that every Student (because Student is the unique key / attribute of the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows, Eugene -> Database1 and Eugene -> Database2 in T.
In the next step we subtract R from this relation:
- U := T - R
- V := πa1,...,an(U)
- W := πa1,...,an(R) - V
Outer joins
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand.
The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for the fill values. It should not be assumed that this is the NULL defined for the database language SQL, nor should it be assumed that ω is a mark rather than a value, nor should it be assumed that the controversial three-valued logic is introduced by it.
Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes omitted.)
Left outer join
The left outer join is written as R =X S where R and S are relations. The result of the left outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition (loosely speaking) to tuples in R that have no matching tuples in S.For an example consider the tables Employee and Dept and their left outer join:
|
|
|
In the resulting relation, tuples in S which have no common values in common attribute names with tuples in R take a null value, ω.
Since there are no tuples in Dept with a DeptName of Finance or Executive, ωs occur in the resulting relation where tuples in DeptName have tuples of Finance or Executive.
R
(R
S) means T
(R
S)
where T = RxN
where N = "A relation with the sames attributes of S minus the attributes of R, with an only tuple with all its attributes set to NULL"
The left outer join can be simulated using the natural join and set union as follows:
- R =X S = R
(R
S)
Right outer join
The right outer join behaves almost identically to the left outer join, above, with the exception that all the values from the "other" relation appear in the resulting relation.The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R.
For example consider the tables Employee and Dept and their right outer join:
|
|
|
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω.
Since there are no tuples in Employee with a DeptName of Production, ωs occur in the Name attribute of the resulting relation where tuples in DeptName had tuples of Production.
The right outer join can be simulated using the natural join and set union as follows:
- R X= S = S
(R
S)
Outer join
The outer join or full outer join in effect combines the results of the left and right outer joins.The full outer join is written as R =X= S where R and S are relations. The result of the full outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.
For an example consider the tables Employee and Dept and their full outer join:
|
|
|
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value, ω. Tuples in S which have no common values in common attribute names with tuples in R also take a null value, ω.
The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows:
- R=X=S = (R=XS)
(RX=S)
- R=X=S = R
S
(R
S)
Operations for domain computations
The aggregation operation
There are five aggregate functions that are included with most databases. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra, it is written as Exp1,Exp2,Exp3...Gfunc1,func2,func3...(Relation). While one must specify the function to use, the expressions, however, are optional. Let's assume that we have a table named Account with two columns, namely Branch_Name and Balance defined, and we wish to find the branch name with the highest balance, we would write Branch_NameGMax(Balance)(Account). To find the highest balance, we could simply write GMax(Balance)(Account).The extend operation
Limitation of relational algebra
Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations which cannot be expressed by relational algebra. The transitive closure of a binary relation is one of them. Given a domain D, let binary relation R be a subset of DxD. The transitive closure R+ of R is the smallest subset of DxD containing R which satifies the following condition:
x
y z ((x,y)
R+
(y,z)
R+
(x,z)
R+)
It can be proven that there is no relational algebra expression E(R) taking R as a variable argument which produces R+. The proof is based on the fact that, given a relational expression E for which it is claimed that E(R) = R+, where R is a variable, we can always find an instance r of R (and a corresponding domain d) such that E(r) ≠ r+.[1]
Use of algebraic properties for query optimization
Queries can be represented as a tree, where- the internal nodes are operators,
- leaves are relations,
- subtrees are subexpressions.
Here we present a set of rules, that can be used in such transformations.
Selection
Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if we manage to move the selections in an expression tree towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.Basic selection properties
Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one), and commutative (the order selections are applied in has no effect on the eventual result).Breaking up selections with complex conditions
A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and selection whose condition is a disjunction is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately.Selection and cross product
Cross product is the costliest operator to evaluate. If the input relations have
and
rows, the result will contain
rows. Therefore it is very important to do our best to decrease the size of both operands before applying the cross product operator.
This can be effectively done, if the cross product is followed by a selection operator, e.g. (
×
). Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules.
In the above case we break up condition
into conditions
,
and
using the split rules about complex selection conditions, so that
=
and
only contains attributes from
,
contains attributes only from
and
contains the part of
that contains attributes from both
and
. Note, that
,
or
are possibly empty. Then the following holds:
Selection and set operators
Selection is distributive over the setminus, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. Note, that in the setminus and the intersection operators it is possible to apply the selection operator to only one of the operands after the transformation. This can make sense in cases, where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller relation as an operand.Selection and projection
Selection is associative with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from elided fields).Projection
Basic projection properties
Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.Projection and set operators
Projection is distributive over set difference, union, and intersection.Rename
Basic rename properties
Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed.Rename and set operators
Rename is distributive over set difference, union, and intersection.See also
- Cartesian product
- Database
- Logic of relatives
- Projection (mathematics)
- Projection (relational algebra)
- Projection (set theory)
- Relation
- Relation algebra
- Relational calculus
- Relation construction
- Relation composition
- Relation reduction
- Relational database
- Relational model
- Tacit extension
- Theory of relations
- Triadic relation
- Tutorial D
- D (data language specification)
- D4 (programming language) (an implementation of D)
Topics in database management systems (DBMS)
| |
|---|---|
|
Concepts Database Database models Database storage Relational model Distributed DBMS ACID Null Relational database Relational algebra Relational calculus Database normalization Referential integrity Relational DBMS Primary key, Foreign key, Surrogate key, Superkey, Candidate key | |
|
Objects Trigger View Table Cursor Log Transaction Index Stored procedure Partition |
Topics in SQL Select Insert Update Merge Delete Join Union Create Drop Begin work Commit Rollback Truncate Alter |
| Implementations of database management systems | |
|
Types of implementations Relational Flat file Deductive Dimensional Hierarchical Object oriented Object relational Temporal XML data stores | |
|
Database products Object-oriented (comparison) Relational (comparison) |
Components Query language Query optimizer Query plan ODBC JDBC |
References
1. ^ Aho, Alfred V.; Jeffrey D. Ullman (1979). "Universality of data retrieval languages". Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages: 110-119.
External links
- LEAP - An implementation of the relational algebra
- Query Optimization This paper is an excellent introduction into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study.
First-order logic (FOL) is a formal deductive system used by mathematicians, philosophers, linguists, and computer scientists. It goes by many names, including: first-order predicate calculus (FOPC), the lower predicate calculus,
..... Click the link for more information.
..... Click the link for more information.
relation or relationship is a generalization of 2-place relations, such as the relation of equality, denoted by the sign "=" in a statement like "5 + 7 = 12," or the relation of order,
..... Click the link for more information.
..... Click the link for more information.
In mathematics, a set is said to be closed under some operation if the operation on members of the set produces a member of the set. For example, the real numbers are closed under subtraction, but the natural numbers are not: 3 and 7 are both natural numbers, but the result of 3
..... Click the link for more information.
..... Click the link for more information.
operator is a function, that operates on (or modifies) another function. Often, an "operator" is a function that acts on functions to produce other functions (the sense in which Oliver Heaviside used the term); or it may be a generalization of such a function, as in linear algebra,
..... Click the link for more information.
..... Click the link for more information.
Computer science, or computing science, is the study of the theoretical foundations of information and computation and their implementation and application in computer systems.
..... Click the link for more information.
..... Click the link for more information.
relation algebra is a residuated Boolean algebra supporting an involutary unary operation called converse. The motivating example of a relation algebra is the algebra 2X² of all binary relations on a set X, with R•S
..... Click the link for more information.
..... Click the link for more information.
In universal algebra, a branch of pure mathematics, an algebraic structure consists of one or more sets closed under one or more operations, satisfying some axioms. Abstract algebra is primarily the study of algebraic structures and their properties.
..... Click the link for more information.
..... Click the link for more information.
Mathematical logic is a branch of mathematics, which grew out of symbolic logic. Subfields include model theory, proof theory, set theory, and recursion theory. Research in mathematical logic has contributed to, and been motivated by, the study of foundations of mathematics, but
..... Click the link for more information.
..... Click the link for more information.
Set theory is the mathematical theory of sets, which represent collections of abstract objects. It encompasses the everyday notions, introduced in primary school, often as Venn diagrams, of collections of objects, and the elements of, and membership in, such collections.
..... Click the link for more information.
..... Click the link for more information.
Edgar Frank "Ted" Codd
Born July 23 1923
Isle of Portland, England
Died March 18 2003 (aged 81)
..... Click the link for more information.
Born July 23 1923
Isle of Portland, England
Died March 18 2003 (aged 81)
..... Click the link for more information.
The relational model for database management is a database model based on predicate logic and set theory. It was first formulated and proposed in 1969 by Edgar Codd with aims that included avoiding, without loss of completeness, the need to write computer programs to
..... Click the link for more information.
..... Click the link for more information.
ISBL (Information Systems Base Language) is the relational algebra notation that was invented for PRTV, one of the earliest database management systems to implement E.F. Codd's relational model of data.
..... Click the link for more information.
External Links
- Sample ISBL usage
..... Click the link for more information.
Business System 12, or simply BS12, was one of the first fully relational database management systems, designed and implemented by IBM's Bureau Service subsidiary at the company's international development centre in Uithoorn, The Netherlands.
..... Click the link for more information.
..... Click the link for more information.
Hugh Darwen, employee of IBM UK from 1967 to 2004, has been involved in the history of the relational model since the beginning. From 1978 to 1982 he was a chief architect on Business System 12, a database management system that faithfully embraced the principles of the relational
..... Click the link for more information.
..... Click the link for more information.
Rel is an Open Source true relational database management system (TRDBMS) that implements a significant portion of Chris Date and Hugh Darwen's Tutorial D query language.
Primarily intended for teaching purposes, Rel is written in the Java programming language.
..... Click the link for more information.
Primarily intended for teaching purposes, Rel is written in the Java programming language.
..... Click the link for more information.
SQL
Paradigm: multi-paradigm
Appeared in: 1974
Designed by: Donald D. Chamberlin and Raymond F. Boyce
Developer: IBM
Latest release: SQL:2003/ 2003
Typing discipline: static, strong
Major implementations: Many
SQL
..... Click the link for more information.
Paradigm: multi-paradigm
Appeared in: 1974
Designed by: Donald D. Chamberlin and Raymond F. Boyce
Developer: IBM
Latest release: SQL:2003/ 2003
Typing discipline: static, strong
Major implementations: Many
SQL
..... Click the link for more information.
The extension of a predicate is the set of true propositions that can be formed by substituting a term for each of its free variables.
For example, consider the predicate "d2 is the weekday following d1".
..... Click the link for more information.
For example, consider the predicate "d2 is the weekday following d1".
..... Click the link for more information.
For the musical term, see .
In mathematics, a tuple is a finite sequence (also known as an "ordered list") of objects, each of a specified type. A tuple containing n objects is known as an "n-tuple".
..... Click the link for more information.
function expresses dependence between two quantities, one of which is given (the independent variable, argument of the function, or its "input") and the other produced (the dependent variable, value of the function, or "output").
..... Click the link for more information.
..... Click the link for more information.
First-order logic (FOL) is a formal deductive system used by mathematicians, philosophers, linguists, and computer scientists. It goes by many names, including: first-order predicate calculus (FOPC), the lower predicate calculus,
..... Click the link for more information.
..... Click the link for more information.
In relational algebra, a selection (sometimes called a restriction to avoid confusion with SQL's use of SELECT) is a unary operation written as or where:
..... Click the link for more information.
- and are attribute names
- is a binary operation in the set
- is a value constant
- is a relation
..... Click the link for more information.
projection is a unary operation written as where is a set of attribute names. The result of such projection is defined as the set that is obtained when all tuples in are restricted to the set .
..... Click the link for more information.
..... Click the link for more information.
In mathematics, the Cartesian product is a direct product of sets. The Cartesian product is named after René Descartes, whose formulation of analytic geometry gave rise to this concept.
..... Click the link for more information.
..... Click the link for more information.
In set theory and other branches of mathematics, the union of a collection of sets is the set that contains everything that belongs to any of the sets, but nothing else.
..... Click the link for more information.
Basic definition
If A and B are sets, then the union of A and B..... Click the link for more information.
In set theory and other branches of mathematics, two kinds of complements are defined, the relative complement and the absolute complement.
..... Click the link for more information.
Relative complement
If A and B are sets, then the relative complement of A in..... Click the link for more information.
In relational algebra, a rename is a unary operation written as where:
..... Click the link for more information.
- and are attribute names
- is a relation
..... Click the link for more information.
In mathematics, the intersection of two sets A and B is the set that contains all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), but no other elements.
..... Click the link for more information.
..... Click the link for more information.
In computer science, domain relational calculus (DRC) is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model [1].
..... Click the link for more information.
..... Click the link for more information.
The tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational model in order to give a declarative database query language for this data model.
..... Click the link for more information.
..... Click the link for more information.
First-order logic (FOL) is a formal deductive system used by mathematicians, philosophers, linguists, and computer scientists. It goes by many names, including: first-order predicate calculus (FOPC), the lower predicate calculus,
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus