Information about Multiple Sequence Alignment
First 90 positions of a protein multiple sequence alignment of instances of the acidic ribosomal protein P0 (L10E) from several organisms. Generated with ClustalW.
A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In general, the input set of query sequences is assumed to have an evolutionary relationship by which the sequences share a lineage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes), which appear as differing characters in a single alignment column, and insertion or deletion mutations (indels), which appear as gaps in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
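As a minimal sketch of how these events show up in an alignment, the following Python labels each column of a toy alignment. The function name, the labels, and the toy sequences are illustrative assumptions rather than part of any alignment package, and "-" is assumed to be the gap character.

```python
def classify_columns(alignment):
    """Label each alignment column as 'conserved', 'substitution', or 'indel'."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    labels = []
    for column in zip(*alignment):
        if "-" in column:
            # At least one sequence has a gap here: insertion or deletion.
            labels.append("indel")
        elif len(set(column)) == 1:
            # Every sequence carries the same character.
            labels.append("conserved")
        else:
            # No gaps, but the characters differ: a point mutation.
            labels.append("substitution")
    return labels


# Hypothetical toy alignment of three short protein fragments.
toy = ["MKV-LA", "MKVQLA", "MRV-LA"]
print(classify_columns(toy))
```

In the toy input, the second column mixes K and R (a substitution) and the fourth column contains gaps in two rows (an indel).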
Multiple sequence alignment also refers to the process of aligning such a sequence set. Because three or more sequences of biologically relevant length are nearly impossible to align by hand, computational algorithms are used to produce and analyze the alignments. MSAs require more sophisticated methodologies than pairwise alignment because they are more computationally complex to produce. Most multiple sequence alignment programs use heuristic methods rather than global optimization because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively computationally expensive.
Dynamic programming and computational complexity
The most direct method for producing an MSA uses the dynamic programming technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a gap penalty and a substitution matrix assigning scores or probabilities to the alignment of each possible pair of amino acids based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences a substitution matrix can be used, but since there are only four possible standard characters per sequence and the individual nucleotides do not typically differ much in substitution probability, the parameters for DNA and RNA sequences usually consist of a gap penalty, a positive score for character matches, and a negative score for mismatches.
For n individual sequences, the method requires constructing the n-dimensional equivalent of the matrix formed in standard pairwise dynamic programming. The search space thus increases exponentially with increasing n and is also strongly dependent on sequence length. Finding the global optimum for n sequences this way has been shown to be an NP-complete problem.[1][2] The dynamic programming technique can be made more efficient by first performing pairwise dynamic programming on each pair of sequences in the query set and then searching only the solution space near these results (effectively finding the intersection between local paths immediately surrounding each pairwise optimum solution). The so-called "sum of pairs" method has been implemented in the software package MSA, but it is still impractical for many MSA applications that require the simultaneous alignment of dozens or even a few hundred sequences. Dynamic programming methods are now used only when an extremely high-quality alignment of a small number of sequences is needed, and as a benchmarking standard in evaluating new or refined heuristic techniques.
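The scoring scheme described for nucleotide sequences, and the growth of the search space, can be sketched in a few lines of Python. The parameter values (+1 match, -1 mismatch, -2 gap), the choice to score a gap paired with a gap as zero, and the function names are assumptions made for illustration; they are not the defaults of the MSA package.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of characters taken from the same alignment column."""
    if a == "-" and b == "-":
        return 0  # assumed convention: a gap paired with a gap is neutral
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **params):
    """Sum the pairwise scores over every column and every pair of rows."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **params)
    return total


def dp_cells(n, length):
    """Cells in the n-dimensional DP lattice for n sequences of equal length."""
    return (length + 1) ** n


toy = ["ACG-T", "ACGAT", "A-GAT"]
print(sum_of_pairs(toy))   # sum-of-pairs score of the toy alignment
print(dp_cells(2, 100))    # pairwise lattice
print(dp_cells(5, 100))    # five sequences: exponentially larger
```

The last two lines show why exhaustive dynamic programming stops being practical: each added sequence multiplies the lattice size by roughly the sequence length.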
Progressive alignment construction
One method of performing a heuristic alignment search is the progressive technique (also known as the hierarchical or tree method) that builds up a final MSA by first performing a series of pairwise alignments on successively less closely related sequences. Such methods begin by aligning the two most closely related sequences first and then successively aligning the next most closely related sequence in the query set to the alignment produced in the previous step. The initial "most related" pair is determined by an efficient clustering method such as neighbor-joining based on a simple heuristic search of the query set with a tool like FASTA. Progressive techniques therefore automatically construct a phylogenetic tree as well as an alignment.
One major limitation of progressive methods is their heavy dependence on the initial assignment of relatedness and on the quality of the initial alignment. The methods are thus sensitive as well to the distribution of sequences in the query set; performance improves when relatedness among query sequences is a relatively smooth gradient rather than distantly separated clusters. Performance also degrades significantly when all of the sequences in the set are rather distantly related, because inaccuracies in the initial alignment are then more likely. Most modern progressive methods modify their scoring function with a secondary weighting function that assigns scaling factors to individual members of the query set in a nonlinear fashion based on their phylogenetic distance from their nearest neighbors. Judicious choice of weighting can aid in evaluating relatedness and mitigate the effects of relatively poor initial alignments early in the progression.
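The order in which a progressive method adds sequences can be illustrated with a simplified sketch. It is not neighbor-joining and does not build a tree: it uses the standard-library difflib.SequenceMatcher ratio as a stand-in similarity measure (an assumption; real tools use k-mer or alignment-based distances) and greedily picks the closest pair, then the next closest sequence. The function names and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity in [0, 1]; real tools use evolutionary distances."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def progressive_order(seqs):
    """Return sequence names in the order a greedy progressive scheme adds them."""
    names = list(seqs)
    # Start from the most similar pair in the query set.
    first, second = max(
        combinations(names, 2),
        key=lambda pair: similarity(seqs[pair[0]], seqs[pair[1]]),
    )
    order = [first, second]
    remaining = [name for name in names if name not in order]
    while remaining:
        # Add the sequence closest to anything already in the alignment.
        nxt = max(
            remaining,
            key=lambda name: max(similarity(seqs[name], seqs[m]) for m in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "ACGTTTTT", "d": "GGGGGGGG"}
print(progressive_order(toy))
```

Because each step only looks at similarities computed up front, an early misjudgment of relatedness carries through every later step, which is the dependence on the initial assignment described above.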
Progressive alignment methods are efficient enough to implement on a large scale for many sequences and are often run on publicly accessible web servers so users need not locally install the applications of interest. A very popular progressive alignment method is the Clustal family,[3] especially the weighted variant ClustalW[4] to which access is provided by a large number of web portals including GenomeNet, EBI, and EMBNet. Different portals or implementations can vary in user interface and make different parameters accessible to the user. Clustal is used extensively for phylogenetic tree construction and as input for protein structure prediction by homology modeling.
Another common progressive alignment method called T-Coffee[5] is slower than Clustal and its derivatives but generally produces more accurate alignments for distantly related sequence sets. T-Coffee calculates pairwise alignments by combining the direct alignment of the pair with indirect alignments that align each sequence of the pair to a third sequence. It uses the output from Clustal as well as another local alignment program, LALIGN, which finds multiple regions of local alignment between two sequences. The resulting alignment and phylogenetic tree are used as a guide to produce new and more accurate weighting factors.
Because progressive methods are heuristics that are not guaranteed to converge to a global optimum, alignment quality can be difficult to evaluate and the true biological significance of an alignment can be obscure. A very recent semi-progressive method that improves alignment quality and does not use a lossy heuristic while still running in polynomial time[6] has been implemented in the program PSAlign.
Iterative methods
A set of methods to produce MSAs while reducing the errors inherent in progressive methods is classified as "iterative" because these methods work similarly to progressive methods but repeatedly realign the initial sequences as well as adding new sequences to the growing MSA. One reason progressive methods are so strongly dependent on a high-quality initial alignment is the fact that these alignments are always incorporated into the final result - that is, once a sequence has been aligned into the MSA, its alignment is not considered further. This approximation improves efficiency at the cost of accuracy. By contrast, iterative methods can return to previously calculated pairwise alignments or sub-MSAs incorporating subsets of the query sequences as a means of optimizing a general objective function such as finding a high-quality alignment score.
A variety of subtly different iteration methods have been implemented and made available in software packages; reviews and comparisons have been useful but generally refrain from choosing a "best" technique.[7] The software package PRRN/PRRP uses a hill-climbing algorithm to optimize its MSA alignment score[8] and iteratively corrects both alignment weights and locally divergent or "gappy" regions of the growing MSA.[9] PRRP performs best when refining an alignment previously constructed by a faster method.
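The idea of hill-climbing on an alignment score can be shown with a deliberately small sketch. It is not the PRRN/PRRP algorithm: the move set (swapping one gap with an adjacent residue in one row), the scoring parameters, and all function names are assumptions chosen to keep the example short. Each accepted move strictly raises a sum-of-pairs score, so the loop stops at a local optimum.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score; a gap paired with a gap contributes nothing."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue
            total += gap if "-" in (a, b) else (match if a == b else mismatch)
    return total


def neighbours(rows):
    """Yield alignments that differ by swapping one gap with an adjacent residue."""
    for r, row in enumerate(rows):
        for i in range(len(row) - 1):
            if (row[i] == "-") != (row[i + 1] == "-"):
                swapped = row[:i] + row[i + 1] + row[i] + row[i + 2:]
                yield rows[:r] + [swapped] + rows[r + 1:]


def hill_climb(rows):
    """Accept the first improving move repeatedly until none remains."""
    rows = list(rows)
    best = sp_score(rows)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(rows):
            score = sp_score(candidate)
            if score > best:
                rows, best, improved = candidate, score, True
                break
    return rows, best


start = ["ACGT-", "ACG-T", "ACGT-"]
refined, score = hill_climb(start)
print(sp_score(start), "->", score)
print(refined)
```

The moves never change the underlying sequences, only where their gaps sit, which mirrors how iterative methods revisit earlier alignment decisions instead of freezing them.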
Another iterative program, DIALIGN, takes an unusual approach of focusing narrowly on local alignments between sub-segments or sequence motifs without introducing a gap penalty.[11] The alignment of individual motifs is then achieved with a matrix representation similar to a dot-matrix plot in a pairwise alignment. An alternative method that uses fast local alignments as anchor points or "seeds" for a slower global-alignment procedure is implemented in the CHAOS/DIALIGN suite.[11]
A third popular iteration-based method called MUSCLE (multiple sequence alignment by log-expectation) improves on progressive methods with a more accurate distance measure to assess the relatedness of two sequences.[13] The distance measure is updated between iteration stages (although, in its original form, MUSCLE contained only 2-3 iterations depending on whether refinement was enabled).
Hidden Markov models
Hidden Markov models are probabilistic models that can assign likelihoods to all possible combinations of gaps, matches, and mismatches to determine the most likely MSA or set of possible MSAs. HMMs can produce a single highest-scoring output but can also generate a family of possible alignments that can then be evaluated for biological significance. Because HMMs are probabilistic, they do not produce the same solution every time they are run on the same dataset; thus they cannot be guaranteed to converge to an optimal alignment. HMMs can produce both global and local alignments. Although HMM-based methods have been developed relatively recently, they offer significant improvements in computational speed, especially for sequences that contain overlapping regions.[9]
Typical HMM-based methods work by representing an MSA as a form of directed acyclic graph known as a partial-order graph, which consists of a series of nodes representing possible entries in the columns of an MSA. In this representation a column that is absolutely conserved (that is, one in which all the sequences in the MSA share a particular character at a particular position) is coded as a single node with as many outgoing connections as there are possible characters in the next column of the alignment. In the terms of a typical hidden Markov model, the observed states are the individual alignment columns and the "hidden" states represent the presumed ancestral sequence from which the sequences in the query set are hypothesized to have descended. An efficient search variant of the dynamic programming method, known as the Viterbi algorithm, is generally used to successively align the growing MSA to the next sequence in the query set to produce a new MSA.[14] This is distinct from progressive alignment methods because the alignment of prior sequences is updated at each new sequence addition. However, like progressive methods, this technique can be influenced by the order in which the sequences in the query set are integrated into the alignment, especially when the sequences are distantly related.[9]
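The Viterbi algorithm itself is compact enough to sketch. The version below is the generic textbook form for a small HMM, not the profile-HMM machinery used by alignment packages; the two state names ("conserved" and "variable") and every probability in the toy model are hypothetical values chosen for illustration. It assumes a non-empty observation string and no zero probabilities.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    # Work in log space to avoid underflow on long sequences.
    score = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = [{}]
    for obs in observations[1:]:
        prev = score[-1]
        column, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            best_prev = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            column[s] = (prev[best_prev] + math.log(trans_p[best_prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = best_prev
        score.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state toy model over the symbols A and C.
states = ("conserved", "variable")
start_p = {"conserved": 0.5, "variable": 0.5}
trans_p = {
    "conserved": {"conserved": 0.8, "variable": 0.2},
    "variable": {"conserved": 0.2, "variable": 0.8},
}
emit_p = {
    "conserved": {"A": 0.9, "C": 0.1},
    "variable": {"A": 0.2, "C": 0.8},
}
print(viterbi("AACC", states, start_p, trans_p, emit_p))
```

Like pairwise dynamic programming, the algorithm fills a table one position at a time and keeps only the best-scoring way of reaching each state, which is what makes it efficient.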
Several software programs are available in which variants of HMM-based methods have been implemented and which are noted for their scalability and efficiency, although properly using an HMM method is more complex than using more common progressive methods. The simplest is POA (Partial-Order Alignment);[15] a similar but more generalized method is implemented in the package SAM (Sequence Alignment and Modeling System).[14] SAM has been used as a source of alignments for protein structure prediction to participate in the CASP structure prediction experiment and to develop a database of predicted proteins in the yeast species S. cerevisiae. HMM methods can also be used for database search with HMMER.[17]
Genetic algorithms and simulated annealing
Two standard optimization techniques in computer science, both of which were inspired by, but do not directly reproduce, physical processes, have also been used in an attempt to produce quality MSAs more efficiently. One such technique, the genetic algorithm, has been used for MSA production in an attempt to broadly simulate the hypothesized evolutionary process that gave rise to the divergence in the query set. The method works by breaking a series of possible MSAs into fragments and repeatedly rearranging those fragments with the introduction of gaps at varying positions. A general objective function is optimized during the simulation, most generally the "sum of pairs" maximization function introduced in dynamic programming-based MSA methods. A technique for protein sequences has been implemented in the software program SAGA (Sequence Alignment by Genetic Algorithm)[18] and its equivalent in RNA is called RAGA.[19]
In the technique of simulated annealing, an existing MSA produced by another method is refined by a series of rearrangements designed to find better regions of alignment space than the one the input alignment already occupies. Like the genetic algorithm method, simulated annealing maximizes an objective function like the sum-of-pairs function. Simulated annealing uses a metaphorical "temperature factor" that determines the rate at which rearrangements proceed and the likelihood of each rearrangement; typical usage alternates periods of high rearrangement rates and relatively low likelihood (to explore more distant regions of alignment space) with periods of lower rates and higher likelihoods (to explore more thoroughly the local optima near the newly "colonized" regions). This approach has been implemented in the program MSASA (Multiple Sequence Alignment by Simulated Annealing).[20]
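The role of the temperature factor can be illustrated with the standard Metropolis acceptance rule and a simple geometric cooling schedule. This is a generic sketch of simulated annealing for a score that is being maximized, not the procedure used by MSASA; the function names and the example numbers are assumptions.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Metropolis rule for maximization: delta is new score minus old score."""
    if delta >= 0:
        return 1.0  # improvements are always accepted
    # Worse rearrangements are accepted with probability exp(delta / T).
    return math.exp(delta / temperature)


def accept(delta, temperature, rng=random):
    """Randomly decide whether to keep a proposed rearrangement."""
    return rng.random() < acceptance_probability(delta, temperature)


def geometric_schedule(start, factor, steps):
    """Temperatures start, start*factor, start*factor**2, ... (factor < 1 cools)."""
    return [start * factor ** k for k in range(steps)]


for t in geometric_schedule(10.0, 0.5, 4):
    # The same score-lowering move becomes less acceptable as T falls.
    print(t, round(acceptance_probability(-3, t), 4))
```

At high temperature a score-lowering rearrangement is often accepted, which lets the search leave its current region; at low temperature almost only improvements survive, which corresponds to thorough local exploration.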
Motif finding
Alignment of the seven Drosophila caspases colored by motifs as identified by MEME. When motif positions and sequence alignments are generated independently, they often correlate well but not perfectly, as in this example.
Blocks analysis is a method of motif finding that restricts motifs to ungapped regions in the alignment. Blocks can be generated from an MSA or they can be extracted from unaligned sequences using a precalculated set of common motifs previously generated from known gene families.[21] Block scoring generally relies on the spacing of high-frequency characters rather than on the calculation of an explicit substitution matrix. The BLOCKS server provides an interactive method to locate such motifs in unaligned sequences.
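Extracting ungapped regions from an existing MSA can be sketched as a scan for runs of gap-free columns. This is only an illustration of the "ungapped" restriction, not the BLOCKS server's scoring method; the function name, the min_width parameter, and the toy alignment are assumptions.

```python
def ungapped_blocks(alignment, min_width=3):
    """Return (start, end) column ranges, end exclusive, that contain no gaps."""
    blocks, start = [], None
    columns = list(zip(*alignment))
    # A sentinel gap column closes a block that runs to the end of the alignment.
    for i, col in enumerate(columns + [("-",)]):
        if "-" not in col:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_width:
                blocks.append((start, i))
            start = None
    return blocks


toy = ["MKV-LAGT", "MKVQLA-T", "MRV-LAGT"]
print(ungapped_blocks(toy, min_width=2))
```

Each returned range can then be sliced out of every row to give the candidate block for further scoring.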
Statistical pattern-matching has been implemented using both the expectation-maximization algorithm and the Gibbs sampler. One of the most common motif-finding tools, known as MEME, uses expectation maximization and hidden Markov methods to generate motifs that are then used as search tools by its companion MAST in the combined suite MEME/MAST.[22][23]
External links
- ExPASy sequence alignment tools
- Multiple Alignment Resource Page from the Virtual School of Natural Sciences
- Tools for Multiple Alignments from Pôle Bioinformatique Lyonnais
- Multiple sequence alignment lectures from the Max Planck Institute for Molecular Genetics
- An entry point to the main T-Coffee servers
References
1. ^ Wang L, Jiang T. (1994) On the complexity of multiple sequence alignment. J Comput Biol 1:337-348.
2. ^ Just W. (2001). Computational complexity of multiple sequence alignment with SP-score. J Comput Biol 8(6):615-23.
3. ^ Higgins DG, Sharp PM. (1988). CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73(1):237-44.
4. ^ Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680.
5. ^ Notredame C, Higgins DG, Heringa J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1):205-17.
6. ^ Sze SH, Lu Y, Yang Q. (2006). A polynomial time solvable formulation of multiple sequence alignment. J Comput Biol 13(2):309-19.
7. ^ Hirosawa M, Totoki Y, Hoshida M, Ishikawa M. (1995). Comprehensive study on iterative algorithms of multiple sequence alignment. Comput Appl Biosci 11:13-18.
8. ^ Gotoh O. (1996). Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J Mol Biol 264(4):823-38.
9. ^ Mount DM. (2004). Bioinformatics: Sequence and Genome Analysis 2nd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, NY.
11. ^ Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B. (2003) Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics 4:66.
12. ^ Brudno M, Chapman M, Göttgens B, Batzoglou S, Morgenstern B. (2003) Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics 4:66.
13. ^ Edgar RC. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-97.
14. ^ Hughey R, Krogh A. (1996). Hidden Markov models for sequence analysis: extension and analysis of the basic method. CABIOS 12(2):95-107.
15. ^ Grasso C, Lee C. (2004). Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics 20(10):1546-56.
16. ^ Hughey R, Krogh A. SAM: Sequence alignment and modeling software system. Technical Report UCSC-CRL-96-22, University of California, Santa Cruz, CA, September 1996.
17. ^ Durbin R, Eddy S, Krogh A, Mitchison G. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press.
18. ^ Notredame C, Higgins DG. (1996). SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res 24(8):1515-24.
19. ^ Notredame C, O'Brien EA, Higgins DG. (1997). RAGA: RNA sequence alignment by genetic algorithm. Nucleic Acids Res 25(22):4570-80.
20. ^ Kim J, Pramanik S, Chung MJ. (1994). Multiple sequence alignment using simulated annealing. Comput Appl Biosci 10(4):419-26.
21. ^ Henikoff S, Henikoff JG. (1991). Automated assembly of protein blocks for database searching. Nucleic Acids Res 19:6565-72.
22. ^ Bailey TL, Elkan C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California.
23. ^ Bailey TL, Gribskov M. (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14:48-54.
Survey articles
- Duret, L.; S. Abdeddaim (2000). "Multiple alignment for structural functional or phylogenetic analyses of homologous sequences", in D. Higgins and W. Taylor: Bioinformatics sequence structure and databanks. Oxford: Oxford University Press.
- Notredame, C. (2002). "Recent progresses in multiple sequence alignment: a survey". Pharmacogenomics 3 (1): 131-144.
- Thompson, J. D.; F. Plewniak and O. Poch (1999). "A comprehensive comparison of multiple sequence alignment programs". Nucleic Acids Research 27 (13): 2682-2690.
- Wallace, I.M.; Blackshields G and Higgins DG. (2005). "Multiple sequence alignments". Curr Opin Struct Biol 15 (3): 261-6.
- Notredame, C (2007). "Recent evolutions of multiple sequence alignment algorithms". PLOS Computational Biology 3 (8): e123.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.