Information about Text Normalization
Text normalization is a process by which text is transformed in some way to make it consistent in a way which it may not have been before. Text normalization is often performed before a text is processed in some way, such as generating synthesized speech, automated language translation, storage in a database, or comparison.
Examples of text normalization:
The text normalization is useful, for example, for comparing two sequence of characters which mean the same but are represented differently. The examples of this kind of normalization include, but not limited to, "don't" vs "do not", "I'm" vs "I am", "Can't" vs "Cannot".
Further, "1" and "one" are same, "1st" is same as "first", and so on. Instead of treating these strings as different, through text processing, one can treat them as same.
Examples of text normalization:
- Unicode normalization
- converting all letters to lower or upper case
- removing punctuation
- removing letters with accent marks and other diacritics
- expanding abbreviations
The text normalization is useful, for example, for comparing two sequence of characters which mean the same but are represented differently. The examples of this kind of normalization include, but not limited to, "don't" vs "do not", "I'm" vs "I am", "Can't" vs "Cannot".
Further, "1" and "one" are same, "1st" is same as "first", and so on. Instead of treating these strings as different, through text processing, one can treat them as same.
Writing, is the representation of language in a textual medium; that is with the use of signs or symbols. It is distinguished from illustration such as cave drawings and paintings, and recording language via a non-textual medium such as magnetic tape audio.
..... Click the link for more information.
..... Click the link for more information.
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware.
..... Click the link for more information.
..... Click the link for more information.
Machine translation, sometimes referred to by the acronym MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.
..... Click the link for more information.
..... Click the link for more information.
database is a structured collection of records or data that is stored in a computer system so that a computer program or person using a query language can consult it to answer queries. The records retrieved in answer to queries are information that can be used to make decisions.
..... Click the link for more information.
..... Click the link for more information.
Unicode normalization is a form of text normalization that transforms equivalent characters or sequences of characters into a consistent underlying representation so that they may be easily compared.
..... Click the link for more information.
..... Click the link for more information.
A programming language is an artificial language that can be used to control the behavior of a machine, particularly a computer. Programming languages, like natural languagess, are defined by syntactic and semantic rules which describe their structure and meaning respectively.
..... Click the link for more information.
..... Click the link for more information.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus