Information about Binary To Text Encoding

A binary-to-text encoding is an encoding of data in plain text. More precisely, it is an encoding of binary data as a sequence of ASCII-printable characters. These encodings are necessary for transmitting data when the channel or protocol allows only ASCII-printable characters, as with e-mail or Usenet. PGP documentation (RFC 2440) uses the term ASCII armor for binary-to-text encoding when referring to Radix-64.

Description

The ASCII text-encoding standard uses 128 unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in English, plus a selection of 'control codes' which do not represent printable characters. For example, the capital letter A is ASCII character 65, the numeral 2 is ASCII 50, the character } is ASCII 125, and the control character carriage return is ASCII 13. Systems based on ASCII use seven bits to represent these values digitally.
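
As a small illustration, the code values quoted above can be checked with Python's built-in ord function. The dictionary name below is only illustrative.

```python
# ASCII code values for the characters mentioned in the text.
examples = {"A": 65, "2": 50, "}": 125, "\r": 13}

for ch, code in examples.items():
    assert ord(ch) == code
    # Every ASCII value fits in seven bits (0 to 127).
    assert code < 2 ** 7

codes = [ord(ch) for ch in examples]
```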

By contrast, most computers store data in memory organized in eight-bit bytes, and, in the case of machine-executable code and non-textual data formats where maximum storage density is desirable, use the full range of 256 possible values in each eight-bit byte. Many computer programs came to rely on this distinction between seven-bit text and eight-bit binary data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, a program might interpret a byte value above 127 as a flag telling it to perform some function.

It is often desirable, however, to be able to send non-textual data through text-based systems, such as when one attaches an image file to an e-mail message. To accomplish this, the data are encoded in some way, such that eight-bit data are encoded into seven-bit ASCII characters (generally using only alphanumeric and punctuation characters). Upon safe arrival at their destination, the data are decoded back to their eight-bit form. This process is referred to as binary-to-text encoding. Many programs, such as PGP and GNU Privacy Guard (GPG), perform this conversion to allow for data transport.
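
A minimal sketch of this round trip, using Base64 from Python's standard library as one example of such an encoding. The sample bytes are arbitrary and chosen only because some of them lie above 127.

```python
import base64

# Arbitrary binary data, including byte values above 127.
raw = bytes([0x00, 0x7F, 0x80, 0xFF])

# Encode to text that uses only seven-bit ASCII characters.
encoded = base64.b64encode(raw)
assert all(b < 128 for b in encoded)

# Decode at the destination to recover the original eight-bit data.
decoded = base64.b64decode(encoded)
assert decoded == raw
```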

Encoding plain text

Although this encoding method is useful for transmitting non-textual data through text-based systems, it is also used as a mechanism for encoding plain text. Some systems can handle only a more limited character set: not only are they not 8-bit clean, some cannot even handle every printable ASCII character. Other systems make minor in-band signaling additions to the beginning or end of the text; perhaps the most famous case was "The world wonders". By applying a binary-to-text encoding to messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely transparent. This is sometimes referred to as 'ASCII armoring'.

Examples:
  • the ViewState component of ASP.NET uses base64 encoding to safely transmit text via HTTP POST.

Encoding standards

The most used forms of binary-to-text encodings are:
  • hexadecimal
  • Base64
  • Quoted-printable
  • Uuencoding
  • yEnc
  • Ascii85
  • BinHex
  • Percent-encoding

Some older and today uncommon formats include BOO, BTOA, and USR encoding. A newer, unstandardized encoding method is basE91, which produces the shortest plain ASCII output for compressed 8-bit binary input.

Most of these encodings generate text that does not contain all ASCII printable characters: for example, the base64 encoding generates text that contains only upper case and lower case letters (A–Z, a–z), numerals (0–9), and the "+", "/", and "=" symbols.
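
The restricted output alphabet can be checked with a short sketch. It assumes the standard Base64 alphabet as implemented by Python's base64 module. The input (every byte value once) is an arbitrary choice.

```python
import base64
import string

# The characters base64 output is expected to draw from.
allowed = set(string.ascii_letters + string.digits + "+/=")

# Encode all 256 possible byte values once.
encoded = base64.b64encode(bytes(range(256))).decode("ascii")

# Every character of the output belongs to the allowed set.
assert set(encoded) <= allowed
```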

Some of these encodings (quoted-printable and percent encoding) are based on a set of allowed characters and a single escape character. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
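
A brief sketch of both escape-based encodings, using the quopri and urllib.parse modules from Python's standard library. The sample string is made up for illustration. Quoted-printable escapes with "=" and percent encoding escapes with "%".

```python
import quopri
from urllib.parse import quote, unquote

text = "café = 100%"           # hypothetical sample, mostly printable ASCII
data = text.encode("utf-8")     # the accented letter becomes two bytes above 127

# Quoted-printable: allowed characters pass through, others become =XX.
qp = quopri.encodestring(data)
assert quopri.decodestring(qp) == data

# Percent encoding: allowed characters pass through, others become %XX.
pct = quote(text)
assert unquote(pct) == text
```

In both outputs the letters "caf" and the digits "100" remain readable, while the non-ASCII bytes and the escape characters themselves are replaced by escape sequences.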

Some other encodings (base64, uuencoding) are based on mapping all possible sequences of six bits into different printable characters. Since there are more than 2^6 = 64 printable characters (ASCII has 95), this is possible. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits, and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.
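
The six-bit chunking can be sketched as follows. This is an illustrative, unoptimized encoder that assumes the standard Base64 alphabet and "=" padding. The function name encode_six_bit is hypothetical, and real code would normally call a library routine instead.

```python
import string

# Standard Base64 alphabet: index 0-63 maps to one printable character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_six_bit(data: bytes) -> str:
    # View the input as one long stream of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad the stream with zero bits up to a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # Map each six-bit chunk to its character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Base64 pads the text with "=" up to a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)
```

A uuencode-style encoder would follow the same chunking but use a different character mapping and line format.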

Some encodings (the original version of BinHex and the recommended encoding for CipherSaber) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard hexadecimal digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding -- expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.
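
The size trade-off can be checked with a short comparison. The 30-byte input is arbitrary, chosen as a multiple of three so that base64 needs no padding.

```python
import base64

data = bytes(range(30))  # arbitrary 30-byte sample

# Hexadecimal: each source byte expands independently to two characters.
hex_text = "".join(f"{byte:02x}" for byte in data)
assert hex_text == data.hex()
assert bytes.fromhex(hex_text) == data

# Base64: every 3 source bytes expand to 4 characters.
b64_text = base64.b64encode(data).decode("ascii")

# 60 characters versus 40 characters: hex output is 50% longer.
ratio = len(hex_text) / len(b64_text)
```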


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.