Information about Uuencode
Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, which encoded binary data for transmission over the uucp mail system. The name "uuencoding" is derived from "Unix-to-Unix encoding". Since uucp converted characters between various computers' character sets, uuencode was used to convert the data to fairly common characters that were unlikely to be "translated" in a way that would destroy the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and for posting them to Usenet newsgroups. Uuencoding has now been largely replaced by MIME and yEnc.
The encoding process
Uuencode pre-dates Base64 coding, but the POSIX version of uuencode has the option of producing and decoding data in Base64 format, beginning with the line begin-base64 <mode> <file> and ending with its own terminator line. The remainder of this article describes the original "historic" uuencode.
Uuencoded data starts with a line of the form:
begin <mode> <file>
where <mode> is the file's read/write/execute permissions as three octal digits, and <file> is the name to be used when recreating the binary data.
Uuencoding repeatedly takes in a group of three bytes, adding trailing zeros if there are fewer than three bytes left. These 24 bits are split into four groups of six which are treated as numbers between 0 and 63. Decimal 32 is added to each number and they are output as ASCII characters which will lie in the range 32 (space) to 32+63 = 95 (underscore). ASCII characters greater than 95 may also be used; however, only the six right-most bits are relevant.
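The grouping step above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name encode_group is hypothetical, and line framing is ignored entirely.

```python
def encode_group(chunk: bytes) -> str:
    """Encode up to three bytes as four uuencoded characters."""
    if not 1 <= len(chunk) <= 3:
        raise ValueError("expected 1 to 3 bytes")
    # Pad short groups with trailing zero bytes, then view them as 24 bits.
    bits = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
    # Take four 6-bit values, most significant first, and add 32 to each.
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(chr(32 + s) for s in sextets)


print(encode_group(b"Cat"))  # 0V%T
```

Treating the group as one 24-bit integer keeps the splitting logic to a single shift-and-mask per output character.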
Each group of sixty output characters (corresponding to 45 input bytes) is output as a separate line preceded by an encoded character giving the number of encoded bytes on that line. For all lines except the last, this will be the character 'M' (ASCII code 77 = 32+45). If the input length is not evenly divisible by 45, the last data line will contain the remaining output characters, preceded by the character whose code is 32 plus the number of remaining input bytes. Finally, a line containing just a single space (or grave accent) is output, followed by one line containing the string "end".
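As a rough sketch of this framing, the following Python function builds a complete uuencoded text. The function name uuencode and its parameters are assumptions made for this example; the per-line encoding is delegated to the standard library's binascii.b2a_uu, which emits the length character followed by the encoded data.

```python
import binascii


def uuencode(data: bytes, name: str, mode: int = 0o644) -> str:
    """Return a complete uuencoded text for `data` (illustrative sketch)."""
    lines = [f"begin {mode:03o} {name}"]
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        # b2a_uu returns the length character, the encoded data, and "\n".
        lines.append(binascii.b2a_uu(chunk).decode("ascii").rstrip("\n"))
    lines.append("`")    # zero-length line; a single space is equivalent
    lines.append("end")
    return "\n".join(lines) + "\n"


print(uuencode(b"Cat", "cat.txt"), end="")
```

The sketch writes a grave accent on the zero-length line, which is one of the two forms mentioned above; a decoder should accept either.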
Sometimes each data line has extra dummy characters (often the grave accent) added to avoid problems with mailers that strip trailing spaces. These characters are ignored by uudecode. The grave accent (ASCII 96) can also be used in place of a space character. When stripped of their high bits they both decode to 000000.
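A short Python illustration of why the space and the grave accent are interchangeable: only the low six bits of each character (after subtracting 32) carry data. The helper name sextet is hypothetical.

```python
def sextet(ch: str) -> int:
    """Return the 6-bit value carried by one uuencoded character."""
    return (ord(ch) - 32) & 0x3F


print(sextet(" "), sextet("`"))  # 0 0
print(sextet("!"), sextet("a"))  # 1 1
```

The second line shows the same effect for other characters above ASCII 95: "a" (97) carries the same six bits as "!" (33).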
Despite using this limited range of characters, there are still some problems encountered when uuencoded data passes through certain old computers. The worst offenders are computers using non-ASCII character sets such as EBCDIC. To solve this problem, the Xxencode format was created as a more robust version of the encoding, which uses only alphanumeric characters and the plus and minus symbols. The problem is also avoided by the Base64 format that uuencode now supports.
uuencode historical algorithm
The standard output is a text file (encoded in the character set of the current locale) that begins with the line:
begin <mode> <decode_pathname>
and ends with the line:
end
where <mode> is the file's read/write/execute permissions as three octal digits, and <decode_pathname> is the name to be used when recreating the binary data. In both cases, the lines have no preceding or trailing <blank>s.
The algorithm that is used for lines in between begin and end takes three octets as input and writes four characters of output by splitting the input at six-bit intervals into four octets, containing data in the lower six bits only. These octets are converted to characters by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then it shall be assumed to represent a printable character in the ISO/IEC 646:1991 standard encoded character set. It is then translated into the corresponding character codes for the codeset in use in the current locale. (For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC.)
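The translation into the local codeset can be illustrated in Python. This sketch assumes cp037, one common EBCDIC code page that ships with Python's standard codecs, as the "current codeset"; other EBCDIC variants may map some punctuation differently.

```python
# Uuencoded characters as they would appear in ISO 646 / ASCII.
encoded = "0V%T"

# Translate the characters to the assumed local codeset (EBCDIC cp037).
ebcdic = encoded.encode("cp037")
print(ebcdic.hex())

# The example from the text: 'A' is 0x41 in ASCII and 0xc1 in EBCDIC.
print("A".encode("ascii").hex(), "A".encode("cp037").hex())  # 41 c1

# Translating back recovers the same characters.
assert ebcdic.decode("cp037") == encoded
```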
Where the bits of two octets are combined, the least significant bits of the first octet are shifted left and combined with the most significant bits of the second octet shifted right. Thus the three octets A, B, C are converted into the four octets:
0x20 + (( A >> 2 ) & 0x3F)
0x20 + (((A << 4) | ((B >> 4) & 0xF)) & 0x3F)
0x20 + (((B << 2) | ((C >> 6) & 0x3)) & 0x3F)
0x20 + (( C ) & 0x3F)
These octets then are translated into the local character set.
Each encoded line contains a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.
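The shift-and-mask expressions and the length character can be combined into a small Python sketch of a single-line encoder. The name encode_line is hypothetical, and the translation into a local character set is left out (the output is plain ASCII).

```python
def encode_line(octets: bytes) -> str:
    """Encode up to 45 octets as one uuencoded line, without the newline."""
    if len(octets) > 45:
        raise ValueError("at most 45 octets per line")
    out = [chr(0x20 + len(octets))]                  # length character
    padded = octets + b"\x00" * (-len(octets) % 3)   # pad to a multiple of 3
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        out.append(chr(0x20 + ((a >> 2) & 0x3F)))
        out.append(chr(0x20 + (((a << 4) | ((b >> 4) & 0xF)) & 0x3F)))
        out.append(chr(0x20 + (((b << 2) | ((c >> 6) & 0x3)) & 0x3F)))
        out.append(chr(0x20 + (c & 0x3F)))
    return "".join(out)


print(encode_line(b"Cat"))  # #0V%T
```

Note that the length character counts the original octets, not the padded ones, so a decoder knows how many trailing bytes to discard.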
Sample uuencoding
The tables show the uuencoding of the three ASCII-encoded characters Cat into the uuencoded representation 0V%T:
| Original characters | C | a | t |
|---|---|---|---|
| Original ASCII, decimal | 67 | 97 | 116 |
| ASCII, binary | 01000011 | 01100001 | 01110100 |

| 6-bit groups | 010000 | 110110 | 000101 | 110100 |
|---|---|---|---|---|
| New decimal values | 16 | 54 | 5 | 52 |
| +32 | 48 | 86 | 37 | 84 |
| Uuencoded characters | 0 | V | % | T |
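The complete uuencoded output for a file cat.txt containing just the three ASCII characters Cat might appear as follows, where the leading # (ASCII 35 = 32+3) is the length character for three bytes:
begin 644 cat.txt
#0V%T
`
end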
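To check a sample like this one, a minimal Python decoder for the whole framing might look like the sketch below. The function name uudecode and its return shape are assumptions for this example; the per-line decoding uses the standard library's binascii.a2b_uu, and error handling is deliberately sparse.

```python
import binascii


def uudecode(text: str) -> tuple[str, int, bytes]:
    """Parse uuencoded text and return (name, mode, data)."""
    lines = text.splitlines()
    keyword, mode, name = lines[0].split(" ", 2)
    if keyword != "begin":
        raise ValueError("missing begin line")
    data = bytearray()
    for line in lines[1:]:
        if line == "end":
            return name, int(mode, 8), bytes(data)
        if not line or (ord(line[0]) - 32) & 0x3F == 0:
            continue  # zero-length line (space or grave accent)
        data += binascii.a2b_uu(line.encode("ascii"))
    raise ValueError("missing end line")


sample = "begin 644 cat.txt\n#0V%T\n`\nend\n"
print(uudecode(sample))  # ('cat.txt', 420, b'Cat')
```

The mode is returned as an integer, so the octal string 644 appears as decimal 420.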
Uuencode table
The following table represents the subset of ASCII characters used by uuencode and the 6-bit binary string each one represents.
| Printable representation | ASCII decimal | Binary representation | Printable representation | ASCII decimal | Binary representation | |
|---|---|---|---|---|---|---|
| (space) | 32 | 000 000 | @ | 64 | 100 000 | |
| ! | 33 | 000 001 | A | 65 | 100 001 | |
| " | 34 | 000 010 | B | 66 | 100 010 | |
| # | 35 | 000 011 | C | 67 | 100 011 | |
| $ | 36 | 000 100 | D | 68 | 100 100 | |
| % | 37 | 000 101 | E | 69 | 100 101 | |
| & | 38 | 000 110 | F | 70 | 100 110 | |
| ' | 39 | 000 111 | G | 71 | 100 111 | |
| ( | 40 | 001 000 | H | 72 | 101 000 | |
| ) | 41 | 001 001 | I | 73 | 101 001 | |
| * | 42 | 001 010 | J | 74 | 101 010 | |
| + | 43 | 001 011 | K | 75 | 101 011 | |
| , | 44 | 001 100 | L | 76 | 101 100 | |
| - | 45 | 001 101 | M | 77 | 101 101 | |
| . | 46 | 001 110 | N | 78 | 101 110 | |
| / | 47 | 001 111 | O | 79 | 101 111 | |
| 0 | 48 | 010 000 | P | 80 | 110 000 | |
| 1 | 49 | 010 001 | Q | 81 | 110 001 | |
| 2 | 50 | 010 010 | R | 82 | 110 010 | |
| 3 | 51 | 010 011 | S | 83 | 110 011 | |
| 4 | 52 | 010 100 | T | 84 | 110 100 | |
| 5 | 53 | 010 101 | U | 85 | 110 101 | |
| 6 | 54 | 010 110 | V | 86 | 110 110 | |
| 7 | 55 | 010 111 | W | 87 | 110 111 | |
| 8 | 56 | 011 000 | X | 88 | 111 000 | |
| 9 | 57 | 011 001 | Y | 89 | 111 001 | |
| : | 58 | 011 010 | Z | 90 | 111 010 | |
| ; | 59 | 011 011 | [ | 91 | 111 011 | |
| < | 60 | 011 100 | \ | 92 | 111 100 | |
| = | 61 | 011 101 | ] | 93 | 111 101 | |
| > | 62 | 011 110 | ^ | 94 | 111 110 | |
| ? | 63 | 011 111 | _ | 95 | 111 111 | |
| ` | 96 | (1) 000 000 | | | | |
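The table follows directly from the rule "add 32 to the 6-bit value", so it can be regenerated with a few lines of Python. The variable name table is chosen for this sketch only.

```python
# Build the 64-entry mapping from 6-bit value to output character.
table = {value: chr(32 + value) for value in range(64)}

# Print a few rows in the same shape as the table above.
for value in (0, 16, 54, 63):
    print(f"{table[value]!r} | {32 + value} | {value:06b}")
```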
Decoding
The values can be decoded using the following procedure:
Read 4 bytes: b0, b1, b2, b3
Strip the offset from each: xi = (bi - 0x20) & 0x3F, for i = 0..3
a = ((x0 << 2) & 0xFC) | ((x1 >> 4) & 0x03);
b = ((x1 << 4) & 0xF0) | ((x2 >> 2) & 0x0F);
c = ((x2 << 6) & 0xC0) | x3;
Write a, b, c
Note: In Java, where bytes are signed, the masks in these expressions discard any sign-extended bits, so '>>' and the unsigned right-shift operator '>>>' give the same results here.
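The decoding procedure translates almost line for line into Python. This is a minimal sketch for a single four-character group; the name decode_group is hypothetical, and trimming padded bytes according to the line's length character is left to the caller.

```python
def decode_group(chars: str) -> bytes:
    """Decode four uuencoded characters into three bytes."""
    if len(chars) != 4:
        raise ValueError("expected exactly 4 characters")
    # Remove the 0x20 offset and keep only the low six bits of each character.
    x0, x1, x2, x3 = ((ord(ch) - 0x20) & 0x3F for ch in chars)
    a = ((x0 << 2) & 0xFC) | ((x1 >> 4) & 0x03)
    b = ((x1 << 4) & 0xF0) | ((x2 >> 2) & 0x0F)
    c = ((x2 << 6) & 0xC0) | x3
    return bytes([a, b, c])


print(decode_group("0V%T"))  # b'Cat'
```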
Trivia
Microsoft's e-mail program Outlook Express once contained a flaw: it also accepted "begin ..." as the start of a uuencoded attachment (i.e., without requiring the octal-encoded Unix-style permissions). Especially on Usenet, where MIME is seldom used and plain text is preferred, some people would embed the word begin followed by two spaces in their messages in order to maliciously hide the rest of the message from Outlook Express users (e.g., by configuring their news client to start quotations with the line "begin quote from xxx")[1].
References
This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
External links
- GNU sharutils - The Free Software Foundation's sharutils bundle includes uuencode, uudecode, and others.
- UUDeview - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
- UUENCODE-UUDECODE - open-source program to encode/decode created by Clem "Grandad" Dye
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.