Information about Uuencode

Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, which encoded binary data for transmission over the uucp mail system. The name "uuencoding" is derived from "Unix-to-Unix encoding". Because uucp converted characters between different computers' character sets, uuencode converted the data to a set of fairly common characters that were unlikely to be "translated" in transit, which would otherwise corrupt the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and for posting to Usenet newsgroups. The format has now been largely replaced by MIME and yEnc.

The encoding process

Uuencode pre-dates base64 coding, but the POSIX version of uuencode has the option of producing and decoding data in base64 format, beginning with the line begin-base64 <mode> <file> and ending with the line

The remainder of this article describes the original "historic" uuencode.

Uuencoded data starts with a line of the form:

  begin <mode> <file>

where <mode> is the file's read/write/execute permissions as three octal digits, and <file> is the name to be used when recreating the binary data.

Uuencoding repeatedly takes in a group of three bytes, adding trailing zeros if there are fewer than three bytes left. These 24 bits are split into four groups of six which are treated as numbers between 0 and 63. Decimal 32 is added to each number and they are output as ASCII characters which will lie in the range 32 (space) to 32+63 = 95 (underscore). ASCII characters greater than 95 may also be used; however, only the six right-most bits are relevant.
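As a minimal sketch of the grouping step just described, the following Python function pads a short group with zero bytes, treats the 24 bits as one integer, and adds 32 to each six-bit value. The name encode_group is illustrative and not part of any uuencode implementation; the sketch emits a space for the value zero and applies no other substitution.

```python
def encode_group(chunk: bytes) -> str:
    """Encode one to three bytes as four uuencode characters (hypothetical helper)."""
    if not 1 <= len(chunk) <= 3:
        raise ValueError("expected 1 to 3 bytes")
    padded = chunk.ljust(3, b"\x00")        # trailing zeros for a short final group
    bits = int.from_bytes(padded, "big")    # the 24 bits as a single integer
    # Four six-bit numbers, most significant first, each in the range 0..63.
    sixes = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(chr(32 + value) for value in sixes)
```

For example, encode_group(b"Cat") returns "0V%T".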

Each group of sixty output characters (corresponding to 45 input bytes) is output as a separate line, preceded by an encoded character giving the number of encoded bytes on that line. For all lines except the last, this is the character 'M' (ASCII code 77 = 32+45). If the input length is not a multiple of 45, the last data line contains the remaining output characters, preceded by the character whose code is 32 plus the number of remaining input bytes. Finally, a line containing just a single space (or grave accent), which encodes a length of zero, is output, followed by a line containing the string "end".

Sometimes each data line has extra dummy characters (often the grave accent) added to avoid problems with mailers that strip trailing spaces. These characters are ignored by uudecode. The grave accent (ASCII 96) can also be used in place of a space character. When stripped of their high bits they both decode to 000000.
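The line structure described above can be put together roughly as follows. This is a sketch under stated assumptions rather than the code of any real uuencode program: the function name uuencode_text and its parameters are hypothetical, the default mode of 644 is an arbitrary choice, and the grave accent is used for every zero value so that no line ends in a space.

```python
def uuencode_text(data: bytes, name: str, mode: int = 0o644) -> str:
    """Return historic-format uuencoded text for data (illustrative sketch)."""

    def char(value: int) -> str:
        # Grave accent (96) stands in for space (32); both decode to 000000.
        return "`" if value == 0 else chr(32 + value)

    lines = [f"begin {mode:03o} {name}"]
    for start in range(0, len(data), 45):             # at most 45 input bytes per line
        chunk = data[start:start + 45]
        padded = chunk + b"\x00" * (-len(chunk) % 3)  # pad the final group with zeros
        body = []
        for i in range(0, len(padded), 3):
            bits = int.from_bytes(padded[i:i + 3], "big")
            body.extend(char((bits >> s) & 0x3F) for s in (18, 12, 6, 0))
        # Length character first, then the encoded characters.
        lines.append(char(len(chunk)) + "".join(body))
    lines.append("`")                                 # zero-length line
    lines.append("end")
    return "\n".join(lines) + "\n"
```

Calling uuencode_text(b"Cat", "cat.txt") yields the four lines "begin 644 cat.txt", "#0V%T", "`" and "end".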

Despite using this limited range of characters, there are still some problems encountered when uuencoded data passes through certain old computers. The worst offenders are computers using non-ASCII character sets such as EBCDIC. To solve this problem, the Xxencode format was created as a more robust version of the encoding, which used only alphanumeric characters and the plus and minus symbols. This is also solved by the new Base64 format supported by uuencode.

uuencode historical algorithm

The standard output is a text file (encoded in the character set of the current locale) that begins with the line:

  begin <mode> <decode_pathname>

and ends with the line:

  end

where <mode> is the file's read/write/execute permissions as three octal digits, and <decode_pathname> is the name to be used when recreating the binary data. In both cases, the lines have no preceding or trailing <blank>s.
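A decoder that follows this description strictly could recognise the header line along these lines. The sketch assumes exactly three octal digits and a single space between fields, as stated above; the names BEGIN_RE and parse_begin are hypothetical, and real decoders may be more lenient.

```python
import re

# "begin", one space, three octal digits, one space, then a pathname that
# neither starts nor ends with a blank.
BEGIN_RE = re.compile(r"begin ([0-7]{3}) (\S(?:.*\S)?)")

def parse_begin(line: str):
    """Return (mode, pathname) for a valid header line, or None otherwise."""
    match = BEGIN_RE.fullmatch(line)
    if match is None:
        return None
    return int(match.group(1), 8), match.group(2)
```

With this check, "begin 644 cat.txt" is accepted, while a line such as "begin  quote from xxx" (no octal mode) is rejected.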

The algorithm that is used for lines in between begin and end takes three octets as input and writes four characters of output by splitting the input at six-bit intervals into four octets, containing data in the lower six bits only. These octets are converted to characters by adding a value of 0x20 to each octet, so that each octet is in the range [0x20,0x5f], and then it shall be assumed to represent a printable character in the ISO/IEC 646:1991 standard encoded character set. It is then translated into the corresponding character codes for the codeset in use in the current locale. (For example, the octet 0x41, representing 'A', would be translated to 'A' in the current codeset, such as 0xc1 if it were EBCDIC.)
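The translation step can be illustrated with Python's built-in codecs. The text does not say which EBCDIC variant is meant, so this sketch assumes the standard cp037 codec as a stand-in; the function name to_local is hypothetical.

```python
def to_local(octet: int, codeset: str = "cp037") -> bytes:
    """Translate an ISO 646 octet (0x20..0x5f) to the given codeset.

    cp037 is an assumed example of an EBCDIC code page.
    """
    if not 0x20 <= octet <= 0x5F:
        raise ValueError("octet outside the uuencode range")
    # chr() interprets the octet as an ISO 646 (ASCII) character; encode()
    # then maps that character to its code in the target codeset.
    return chr(octet).encode(codeset)
```

Here to_local(0x41) gives the single byte 0xc1, matching the example in the text, while to_local(0x41, "ascii") leaves the octet unchanged.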

Where the bits of two octets are combined, the least significant bits of the first octet are shifted left and combined with the most significant bits of the second octet shifted right. Thus the three octets A, B, C are converted into the four octets:

0x20 + ((A >> 2) & 0x3F)
0x20 + (((A << 4) | ((B >> 4) & 0xF)) & 0x3F)
0x20 + (((B << 2) | ((C >> 6) & 0x3)) & 0x3F)
0x20 + (C & 0x3F)

These octets then are translated into the local character set.
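The four expressions carry over almost verbatim into code. The following sketch (the function name split_octets is illustrative) returns the four octets before any translation into the local character set.

```python
def split_octets(a: int, b: int, c: int) -> tuple[int, int, int, int]:
    """Apply the four shift-and-mask expressions to the octets A, B and C."""
    return (
        0x20 + ((a >> 2) & 0x3F),
        0x20 + (((a << 4) | ((b >> 4) & 0xF)) & 0x3F),
        0x20 + (((b << 2) | ((c >> 6) & 0x3)) & 0x3F),
        0x20 + (c & 0x3F),
    )
```

For the octets of "Cat" (0x43, 0x61, 0x74) this gives 0x30, 0x56, 0x25 and 0x54.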

Each encoded line contains a length character, equal to the number of characters to be decoded plus 0x20 translated to the local character set as described above, followed by the encoded characters. The maximum number of octets to be encoded on each line shall be 45.

Sample uuencoding

The table shows the uuencoding of the three ASCII-encoded characters Cat into the uuencoded representation 0V%T:

Original characters      C         a         t
Original ASCII, decimal  67        97        116
ASCII, binary            01000011  01100001  01110100
New decimal values       16      54      5       52
+32                      48      86      37      84
Uuencoded characters     0       V       %       T


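The complete uuencoded output for the three ASCII characters Cat, stored as cat.txt with mode 644, might appear as follows:

  begin 644 cat.txt
  #0V%T
  `
  end

The leading # on the data line is the length character (ASCII code 35 = 32+3) for three input bytes.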
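The data line of this sample can be cross-checked with Python's standard binascii module, which converts a single line of at most 45 bytes and does not produce the begin/end framing. This is offered only as a quick check, not as a description of how the uuencode program itself is implemented.

```python
import binascii

# b2a_uu returns one encoded line: length character, data, trailing newline.
line = binascii.b2a_uu(b"Cat")
print(line)                      # b'#0V%T\n'

# a2b_uu reverses the conversion for a single line.
decoded = binascii.a2b_uu(line)
print(decoded)                   # b'Cat'
```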

Uuencode table

The following table lists the subset of ASCII characters used by uuencode and the 6-bit binary string each represents.

Printable       ASCII    Binary            Printable       ASCII    Binary
representation  decimal  representation    representation  decimal  representation
(space)         32       000 000           @               64       100 000
!               33       000 001           A               65       100 001
"               34       000 010           B               66       100 010
#               35       000 011           C               67       100 011
$               36       000 100           D               68       100 100
%               37       000 101           E               69       100 101
&               38       000 110           F               70       100 110
'               39       000 111           G               71       100 111
(               40       001 000           H               72       101 000
)               41       001 001           I               73       101 001
*               42       001 010           J               74       101 010
+               43       001 011           K               75       101 011
,               44       001 100           L               76       101 100
-               45       001 101           M               77       101 101
.               46       001 110           N               78       101 110
/               47       001 111           O               79       101 111
0               48       010 000           P               80       110 000
1               49       010 001           Q               81       110 001
2               50       010 010           R               82       110 010
3               51       010 011           S               83       110 011
4               52       010 100           T               84       110 100
5               53       010 101           U               85       110 101
6               54       010 110           V               86       110 110
7               55       010 111           W               87       110 111
8               56       011 000           X               88       111 000
9               57       011 001           Y               89       111 001
:               58       011 010           Z               90       111 010
;               59       011 011           [               91       111 011
<               60       011 100           \               92       111 100
=               61       011 101           ]               93       111 101
>               62       011 110           ^               94       111 110
?               63       011 111           _               95       111 111
`               96       (1) 000 000
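The table follows directly from the rule "character code = 32 + value", so it can be regenerated rather than memorised. In this sketch the names TABLE and value_of are illustrative; value_of keeps only the six right-most bits, which is why the grave accent (96) maps to the same value as the space.

```python
# Map each printable character in the range 32..95 to its 6-bit string.
TABLE = {chr(32 + value): format(value, "06b") for value in range(64)}

def value_of(ch: str) -> int:
    """Return the 6-bit value a uuencode character stands for."""
    return (ord(ch) - 32) & 0x3F   # the mask folds code 96 back onto value 0

for ch in ("M", "\\", "_"):
    print(repr(ch), ord(ch), TABLE[ch])
```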

Decoding

You can decode the values using the following procedure:

Read 4 bytes: b0, b1, b2, b3
    a = (((b0 - 0x20) & 0x3F) << 2 & 0xFC) | (((b1 - 0x20) & 0x3F) >> 4 & 0x03);
    b = (((b1 - 0x20) & 0x3F) << 4 & 0xF0) | (((b2 - 0x20) & 0x3F) >> 2 & 0x0F);
    c = (((b2 - 0x20) & 0x3F) << 6 & 0xC0) |  ((b3 - 0x20) & 0x3F);
Write a, b, c
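Applied to a whole data line, the procedure might look like the sketch below. The function name decode_line is hypothetical. The sketch assumes the first character is the length character, pads the line with spaces in case a mailer stripped trailing blanks, and ignores any extra characters beyond the encoded data.

```python
def decode_line(line: str) -> bytes:
    """Decode one uuencoded data line (illustrative sketch)."""
    line = line.rstrip("\r\n")               # keep spaces, drop only the line ending
    if not line:
        return b""
    count = (ord(line[0]) - 0x20) & 0x3F     # number of bytes encoded on this line
    groups = (count + 2) // 3                # four characters per three bytes
    body = line[1:1 + 4 * groups].ljust(4 * groups)
    out = bytearray()
    for i in range(0, len(body), 4):
        b0, b1, b2, b3 = ((ord(ch) - 0x20) & 0x3F for ch in body[i:i + 4])
        out.append(((b0 << 2) & 0xFC) | ((b1 >> 4) & 0x03))
        out.append(((b1 << 4) & 0xF0) | ((b2 >> 2) & 0x0F))
        out.append(((b2 << 6) & 0xC0) | b3)
    return bytes(out[:count])                # drop padding bytes from the last group
```

For example, decode_line("#0V%T") returns b"Cat", and the terminating line consisting of a single grave accent or space decodes to zero bytes.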


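Note: In Java, '>>' is a sign-extending right shift and '>>>' is the unsigned right-shift operator. Because every value shifted here has already been masked with 0x3F and is therefore non-negative, both operators give the same result.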

Trivia

Microsoft's e-mail program Outlook Express once contained a flaw: it also accepted a line beginning with "begin ..." as the start of a uuencoded attachment, without requiring the octal-encoded Unix-style permissions. Especially on Usenet, where MIME is seldom used and plain text is preferred, some people would embed begin, space, space in their messages in order to maliciously hide the rest of the message from Outlook Express users (e.g., by configuring their news client to start quotations with the line "begin quote from xxx")[1].

References

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

External links

  • GNU sharutils - The Free Software Foundation's sharutils bundle includes uuencode, uudecode, and others.
  • UUDeview - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
  • UUENCODE-UUDECODE - open-source program to encode/decode created by Clem "Grandad" Dye
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.