Information about Base64
Base 64 encoding schemes
Privacy-Enhanced Mail (PEM)
The first known use of Base 64 encoding for electronic data transfer was the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base 64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 7-bit characters, as required by transfer protocols such as SMTP.
The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream.
To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.
The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.
After encoding padded data, if two octets were remaining to encode, one "=" character is appended to the output; if one octet was remaining, two "=" characters are appended. This signals the decoder that the zero bits added due to padding should not be emitted in the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
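The line-length rule can be sketched as a small helper. This is only an illustration: the function name `wrapBase64` and its parameters are hypothetical, and the newline sequence is passed in because PEM leaves the delimiter to local conventions.

```javascript
// Hypothetical helper: split an already-encoded Base64 string into lines of
// at most `width` characters (64 for PEM). Only the last line may be shorter.
function wrapBase64(encoded, width, newline) {
  var lines = [];
  for (var i = 0; i < encoded.length; i += width) {
    lines.push(encoded.substring(i, i + width));
  }
  return lines.join(newline);
}
```

Calling `wrapBase64(text, 64, "\n")` on 130 encoded characters gives two full lines of 64 characters followed by one line of 2.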
MIME
The MIME (Multipurpose Internet Mail Extensions) specification, defined in RFC 2045, lists "base64" as one of several binary-to-text encoding schemes. MIME's base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM, and uses the "=" symbol for output padding in the same way.
MIME does not specify a fixed length for base64-encoded lines, but it does specify a maximum length of 76 characters. Additionally, it specifies that any extra-alphabetic characters must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.
Thus, the actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length (four output characters for every three input bytes, plus a CR/LF pair after every 76 characters), though for very short messages the overhead can be much higher because of the headers. Very roughly, the final size of base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers).
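The 137% figure can be checked with a little arithmetic. The sketch below uses a hypothetical function name, `mimeEncodedSize`, and assumes full 76-character lines separated by CR/LF with no trailing line break. It counts only the encoded body and does not model the header overhead.

```javascript
// Size in bytes of the Base64 body for `n` input bytes, assuming 76-character
// lines joined by CR/LF (an assumption; MIME only sets 76 as the maximum).
function mimeEncodedSize(n) {
  var chars = 4 * Math.ceil(n / 3);                        // 4 characters per 3-byte group
  var breaks = chars === 0 ? 0 : Math.ceil(chars / 76) - 1; // CR/LF between lines only
  return chars + 2 * breaks;
}
```

Each full line carries 57 input bytes and costs 78 output bytes including the CR/LF, and 78 / 57 ≈ 1.37. For example, `mimeEncodedSize(5700)` is 7798, a ratio of about 1.368.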
UTF-7
UTF-7, described in RFC 2152, introduced a system called Modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the base64 encoding used in MIME.
The "Modified Base64" alphabet consists of the MIME base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits (leaving 0-4 unused bits in the last Base64 digit).
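A minimal sketch of the Modified Base64 step follows. The function name `modifiedBase64` is hypothetical. It covers only the encoding of a run of UTF-16 code units (serialized big-endian), not the "+" and "-" shift characters that UTF-7 places around such a run.

```javascript
var B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode the UTF-16 code units of `text` as unpadded Base64 digits.
function modifiedBase64(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);
    bytes.push(unit >>> 8, unit & 255);       // high byte first
  }
  var out = "", buffer = 0, bits = 0;
  for (var j = 0; j < bytes.length; j++) {
    buffer = (buffer << 8) | bytes[j];
    bits += 8;
    while (bits >= 6) {                        // emit every complete 6-bit group
      bits -= 6;
      out += B64.charAt((buffer >>> bits) & 63);
    }
    buffer &= (1 << bits) - 1;                 // keep only the bits not yet emitted
  }
  // Leftover bits are left-aligned in a final digit; no "=" is appended.
  if (bits > 0) out += B64.charAt((buffer << (6 - bits)) & 63);
  return out;
}
```

For U+263A this yields `Jjo`, the digits that appear in the RFC 2152 example `Hi Mom -+Jjo--!`.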
OpenPGP
OpenPGP, specified in RFC 2440, defines Radix-64 encoding, also known as "ASCII Armor". Radix-64 is identical to the "base64" encoding described by MIME, with the addition of a 24-bit CRC checksum. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same base64 algorithm and, using an additional "=" symbol as separator, concatenated to the encoded output data.
RFC 3548
RFC 3548 (The Base16, Base32, and Base64 Data Encodings) is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of base64 encodings, alternative-alphabet encodings, and the seldom-used Base 32 and Base 16 encodings.
RFC 3548 forbids implementations from adding non-alphabetic characters unless they are written to a specification that refers to RFC 3548 and specifically requires otherwise; it also declares that decoder implementations must reject data that contains non-alphabetic characters unless they are written to a specification that refers to RFC 3548 and specifically requires otherwise.
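A strict decoder in this spirit could check its input before decoding. The sketch below is one possible reading, not text from the RFC. The name `isStrictBase64` is hypothetical. It accepts only alphabet characters in groups of four with "=" padding in the final group, and does not check that the unused bits of the last digit are zero.

```javascript
// Returns true only for input made of complete 4-character groups from the
// standard alphabet, optionally ending in a "xx==" or "xxx=" group.
function isStrictBase64(s) {
  return /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/.test(s);
}
```

Under this check `TWFu` and `TWE=` pass, while `TWFu\r\n` (line break), `TWF` (missing padding) and `T===` are rejected.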
RFC 4648
This RFC obsoletes RFC 3548 and focuses on base 64/32/16:
"This document describes the commonly used base 64, base 32, and base 16 encoding schemes. It also discusses the use of line-feeds in encoded data, use of padding in encoded data, use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings."
Example
A quote from Thomas Hobbes's Leviathan:
"Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
is encoded in MIME's base64 scheme as follows:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are 01001101, 01100001, and 01101110 in base 2. These three bytes are joined together in a 24-bit buffer, producing 010011010110000101101110. Groups of 6 bits (6 bits can represent 64 different values) are then read off as 4 numbers (24 = 4 × 6), which are converted to their corresponding Base 64 characters.
| Text content | M | a | n |
|---|---|---|---|
| ASCII | 77 | 97 | 110 |
| Bit pattern | 01001101 | 01100001 | 01101110 |

| 6-bit group | 010011 | 010110 | 000101 | 101110 |
|---|---|---|---|---|
| Index | 19 | 22 | 5 | 46 |
| Base64-encoded | T | W | F | u |
As this example illustrates, Base 64 encoding converts 3 uncoded bytes (in this case, ASCII characters) into 4 encoded ASCII characters.
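The same arithmetic can be reproduced in a few lines. This is a minimal sketch: the helper name `sextets` and the variable names are hypothetical, chosen only for this illustration.

```javascript
var ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Join three bytes into a 24-bit buffer and read it back as four 6-bit values.
function sextets(a, b, c) {
  var buffer = (a << 16) | (b << 8) | c;
  return [(buffer >>> 18) & 63, (buffer >>> 12) & 63, (buffer >>> 6) & 63, buffer & 63];
}

var indices = sextets(77, 97, 110);            // the bytes of "M", "a", "n"
var encoded = indices.map(function (i) {       // look each index up in the alphabet
  return ALPHABET.charAt(i);
}).join("");
console.log(indices.join(", "), encoded);      // indices 19, 22, 5, 46 and the string TWFu
```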
The example below illustrates how shortening the input changes the output padding:
| Input ends with | Output ends with |
|---|---|
| carnal pleasure. | c3VyZS4= |
| carnal pleasure | c3VyZQ== |
| carnal pleasur | c3Vy |
| carnal pleasu | c3U= |
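The padding pattern follows directly from the input length modulo 3. The sketch below uses a hypothetical helper name, `paddingInfo`, and computes the number of "=" characters and the total encoded length (without line breaks) for a given byte count.

```javascript
// For `byteLength` input bytes, return how many "=" characters are appended
// and how long the encoded output is, ignoring any line breaks.
function paddingInfo(byteLength) {
  var pad = (3 - (byteLength % 3)) % 3;       // 0, 1 or 2 padding characters
  return { pad: pad, encodedLength: 4 * Math.ceil(byteLength / 3) };
}
```

In the examples, the text up to "plea" ends on a 3-byte boundary, so only the tails matter: "sure." (5 bytes) needs one "=", "sure" (4 bytes) needs two, "sur" (3 bytes) needs none, and "su" (2 bytes) needs one.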
Implementation
The traditional (MIME) base64 encoding and decoding processes are fairly simple to implement. The example here uses JavaScript and includes the line breaks that MIME and similar formats require at particular line lengths. It is worth noting, however, that many base64 functions (e.g. in PHP) return base64-encoded strings without the line breaks, since line breaks are easy to insert after encoding and the encoding is often wanted only for safely transferring data via XML, inserting it into a database, and so on, where line breaks are unnecessary and therefore undesirable. The newline insertion and removal in the functions here can easily be commented out (each is only one line in its function) if not needed.
An array of the base 64 characters is necessary for encoding, such as:
var base64chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split("");
And decoding will require the inverse list (swap the indices for the values), such as:
var base64inv = {}; for (var i = 0; i < base64chars.length; i++) { base64inv[base64chars[i]] = i; }
Note that in real implementations, it is better to explicitly list the entire array/hash for each list above -- the one-liners here are given to demonstrate the idea as directly as possible, rather than being the ideal in practice.
The base64 encoding function:
function base64_encode (s)
{
  // the encoded result, the padding string, and the pad count
  var r = "", p = "", c = s.length % 3;
  // add a right zero pad to make this string a multiple of 3 characters
  if (c > 0) {
    for (; c < 3; c++) { p += '='; s += "\0"; }
  }
  // step over the length of the string, three characters at a time
  for (c = 0; c < s.length; c += 3) {
    // add a newline after every 76 output characters, according to the MIME specs
    if (c > 0 && (c / 3 * 4) % 76 == 0) { r += "\r\n"; }
    // these three 8-bit characters (char codes 0-255 only) become one 24-bit number
    var n = (s.charCodeAt(c) << 16) + (s.charCodeAt(c + 1) << 8) + s.charCodeAt(c + 2);
    // this 24-bit number gets separated into four 6-bit numbers
    n = [(n >>> 18) & 63, (n >>> 12) & 63, (n >>> 6) & 63, n & 63];
    // those four 6-bit numbers are used as indices into the base64 character list
    r += base64chars[n[0]] + base64chars[n[1]] + base64chars[n[2]] + base64chars[n[3]];
  }
  // replace the characters produced by the zero pad with the actual padding string
  return r.substring(0, r.length - p.length) + p;
}
The base64 decoding function:
function base64_decode (s)
{
  // remove/ignore any characters that are neither base64 characters nor '=' --
  // particularly newlines -- first, so a trailing line break cannot hide the padding
  s = s.replace(/[^A-Za-z0-9+\/=]/g, "");
  // replace any incoming padding with a zero pad (the 'A' character is zero)
  var p = (s.charAt(s.length - 1) == '=' ? (s.charAt(s.length - 2) == '=' ? 'AA' : 'A') : "");
  var r = "";
  // drop any other stray '=' characters as well
  s = s.substring(0, s.length - p.length).replace(/=/g, "") + p;
  // step over the length of this encoded string, four characters at a time
  for (var c = 0; c < s.length; c += 4) {
    // each of these four characters represents a 6-bit index in the base64 characters list
    // which, when concatenated, will give the 24-bit number for the original 3 characters
    var n = (base64inv[s.charAt(c)] << 18) + (base64inv[s.charAt(c + 1)] << 12) +
            (base64inv[s.charAt(c + 2)] << 6) + base64inv[s.charAt(c + 3)];
    // split the 24-bit number into the original three 8-bit (ASCII) characters
    r += String.fromCharCode((n >>> 16) & 255, (n >>> 8) & 255, n & 255);
  }
  // remove any zero pad that was added to make this a multiple of 24 bits
  return r.substring(0, r.length - p.length);
}
The above implementation suits a language like JavaScript that handles concatenation of arbitrary-length strings efficiently. Other languages (e.g. C) will work much more efficiently by allocating memory for a new string/array of the appropriate size (the output length is easily calculated from the input length at the very beginning) and then simply setting each character by index, as opposed to concatenating.
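The preallocating approach can be sketched in JavaScript as well, for comparison. This is an illustrative variant rather than a replacement for the functions above. The name `base64EncodeBytes` is hypothetical, the input is assumed to be an array (or typed array) of byte values 0-255, and no line breaks are inserted.

```javascript
// Encode an array of bytes by filling an output array whose size is known up front.
function base64EncodeBytes(bytes) {
  var alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  var out = new Array(4 * Math.ceil(bytes.length / 3));
  var o = 0;
  for (var i = 0; i < bytes.length; i += 3) {
    var remaining = bytes.length - i;
    var b1 = remaining > 1 ? bytes[i + 1] : 0;   // missing bytes count as zero bits
    var b2 = remaining > 2 ? bytes[i + 2] : 0;
    var n = (bytes[i] << 16) | (b1 << 8) | b2;
    out[o++] = alphabet.charAt((n >>> 18) & 63);
    out[o++] = alphabet.charAt((n >>> 12) & 63);
    out[o++] = remaining > 1 ? alphabet.charAt((n >>> 6) & 63) : "=";
    out[o++] = remaining > 2 ? alphabet.charAt(n & 63) : "=";
  }
  return out.join("");
}
```

For instance, `base64EncodeBytes([77, 97, 110])` returns `TWFu`, and `base64EncodeBytes([77])` returns `TQ==`.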
On-line Tools
There are a variety of on-line Base64 tools such as:
- On-line decoding and encoding text and files
- Decode Base64 encoded text
- Base64 Decoder
- On-line Base64 encoder and decoder
- On-line Base64 encode / decode
- Downloadable open source tool to encode or decode Base64 on Unix or Win32
- encode/decode a "base64" stream
URL applications
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. Hibernate, a database persistence framework for Java objects, uses Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 renders such data compactly and, when the goal is to obscure the nature of the data from a casual human observer, in a relatively unreadable form.
Using a URL-encoder on standard Base64, however, is inconvenient as it will translate the '+' and '/' characters into special '%XX' hexadecimal sequences ('+' = '%2B' and '/' = '%2F'). When the result is later used with database storage or across heterogeneous systems, those systems may in turn choke on the '%' character generated by URL-encoders (because the '%' character is also used in ANSI SQL as a wildcard).
For this reason, a modified Base64 for URL variant exists, where no padding '=' will be used, and the '+' and '/' characters of standard Base64 are respectively replaced by '*' and '-', so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general.
Another variant, called modified Base64 for regexps, uses '!-' instead of '*-' to replace the standard Base64 '+/', because both '+' and '*' may be reserved in regular expressions (the '[]' pair used by the IRCu variant would not work in that context either).
There are other variants that use '_-' or '._' when the Base64 variant string must be used within valid identifiers for programs, or '.-' for use in XML name tokens (Nmtoken), or even '_:' for use in more restricted XML identifiers (Name).
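Since these variants differ only in the two substitute characters and the missing padding, a single pair of helpers can cover them. The sketch below uses hypothetical function names and takes the substitutes for '+' and '/' as parameters. Passing '*' and '-' gives the URL variant described above, while '-' and '_' gives the "base64url" alphabet defined in RFC 4648.

```javascript
// Convert standard Base64 to a variant: drop "=" padding, swap '+' and '/'.
function toUrlVariant(b64, plusChar, slashChar) {
  return b64.replace(/=+$/, "").split("+").join(plusChar).split("/").join(slashChar);
}

// Convert a variant string back to standard Base64, restoring the padding.
function fromUrlVariant(s, plusChar, slashChar) {
  var b64 = s.split(plusChar).join("+").split(slashChar).join("/");
  while (b64.length % 4 !== 0) { b64 += "="; }
  return b64;
}
```

For example, `toUrlVariant("+/+/AA==", "*", "-")` gives `*-*-AA`, and `fromUrlVariant("*-*-AA", "*", "-")` restores `+/+/AA==`.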
Other applications
Base64 can be used in a variety of contexts:
- Thunderbird and Evolution both use Base64 to obscure e-mail passwords
- Base64 is often used as a quick but insecure shortcut to obscure secrets without incurring the overhead of cryptographic key management
- Spammers use Base64 to evade basic anti-spam tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages.
- Base64 is used to encode character strings in LDIF files
- Base64 is sometimes used to embed binary data in an XML file, using a syntax similar to <data encoding="base64">......</data> e.g.: Firefox's bookmarks.html.
- Base64 is also used when communicating with Fiscal Signature/Printing devices (usually, over COM or LPT ports) to minimize the delay when transferring receipt characters for signing.
External links
- RFC 989 and RFC 1421 (Privacy Enhancement for Electronic Internet Mail)
- RFC 2045 (Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies)
- RFC 3548 and RFC 4648 (The Base16, Base32, and Base64 Data Encodings)
- Home of the Base64 specification, with an online decoder and C99 implementation
- Base64 primer tutorial with accompanying lecture slides
- http://gtools.org/tool/base64-encode-decode/ and http://www.infowares.com/base64.php
- Base64 implementation in Java
- Base64 implementation in JavaScript
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.