ISO/IEC 8859-1

ISO 8859-1, more formally cited as ISO/IEC 8859-1, is part 1 of ISO/IEC 8859, a standard character encoding of the Latin alphabet. It is less formally called Latin-1. It was originally developed by the ISO, but was later maintained jointly by the ISO and the IEC. The standard, when supplemented with additional character assignments (in the C0 and C1 ranges: 0x00 to 0x1F and 0x7F, and 0x80 to 0x9F), is the basis of two widely used character maps known as ISO-8859-1 (note the extra hyphen) and Windows-1252.

In June 2004, the ISO/IEC working group responsible for maintaining eight-bit coded character sets disbanded and ceased all maintenance of ISO 8859, including ISO 8859-1, in order to concentrate on the Universal Character Set and Unicode. In computing applications, encodings that provide full UCS support (such as UTF-8 and UTF-16) are finding increasing favor over encodings based on ISO 8859-1.

Coverage

ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character encoding is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East Asian languages.

Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):
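The one-byte-per-character property can be seen with Python's built-in `latin-1` codec. This is a minimal sketch; the sample word is arbitrary, and UTF-8 is shown only for contrast.

```python
# Each ISO 8859-1 character occupies exactly one byte.
text = "Grüße"  # German text, fully covered by ISO 8859-1

latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)       # b'Gr\xfc\xdfe'
print(len(latin1_bytes))  # 5: one byte per character
print(len(utf8_bytes))    # 7: ü and ß each take two bytes in UTF-8
```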

Modern languages with complete coverage of their alphabet include:
  • Afrikaans
  • Albanian
  • Basque
  • Breton
  • Catalan
  • Danish
  • English
  • Faroese
  • Galician
  • German
  • Icelandic
  • Irish
Languages commonly supported with nearly complete coverage of their alphabet:
  • Dutch (missing the single-character ligatures Ĳ and ĳ, but these should be represented as the two letters IJ and ij in electronic form anyway)
  • Estonian (missing Š, š, Ž, ž, used in loanwords; Windows-1252 and ISO-8859-15 do contain these)
  • French (missing Œ, œ and the very rare Ÿ; they are generally replaced by 'OE' and 'oe' without the normally required ligature, and by 'Y' without the diaeresis; Windows-1252 and ISO-8859-15 do contain these)
  • Finnish (missing Š, š, Ž, ž, used in loanwords; Windows-1252 and ISO-8859-15 do contain these)
Coverage of punctuation signs and apostrophes
For some of the languages listed above, the correct typographical quotation marks are missing, as only « », the straight double quote (") and the straight single quote (') are included.

This encoding also lacks the typographic apostrophe and the oriented single high quotation marks. Some texts instead use the spacing grave accent (`) and the spacing acute accent (´), both part of ISO 8859-1, in place of the 6-shaped and 9-shaped quotation marks or apostrophes; this works reliably only with font styles in which all these characters are displayed as slanted wedge glyphs.

See also: Alphabets derived from the Latin

History

ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

Relationship to ISO/IEC 8859-15

Although ISO/IEC 8859-1 has enough characters for most French text, it is missing a few less common French letters (Œ, œ, and Ÿ). It also lacks a single-glyph representation of the Dutch letter IJ, two letters used in Finnish for transcribing some foreign names and in a few loanwords (Š and Ž), typographic quotation marks and dashes, and common symbols such as the euro sign (€) and the dagger (†).

In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This, however, required the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
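The eight replaced positions can be listed by decoding the upper half of the byte range with both encodings. A minimal sketch using Python's built-in `latin-1` and `iso8859-15` codecs:

```python
# Compare the upper half (A0-FF) of ISO 8859-1 and ISO 8859-15.
differences = {}
for code in range(0xA0, 0x100):
    b = bytes([code])
    old, new = b.decode("latin-1"), b.decode("iso8859-15")
    if old != new:
        differences[code] = (old, new)

for code, (old, new) in differences.items():
    print(f"{code:02X}: {old} -> {new}")
```

Every removed character was replaced in place, so all other code values keep their meaning; the output pairs A4 (¤) with €, A6 (¦) with Š, BC (¼) with Œ, and so on.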

Code table

Since all 191 characters encoded by ISO/IEC 8859-1 are 'graphic' (ISO's term for characters that are not control codes), they can be shown as glyphs in the following table. The space, no-break space, and soft hyphen characters would not normally be visible, so they are represented by abbreviations of their names (SP, NBSP, and SHY). All other characters are represented literally. Row and column headings give the high and low hexadecimal digits of the eight-bit code value; e.g., the letter L is at code value 4C.

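ISO/IEC 8859-1
     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   (unused)
1x   (unused)
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    (unused)
8x   (unused)
9x   (unused)
Ax   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
Bx   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
Cx   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
Dx   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
Ex   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
Fx   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ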


Code values 00–1F, 7F–9F are not assigned to characters by ISO/IEC 8859-1.
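The code table can be regenerated from a codec, which is a convenient way to check a printed copy. This sketch assumes Python's built-in `latin-1` codec; `LABELS`, `is_assigned`, and `cell` are hypothetical names, and the abbreviations match those used in the table.

```python
# Rebuild the ISO/IEC 8859-1 code table from Python's latin-1 codec.
LABELS = {0x20: "SP", 0xA0: "NBSP", 0xAD: "SHY"}  # normally invisible characters

def is_assigned(code: int) -> bool:
    """True for the 191 graphic characters of ISO/IEC 8859-1."""
    return 0x20 <= code <= 0x7E or 0xA0 <= code <= 0xFF

def cell(code: int) -> str:
    if not is_assigned(code):
        return ""  # 00-1F and 7F-9F are not assigned by ISO/IEC 8859-1
    return LABELS.get(code, bytes([code]).decode("latin-1"))

print("    " + "".join(f"x{col:X}".ljust(5) for col in range(16)))
for row in range(16):
    cells = "".join(cell(row * 16 + col).ljust(5) for col in range(16))
    print(f"{row:X}x  {cells}".rstrip())
```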

The lower range 20 to 7E (the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range A0 to FF (the G1 subset) maps exactly to the 96-character set designated into G1 by the ISO 2022 standard switch sequence "ESC - A".
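The claim that the G0 range coincides with ASCII can be checked directly. A minimal sketch using Python's built-in `ascii` and `latin-1` codecs; `same_in_ascii_and_latin1` is a hypothetical helper name.

```python
def same_in_ascii_and_latin1(data: bytes) -> bool:
    """True if `data` decodes to the same text under ASCII and ISO 8859-1."""
    try:
        return data.decode("ascii") == data.decode("latin-1")
    except UnicodeDecodeError:
        return False  # bytes >= 0x80 have no meaning in ASCII

g0 = bytes(range(0x20, 0x7F))   # the G0 graphic range
g1 = bytes(range(0xA0, 0x100))  # the G1 graphic range
print(same_in_ascii_and_latin1(g0))  # True
print(same_in_ascii_and_latin1(g1))  # False
```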

Related character maps

The ISO/IEC 8859-1 standard has long been the basis of a number of character maps, also known as character sets, charsets, or code pages, the most popular being ISO-8859-1 (note the extra hyphen) and Windows-1252. Both of these maps are a superset of ISO/IEC 8859-1; they supplement the standard's 191 character assignments by mapping additional characters to at least some portion of the code value ranges 00–1F, 7F, and 80–9F.

ISO-8859-1

In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the code values 00–1F, 7F, and 80–9F. It thus provides 256 characters, one for every possible 8-bit value.
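Because every 8-bit value is assigned, any byte sequence decodes as ISO-8859-1 without error and re-encodes to the same bytes. A minimal sketch using Python's `iso-8859-1` codec name; the random blob is only an arbitrary test input.

```python
import os

# Every one of the 256 byte values decodes to exactly one character,
# so decoding arbitrary binary data as ISO-8859-1 never fails and is lossless.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")
print(len(text))                               # 256
print(text.encode("iso-8859-1") == all_bytes)  # True

blob = os.urandom(64)  # arbitrary binary data
print(blob.decode("iso-8859-1").encode("iso-8859-1") == blob)  # True
```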

ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/". It is the default encoding of the values of certain descriptive HTTP headers, and is the standard encoding used by the X Window System on most Unix machines in locales which use that character set. It was also the basis of the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode).
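The HTTP default can be sketched as a lookup on the Content-Type value. This assumes the older HTTP/1.1 rule (RFC 2616); RFC 7231 later removed the ISO-8859-1 default for text types, so treat this as an illustration of the rule described above, not current practice. It uses Python's `email.message.Message` to parse the header; `charset_for` and `HTTP_TEXT_DEFAULT` are hypothetical names.

```python
from email.message import Message
from typing import Optional

HTTP_TEXT_DEFAULT = "iso-8859-1"  # older HTTP/1.1 default for text/* types

def charset_for(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() == "text":
        return msg.get_content_charset(HTTP_TEXT_DEFAULT)
    return msg.get_content_charset()  # None when there is no charset parameter

print(charset_for("text/html"))                  # iso-8859-1
print(charset_for("text/plain; charset=UTF-8"))  # utf-8 (lower-cased)
print(charset_for("application/octet-stream"))   # None
```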

Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted in documents labeled as ISO-8859-1 encoded. As well as the canonical name and preferred MIME name mentioned above, the following other aliases are registered for ISO-8859-1: ISO_8859-1, iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819. ISO-8859-1 was also incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
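Two of these points can be checked in Python: several registered aliases resolve to the same codec, and byte value n decodes to Unicode code point U+00nn. This sketch uses only `codecs.lookup`; it tests a subset of the aliases, since which IANA aliases a given library recognizes is an assumption to verify per library.

```python
import codecs

# A few of the registered aliases; Python's codec registry resolves
# each of them to the same codec.
aliases = ["ISO-8859-1", "ISO_8859-1", "latin1", "l1", "CP819"]
names = {codecs.lookup(alias).name for alias in aliases}
print(names)  # a single canonical codec name

# Byte value n in ISO-8859-1 decodes to Unicode code point U+00nn.
identity = all(ord(bytes([n]).decode("latin-1")) == n for n in range(256))
print(identity)  # True
```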

Code point  Control character                         Abbreviation
00          Null character                            NUL
01          Start of Heading                          SOH
02          Start of Text                             STX
03          End of Text                               ETX
04          End of Transmission                       EOT
05          Enquiry                                   ENQ
06          Acknowledge                               ACK
07          Bell character                            BEL
08          Backspace                                 BS
09          Tab                                       TAB
0A          Line Feed                                 LF
0B          Vertical Tab                              VT
0C          Form Feed                                 FF
0D          Carriage Return                           CR
0E          Shift Out                                 SO
0F          Shift In                                  SI
10          Data Link Escape                          DLE
11          Device Control 1                          DC1
12          Device Control 2                          DC2
13          Device Control 3                          DC3
14          Device Control 4                          DC4
15          Negative-acknowledge character            NAK
16          Synchronous Idle                          SYN
17          End of Transmission Block                 ETB
18          Cancel character                          CAN
19          End of Medium                             EM
1A          Substitute                                SUB
1B          Escape character                          ESC
1C          File Separator                            FS
1D          Group Separator                           GS
1E          Record Separator                          RS
1F          Unit Separator                            US
7F          Delete                                    DEL

Code point  Control character                         Abbreviation
80          Padding Character                         PAD
81          High Octet Preset                         HOP
82          Break Permitted Here                      BPH
83          No Break Here                             NBH
84          Index                                     IND
85          Next Line                                 NEL
86          Start of Selected Area                    SSA
87          End of Selected Area                      ESA
88          Character Tabulation Set                  HTS
89          Character Tabulation with Justification   HTJ
8A          Line Tabulation Set                       VTS
8B          Partial Line Forward                      PLD
8C          Partial Line Backward                     PLU
8D          Reverse Line Feed                         RI
8E          Single Shift 2                            SS2
8F          Single Shift 3                            SS3
90          Device Control String                     DCS
91          Private Use 1                             PU1
92          Private Use 2                             PU2
93          Set Transmit State                        STS
94          Cancel Character                          CCH
95          Message Waiting                           MW
96          Start of Guarded Area                     SPA
97          End of Guarded Area                       EPA
98          Start of String                           SOS
99          Single Graphic Character Introducer       SGCI
9A          Single Character Introducer               SCI
9B          Control Sequence Introducer               CSI
9C          String Terminator                         ST
9D          Operating System Command                  OSC
9E          Privacy Message                           PM
9F          Application Program Command               APC


Note that most of these control characters are not intended for use in portable ISO-8859-1 encoded plain-text documents, but only within specific protocols or devices. The exceptions are a few whose behavior is standardized: TAB (09), LF (0A), CR (0D) and NEL (85). LF, CR and NEL are used to encode line ends or to separate paragraphs, and TAB is often treated as equivalent to whitespace. In addition, FF (0C) is commonly accepted by some applications that interpret plain-text documents as ignorable whitespace at the beginning of a line, marking the position of an explicit page break when printing.
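A reader for ISO-8859-1 plain text that honors the line terminators named above might split on CRLF, LF, CR and NEL. A minimal sketch using Python's `re` module; `LINE_BREAK` is a hypothetical name.

```python
import re

# Line terminators named above: CRLF, LF, CR and NEL (0x85).
# CRLF is listed first so that it is consumed as a single break.
LINE_BREAK = re.compile("\r\n|[\n\r\x85]")

raw = b"first\r\nsecond\x85third\rfourth\nfifth"
text = raw.decode("iso-8859-1")  # byte 0x85 becomes U+0085 (NEL)
print(LINE_BREAK.split(text))
# ['first', 'second', 'third', 'fourth', 'fifth']
```

Python's built-in `str.splitlines()` also breaks on NEL, but additionally on VT, FF, FS, GS and RS, which would discard the page-break role of FF; an explicit pattern keeps the set of terminators under control.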

Some encodings also allow BS (08) to be used to create additional characters by emulating the superposition of multiple characters on printing devices.

Some ISO standards, such as ISO 2022, assign specific functions to certain controls: SO (0E), SI (0F), DLE (10), ESC (1B) and SS2 (8E) are used to control the encoding of the characters that follow them or to switch between multiple encodings.

The NUL character (00) is commonly used as a string terminator in some programming languages, or as filler in database records, where it must be ignored and is not part of the encoded text. STX (02) and ETX (03) are commonly used for delimiting frames in some transmission protocols. SUB (1A) is also commonly used as a replacement character to mark errors detected in input transmission streams, and it may be rendered graphically. DC1 (11) and DC3 (13) are commonly used in the XON/XOFF protocol for controlling the transmission speed. Finally, EM (19) or EOT (04) may be used as an end-of-file marker in some text file formats.
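NUL filler in a fixed-width field has to be cut off before the text is decoded. The record layout below (a 10-byte name field) and the helper `decode_field` are hypothetical, chosen only to illustrate the filler usage described above.

```python
# Hypothetical fixed-width record: a 10-byte name field padded with NUL
# (0x00) filler bytes that are not part of the text.
record = b"Jos\xe9\x00\x00\x00\x00\x00\x00"

def decode_field(field: bytes) -> str:
    """Cut at the first NUL, then decode the rest as ISO-8859-1."""
    return field.split(b"\x00", 1)[0].decode("iso-8859-1")

print(decode_field(record))  # José
```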

The ISO-8859-1/Windows-1252 mixup

It is very common to mislabel text data with the charset label ISO-8859-1 even though the data is really Windows-1252 encoded. In Windows-1252, codes between 0x80 and 0x9F are used for letters and punctuation, whereas they are control codes in ISO-8859-1. Many web browsers and e-mail clients interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but this is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.
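The lenient behaviour described above can be sketched as a decoding heuristic: if data labeled ISO-8859-1 contains bytes in 0x80–0x9F, try Windows-1252 first. `decode_labeled_latin1` is a hypothetical name, and the heuristic is an assumption for illustration, not a specified algorithm; it relies on Python's `cp1252` codec rejecting the five code values that Windows-1252 leaves unassigned.

```python
def decode_labeled_latin1(data: bytes) -> str:
    """Decode bytes labeled ISO-8859-1, tolerating Windows-1252 mislabeling.

    Hypothetical heuristic: bytes 0x80-0x9F are C1 controls in ISO-8859-1
    and rarely appear in real text, so their presence suggests Windows-1252.
    """
    if any(0x80 <= b <= 0x9F for b in data):
        try:
            return data.decode("cp1252")
        except UnicodeDecodeError:
            pass  # 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned in cp1252
    return data.decode("iso-8859-1")

print(decode_labeled_latin1(b"\x93caf\xe9\x94"))  # “café”
print(decode_labeled_latin1(b"caf\xe9"))          # café
```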

Similar character sets

The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. Like ISO-8859-1, it is a superset of ASCII, and it has most of the characters that are in ISO-8859-1, but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, Mac Roman supports most of the extra characters that Windows-1252 places in the C1 code point range (Š, š, Ž and ž are the exceptions). Apart from these and the few missing ISO-8859-1 characters, a Macintosh can therefore send and receive files (and e-mail) that are encoded or labeled as ISO-8859-1 (with the C1 control characters) or Windows-1252, by remapping the characters' code values.
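The missing characters and the need for remapping can be explored with Python's built-in `mac_roman` codec. This is a sketch; which version of Apple's mapping the codec follows (¤ or € at code value DB) is an assumption to check, so the exact list printed may differ from other Mac Roman tables.

```python
# Which ISO 8859-1 graphic characters (A0-FF) cannot be encoded in Mac Roman?
latin1_upper = bytes(range(0xA0, 0x100)).decode("latin-1")

missing = []
for ch in latin1_upper:
    try:
        ch.encode("mac_roman")
    except UnicodeEncodeError:
        missing.append(ch)

print(len(missing), " ".join(missing))

# Shared characters sit at different code values, so they must be remapped.
print("é".encode("latin-1"), "é".encode("mac_roman"))  # b'\xe9' b'\x8e'
```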

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphics characters from code page 437.
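The coverage claim for code page 850 can be checked with Python's built-in `cp850` codec: if any of the 191 ISO 8859-1 graphic characters were missing, the strict encode below would raise `UnicodeEncodeError`. A minimal sketch:

```python
# The 191 graphic characters of ISO 8859-1 (20-7E and A0-FF).
latin1_graphic = (bytes(range(0x20, 0x7F)) + bytes(range(0xA0, 0x100))).decode("latin-1")

# Strict encoding raises UnicodeEncodeError if any character is missing.
encoded = latin1_graphic.encode("cp850")
print(len(latin1_graphic), len(encoded))  # 191 191

# Same characters, different arrangement:
print("é".encode("latin-1"), "é".encode("cp850"))  # b'\xe9' b'\x82'
```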

This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.