Whitespace (computer science)



In computer science, white space (or whitespace) is any character, or series of characters, that represents horizontal or vertical space in typography. A single such character is called a whitespace character.

When rendered, a whitespace character does not correspond to a visual mark, but typically does occupy an area on a page. For example, the common whitespace symbol " " (Unicode character U+0020, decimal 32) represents a blank space, as used between words and sentences in Western scripts.

As is common in technical literature, the two words "white space" have found widespread usage as the single term "whitespace", especially when used as an adjective, as in "whitespace character". Some specifications refer to "white space" while others refer to "whitespace"; there is no difference between the terms, although exactly which characters are being referred to does vary from context to context. For example, in HTML, "whitespace" includes the form feed character, while in XML, "white space" does not.
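
The difference between the two definitions can be made concrete. The following Python sketch hard-codes the two character sets, assuming the definitions in the XML 1.0 specification (production S) and the HTML Living Standard ("ASCII whitespace"); the constant and function names are illustrative only and do not come from any library.

```python
# Whitespace sets as defined by two specifications (names are illustrative).
XML_WHITE_SPACE = frozenset(" \t\r\n")     # XML 1.0, production S
HTML_WHITESPACE = frozenset(" \t\n\f\r")   # HTML "ASCII whitespace"


def only_in_html():
    """Return the characters that HTML treats as whitespace but XML does not."""
    return HTML_WHITESPACE - XML_WHITE_SPACE


print(sorted(only_in_html()))  # ['\x0c'], the form feed character
```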

The most common whitespace characters may be typed via the space bar or the Tab key. Depending on context, a line break generated by the Return key (Enter key) may be considered whitespace as well.

Runs of whitespace in source code written in most programming languages are generally ignored, apart from their role in separating tokens; such languages are called free-form. In some languages, however, such as Haskell and Python, whitespace and indentation are syntactically significant.
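
As a small illustration in Python, the two program fragments below differ only in the indentation of their last line, yet compute different results, while extra spaces inside an expression make no difference. The helper name run is hypothetical and exists only for this sketch.

```python
# Two snippets that differ only in the indentation of the last line.
inside_loop = "total = 0\nfor i in range(3):\n    total += i\n    total *= 2\n"
after_loop = "total = 0\nfor i in range(3):\n    total += i\ntotal *= 2\n"


def run(source):
    """Execute source in a fresh namespace and return the final value of total."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


print(run(inside_loop))      # 8: the doubling happens on every iteration
print(run(after_loop))       # 6: the doubling happens once, after the loop
print(eval("1   +     2"))   # 3: runs of spaces inside an expression are ignored
```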

In many programming languages, excessive whitespace, especially trailing whitespace at the end of lines, is considered a nuisance. In interpreted languages, parsing unnecessary whitespace may affect the speed of execution. In markup languages like HTML, unnecessary whitespace increases the file size, and may therefore affect the speed of transfer over a network.

On the other hand, unnecessary whitespace can also mark code inconspicuously, in a way similar to, but less obvious than, comments in the code. Such marks can help to prove that a license or copyright was infringed by copying and pasting.
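
Trailing whitespace is easy to remove mechanically. The following is a minimal Python sketch, assuming that only spaces and tabs at the end of a line count as trailing whitespace and that lines are separated by "\n"; the function name is illustrative.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    lines = text.split("\n")
    return "\n".join(line.rstrip(" \t") for line in lines)


sample = "def f():   \n    return 1\t\n"
cleaned = strip_trailing_whitespace(sample)
print(repr(cleaned))               # 'def f():\n    return 1\n'
print(len(sample) - len(cleaned))  # 4 characters removed
```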

The C language defines whitespace to be "... space, horizontal tab, new-line, vertical tab, and form-feed".
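
To show what such a definition means in practice, the Python sketch below splits a line of text into tokens using exactly the five characters named in the quotation. It is not how a C compiler is implemented; the names are illustrative.

```python
# The five characters named in the quotation (names are illustrative).
C_SOURCE_WHITESPACE = frozenset(" \t\n\v\f")


def split_on_c_whitespace(text):
    """Split text into tokens separated by runs of the characters above."""
    tokens, current = [], []
    for ch in text:
        if ch in C_SOURCE_WHITESPACE:
            if current:                      # a run of whitespace ends a token
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:                              # flush the last token
        tokens.append("".join(current))
    return tokens


print(split_on_c_whitespace("int\tx\v=\f 1;\n"))  # ['int', 'x', '=', '1;']
```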

The HTTP network protocol has very strict requirements about what type of whitespace can occur in the control structures (such as the header fields) and where it must and must not occur.
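
One such rule can be sketched as follows, assuming the HTTP/1.1 message syntax of RFC 7230: optional whitespace (space or horizontal tab) may surround a header field value, but no whitespace is allowed between the field name and the colon. The function parse_header_line is hypothetical and far from a complete HTTP parser.

```python
def parse_header_line(line):
    """Split 'Name: value' into (name, value); reject whitespace around the name."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("missing colon")
    if name == "" or name != name.strip(" \t"):
        raise ValueError("whitespace is not allowed around the field name")
    # Optional whitespace (space or tab) around the value is not part of it.
    return name, value.strip(" \t")


print(parse_header_line("Host:   example.org \t"))  # ('Host', 'example.org')
```

Under these assumptions, a line such as "Host : example.org" is rejected with a ValueError rather than silently accepted.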

The term whitespace is based on the assumption that the background color used for rendered text is white, and is thus confusing if it is not.

On some occasions, as in a textbook on the Modula-2 programming language published ca. 1985 by Springer-Verlag, it is necessary to show an explicit symbol to indicate a space character. As the Wikipedia article on the interpunct points out, many different characters (in computer-readable form) create white space.

That book, at least, used the symbol ␣ (Unicode U+2423, decimal 9251, OPEN BOX) to show an explicit space code. (In case it doesn't render well on a monitor screen, it's like a ] (closing square bracket) rotated a quarter-turn clockwise, although not as wide, and placed below the writing line. Some fonts render it too narrowly.)

Such usage is similar to the convention for multiword file names on operating systems and applications that are confused by embedded spaces: such file names instead use a low line (_) as a word separator, as_in_this_phrase.

Another such symbol was ␢ (Unicode U+2422, decimal 9250, BLANK SYMBOL). This was used in the early years of computer programming (possibly especially at IBM) when writing on coding forms. Keypunch operators immediately recognized the symbol as an "explicit space".
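
Making spaces visible in this way is straightforward to reproduce. The short Python sketch below substitutes the OPEN BOX symbol described above for every ordinary space; the names OPEN_BOX and show_spaces are illustrative.

```python
OPEN_BOX = "\u2423"  # the visible-space symbol described above


def show_spaces(text):
    """Replace each U+0020 space with the visible OPEN BOX symbol."""
    return text.replace(" ", OPEN_BOX)


print(ord(" "), ord(OPEN_BOX))    # 32 9251
print(show_spaces("x := y + 1"))  # x␣:=␣y␣+␣1
```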

Unicode

In Unicode (Unicode Character Database) the following code points are defined as whitespace:
  • U+0009-U+000D (control characters: TAB, LF, VT, FF and CR)
  • U+0020 SPACE
  • U+0085 NEXT LINE (NEL)
  • U+00A0 NO-BREAK SPACE (NBSP)
  • U+1680 OGHAM SPACE MARK
  • U+180E MONGOLIAN VOWEL SEPARATOR (no longer classified as whitespace as of Unicode 6.3)
  • U+2000-U+200A (spaces of various widths)
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR
  • U+202F NARROW NO-BREAK SPACE
  • U+205F MEDIUM MATHEMATICAL SPACE
  • U+3000 IDEOGRAPHIC SPACE
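
The list above can be turned into a simple membership test. The Python sketch below hard-codes the ranges exactly as listed and prints each code point with its name from the standard unicodedata module; the names WHITE_SPACE_RANGES and in_list are illustrative.

```python
import unicodedata

# Code points from the list above, as inclusive (first, last) ranges.
# U+180E is kept because it appears in the list; Unicode 6.3 and later
# no longer classify it as whitespace.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x180E, 0x180E), (0x2000, 0x200A), (0x2028, 0x2029),
    (0x202F, 0x202F), (0x205F, 0x205F), (0x3000, 0x3000),
]


def in_list(ch):
    """Return True if the single character ch falls in one of the ranges above."""
    return any(first <= ord(ch) <= last for first, last in WHITE_SPACE_RANGES)


for first, last in WHITE_SPACE_RANGES:
    for cp in range(first, last + 1):
        # Control characters have no name, so a placeholder is printed instead.
        print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<control>')}")

print(in_list("\u00a0"))  # True: NO-BREAK SPACE is in the list
print(in_list("\u200b"))  # False: ZERO WIDTH SPACE is not
```

Python's built-in str.isspace follows a related but not identical definition, which is why the sketch does not rely on it.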

External links

Property list of the Unicode Character Database

This article is copied from an article on Wikipedia.org, the free encyclopedia created and edited by an online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of Wikipedia articles provide accurate and timely information, please do not assume the accuracy of any particular article. This article is distributed under the terms of the GNU Free Documentation License.