Information about Text File

A stylized iconic depiction of a CSV-formatted text file.
A text file is a generic description of a kind of computer file in a computer file system.[1] At this generic level of description, there are two kinds of computer files: 1) text files; and 2) binary files.[2] This broad two-level distinction is widely recognized and applied in computing, even though it can be misleading, and subject to differing interpretation.[3][4]
The most common basis for distinguishing text files from binary files is how the underlying stored information is ultimately interpreted and processed by the operating system and associated programs.[5] Text files are usually interpreted as consisting solely of characters from a recognized character set. Well-known character sets include the ASCII character set and the Unicode character set.[6]
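As a rough illustration of this point, the following Python sketch reads the same four bytes first as ASCII characters and then as a single unsigned integer. The byte values and variable names are chosen only for this example; nothing in the bytes themselves says which reading is intended.

```python
# The same four bytes, interpreted in two different ways.
data = bytes([72, 105, 33, 10])

# Interpreted as text: each byte is treated as an ASCII character code.
as_text = data.decode("ascii")

# Interpreted as binary data: one 32-bit big-endian unsigned integer.
as_number = int.from_bytes(data, byteorder="big")

print(repr(as_text))
print(as_number)
```

Only the program's choice of interpretation differs between the two results; the stored bits are identical.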
Components
Text files are files in which most bytes (or short sequences of bytes) represent ordinary readable characters such as letters, digits, and punctuation (including spaces), along with a few control characters such as tabs, line feeds, and carriage returns. This simplicity allows a wide variety of programs to display their contents.
Cryptography
The similar term plaintext is most commonly used in a cryptographic context and refers to unencrypted data; however, this unencrypted data does not necessarily have to be a text file. The similarity between the terms "plaintext" and "text file" sometimes causes confusion, especially among those new to computers, cryptography, or data communications.
Encoding
Software use
Although text files are often meant for humans to read, they are also commonly used for data storage by computer programs. Text files have some advantages even for data storage because they avoid certain problems with binary files, such as endianness, padding bytes, or differences in the number of bytes in a machine word. Further, when data corruption occurs in a file used for data storage, the damage is far easier for a human to repair if it is a text file. It may also be easier for the program to recover from the error, because text files are verbose while binary files are usually compact (text files are said to have a low entropy rate): damaging a given amount of a text file destroys little information, whereas damaging the same amount of a binary file destroys more.
A large drawback of plain text files is that there is no way for a program to reliably determine which encoding is used. A text editor may save its text file in UTF-8, but a compiler might expect its input in ISO 8859. Compiling the UTF-8 text file could then produce errors or garbled characters. Some text formats (such as XML) have an in-band mechanism for specifying the encoding of the document, but most text files have no such mechanism. Some programs go to great lengths to "guess" the encoding by looking for patterns in the text file, but this guessing procedure is very difficult to specify correctly for all cases (see AI-complete).
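The encoding mismatch described above can be reproduced in a few lines of Python. This is only a sketch: the sample word and variable names are assumptions made for the example, and ISO 8859-1 stands in for whichever ISO 8859 part a reading program might assume.

```python
# Text as the author typed it (contains one non-ASCII character).
original = "café"

# A text editor saves the file as UTF-8.
stored_bytes = original.encode("utf-8")

# A second program wrongly assumes ISO 8859-1 when reading the same bytes.
misread = stored_bytes.decode("iso-8859-1")

# Reading with the encoding that was actually used round-trips cleanly.
correct = stored_bytes.decode("utf-8")

print(len(stored_bytes))
print(misread)
print(correct)
```

No error is raised by the wrong decoding, because every byte value is valid in ISO 8859-1. This is part of why the mismatch is hard to detect automatically.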
Formats
MIME
Text files usually have the MIME type "text/plain", often with additional information indicating an encoding. Prior to the advent of Mac OS X, the Mac OS system regarded the content of a file (the data fork) as a text file when its resource fork indicated that the type of the file was "TEXT". Under the Windows operating system, a file is regarded as a text file if the suffix of the name of the file (the "extension") is "txt". However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files that have file name suffixes indicating the programming language in which the source is written.
ASCII
The ASCII standard allows ASCII-only plain text files (unlike most other file types) to be freely interchanged and read on Unix, Macintosh, Microsoft Windows, DOS, and other systems. These systems differ in their preferred line ending convention (see Newline) and in their interpretation of values outside the ASCII range (their character encoding).
Other formats
Plain text is often used as a readable representation of other data that is not itself purely textual: for example, a formatted webpage is not plain text, but its HTML source is. Similarly, source code for computer programs is usually stored in text files, but is compiled into a binary form for execution.
.txt
.txt is a filename extension for files consisting of text with very little formatting (for example, no bold or italics). A file of this kind is also called a plain text file to differentiate it from binary files, which, at the time the distinction was made, were not expected to contain human-readable text. The precise definition of the .txt format is not specified, but it typically matches the format accepted by the system terminal or a simple text editor. Files with the .txt extension can easily be read or opened by any program that reads text and, for that reason, are considered universal (or platform independent).
Plain text versus .txt
Not all systems use the .txt extension when creating plain text files. In particular, on Unix systems, where extensions are entirely optional, it is common to see text files with no extension at all, the most prominent example being the README file present in many software packages. However, there is no difference between a plain text file with no extension and a .txt file. The term "plain text" describes the contents of the file, while the term ".txt" describes the file metadata (that is, the extension).
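The split between contents and metadata can be seen in how a name-based type lookup behaves. The sketch below uses the `mimetypes` module from the Python standard library; the file names are hypothetical, and no files need to exist because the lookup inspects only the name.

```python
import mimetypes

# guess_type looks only at the file name, never at the file's contents.
txt_type, _ = mimetypes.guess_type("notes.txt")
readme_type, _ = mimetypes.guess_type("README")

print(txt_type)
print(readme_type)
```

A README with no extension gets no type from this lookup even though its contents may be ordinary plain text, so a program that cares about the contents has to examine the bytes themselves.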
Plain text variations
Since plain text is not a formally defined standard, the definition of the format of a plain text file is rather loose. The principal differences are in character sets and character encodings, and in conventions about the semantics of formatting characters.
The ASCII character set is the most common format for English-language text files, and is generally assumed to be the default file format in many situations. For accented and other non-ASCII characters, it is necessary to choose a character encoding. In many systems, this is chosen on the basis of the default locale setting of the computer on which the file is read. Common character encodings include ISO 8859-1 for many European languages, and Big5 for Chinese.
Because many encodings have only a limited repertoire of characters, they are often only usable to represent text in a limited subset of human languages. Unicode is an attempt to create a common standard for representing all known languages, and most known character sets are subsets of the very large Unicode character set. Although there are multiple character encodings available for Unicode, the most common is UTF-8, which has the advantage of being backwards-compatible with ASCII: that is, every ASCII text file is also a UTF-8 text file with identical meaning.
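The backwards compatibility of UTF-8 with ASCII can be checked directly. The following Python sketch uses sample strings and variable names that are assumptions for this example.

```python
ascii_text = "plain text"

# For ASCII-only text, the ASCII and UTF-8 encodings produce identical bytes.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A character outside ASCII takes more than one byte in UTF-8 ...
e_acute_utf8 = "é".encode("utf-8")

# ... and cannot be represented in ASCII at all.
try:
    "é".encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False

print(same_bytes)
print(e_acute_utf8)
print(ascii_can_encode)
```

The compatibility runs in one direction only: every ASCII file is valid UTF-8, but a UTF-8 file containing non-ASCII characters is not valid ASCII.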
Formatting characters
On an old Macintosh, the end of a line is marked by ASCII character number 13 (carriage return). On Unix, it is marked by ASCII character number 10 (line feed). IBM mainframes use EBCDIC rather than ASCII, and there the next-line character has the hexadecimal code 15 (decimal 21).
Standard Windows .txt files
Microsoft MS-DOS and Windows use a common text file format, with each line of text separated by a two-character combination: CR and LF, which have ASCII codes 13 and 10. It is common for the last line of text not to be terminated with a CR-LF marker, and many text editors (including Notepad) do not automatically insert one on the last line.
Most Windows text files use a form of ANSI, OEM, or Unicode encoding. What Windows terminology calls "ANSI encodings" are usually single-byte code pages closely related to the ISO 8859 encodings, except in locales such as Chinese, Japanese, and Korean, which require double-byte character sets. ANSI encodings were traditionally used as default system locales within Windows, before the transition to Unicode. By contrast, OEM encodings, also known as MS-DOS code pages, were defined by IBM for use in the original IBM PC text mode display system. They typically include graphical and line-drawing characters common in full-screen MS-DOS applications. Newer Windows text files may use a Unicode encoding such as UTF-16LE or UTF-8.
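The ASCII line-ending conventions described above (CR LF on MS-DOS and Windows, CR on old Macintosh systems, LF on Unix) can be converted to a single form with a short routine. This is a minimal sketch: the function name `normalize_newlines` and the sample data are hypothetical, and EBCDIC files are not handled.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR line endings to LF."""
    # Replace the two-byte sequence first so it does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Sample data; the last line has no terminator, as is common on Windows.
windows_text = b"first\r\nsecond\r\nthird"
old_mac_text = b"first\rsecond\rthird"
unix_text = b"first\nsecond\nthird"

print(normalize_newlines(windows_text) == unix_text)
print(normalize_newlines(old_mac_text) == unix_text)
```

Working on bytes rather than decoded text is safe here for ASCII-compatible single-byte encodings and for UTF-8. A UTF-16 file would have to be decoded first, because the byte values 13 and 10 can also occur inside its two-byte code units.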
Notes and references
1. ^ Lewis, John (2006). Computer Science Illuminated. Jones and Bartlett. ISBN 0763741493.
2. ^ (Lewis 2006)
3. ^ (Lewis 2006 p. 354)
4. ^ The distinction between "text files" and "binary files" can be misleading, because (ultimately) all files in a binary computer file system are stored as binary digits (or bits). The only meaningful distinction arises in how those bits are interpreted and processed by the operating system and any associated programs.
5. ^ (Lewis 2005)
6. ^ (Lewis 2005)
See also
- List of file formats
- File extensions
- ASCII
- EBCDIC
- Newline
- Text editor
- Unicode
- Plain text
- Binary file
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.