Information about Optical Character Recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text.
OCR is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on OCR has shifted to implementation of proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the OCR term has now been broadened to include digital image processing as well.
Early systems required training (the provision of known samples of each character) to read a specific font. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are even capable of reproducing formatted output that closely approximates the original scanned page including images, columns and other non-textual components.
History
In 1929, Gustav Tauschek obtained a patent on OCR in Germany, followed by Handel, who obtained a US patent on OCR in 1933 (U.S. Patent 1,915,993). In 1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329). Tauschek's machine was a mechanical device that used templates. A photodetector was placed so that when the template and the character to be recognised were lined up for an exact match and a light was directed towards them, no light would reach the photodetector.
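The template principle described above can be imitated in software. The following is only a toy model, not a reconstruction of Tauschek's patent: glyphs and templates are tiny hypothetical 3x3 grids, a template is treated as a set of openings, and "light reaching the photodetector" is counted as the number of openings not covered by ink. The tie-break between several dark templates (prefer the one with the most openings) is an assumption added for the sketch and does not come from the text.

```python
def parse(rows):
    """Turn rows of '#' (ink, or a template opening) and '.' into a set of (row, col) cells."""
    return {(r, c) for r, row in enumerate(rows) for c, ch in enumerate(row) if ch == "#"}

# Hypothetical 3x3 templates, invented for illustration only.
TEMPLATES = {
    "T": parse(["###", ".#.", ".#."]),
    "L": parse(["#..", "#..", "###"]),
    "I": parse([".#.", ".#.", ".#."]),
}

def light_leak(template_openings, glyph_ink):
    """Count template openings that the glyph's ink does not cover."""
    return len(template_openings - glyph_ink)

def recognize(glyph_ink, templates=TEMPLATES):
    """Return the label of a template that lets no light through, or None.

    Assumption: if several templates stay dark (for example, the ink of a
    'T' also covers the 'I' template), the one with the most openings wins.
    """
    dark = [label for label, openings in templates.items()
            if light_leak(openings, glyph_ink) == 0]
    return max(dark, key=lambda label: len(templates[label]), default=None)

print(recognize(parse(["###", ".#.", ".#."])))
```

The model also shows why pure template matching is fragile: any glyph whose ink covers a smaller template keeps that template dark as well, so some extra rule is needed to separate the candidates.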
In 1950, David Shepard, a cryptanalyst at the Armed Forces Security Agency in the United States, was asked by Frank Rowlett, who had broken the Japanese PURPLE diplomatic code, to work with Dr. Louis Tordella to recommend data automation procedures for the Agency. This included the problem of converting printed messages into machine language for computer processing. Shepard decided it must be possible to build a machine to do this, and, with the help of Harvey Cook, a friend, built "Gismo" in his attic during evenings and weekends. This was reported in the Washington Daily News on 27 April 1951 and in the New York Times on 26 December 1953 after his U.S. Patent Number 2,663,758 was issued. Shepard then founded Intelligent Machines Research Corporation (IMR), which went on to deliver the world's first OCR systems used in commercial operation. Both Gismo and the later IMR systems used image analysis, as opposed to character matching, and could accept some font variation. However, Gismo was limited to reasonably close vertical registration, whereas the commercial IMR scanners that followed analyzed characters anywhere in the scanned field, a practical necessity on real-world documents.
The first commercial system was installed at Reader's Digest in 1955. Many years later, Reader's Digest donated it to the Smithsonian, where it was put on display. The second system was sold to the Standard Oil Company of California for reading credit card imprints for billing purposes, with many more systems sold to other oil companies. Other systems sold by IMR during the late 1950s included a bill stub reader to the Ohio Bell Telephone Company and a page scanner to the United States Air Force for reading typewritten messages and transmitting them by teletype. IBM and others were later licensed on Shepard's OCR patents.
The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in Europe was by the British General Post Office (GPO). In 1965 it began planning an entire banking system, the National Giro, using OCR technology, a process that revolutionized bill payment systems in the UK. Canada Post has been using OCR systems since 1971. OCR systems read the name and address of the addressee at the first mechanized sorting center and print a routing bar code on the envelope based on the postal code. After that, the letters need only be sorted at later centers by less expensive sorters, which need only read the bar code. To avoid interference with the human-readable address field, which can be located anywhere on the letter, special ink is used that is clearly visible under ultraviolet light. This ink looks orange in normal lighting conditions. Envelopes marked with the machine-readable bar code may then be processed.
Current state of OCR technology
The accurate recognition of Latin-script, typewritten text is now considered largely a solved problem. Typical accuracy rates exceed 99%, although certain applications demanding even higher accuracy require human review for errors. Handwriting recognition, including recognition of hand printing and cursive handwriting, is still the subject of active research, as is recognition of printed text in other scripts (especially those with a very large number of characters).
Systems for recognizing hand-printed text on the fly have enjoyed commercial success in recent years. Among these are the input devices for personal digital assistants such as those running Palm OS. The Apple Newton pioneered this technology. The algorithms used in these devices take advantage of the fact that the order, speed, and direction of individual line segments at input are known. Also, the user can be retrained to use only specific letter shapes. These methods cannot be used in software that scans paper documents, so accurate recognition of hand-printed documents is still largely an open problem. Accuracy rates of 80% to 90% on neat, clean hand-printed characters can be achieved, but that accuracy rate still translates to dozens of errors per page, making the technology useful only in very limited applications. This variety of OCR is now commonly known in the industry as ICR, or Intelligent Character Recognition.
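The step from a per-character accuracy rate to errors per page is simple arithmetic, sketched below. The function name and the page size of 300 characters (roughly a sparsely filled hand-printed form) are assumptions made for illustration, and the model assumes every character is recognized independently with the same accuracy.

```python
def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misread characters on one page.

    Assumes each character is recognized independently with the same
    per-character accuracy (a simplification).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return chars_per_page * (1.0 - accuracy)

# Hypothetical page of 300 hand-printed characters.
for accuracy in (0.80, 0.90, 0.99):
    print(f"{accuracy:.0%} accuracy: about {expected_errors(300, accuracy):.0f} errors")
```

Under these assumptions, 80% and 90% accuracy correspond to roughly 60 and 30 expected errors on such a page, which matches the "dozens of errors per page" figure. A denser page scales the count proportionally.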
Recognition of cursive text is an active area of research, with recognition rates even lower than those of hand-printed text. Higher rates of recognition of general cursive script will likely not be possible without the use of contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to parse individual characters from script. Reading the Amount line of a cheque (which is always a written-out number) is an example where using a smaller dictionary can increase recognition rates greatly. Knowledge of the grammar of the language being scanned can also help determine whether a word is likely to be a verb or a noun, for example, allowing greater accuracy. The shapes of individual cursive characters themselves simply do not contain enough information to recognize all handwritten cursive script accurately (greater than 98%).
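The benefit of a small dictionary can be illustrated with a minimal sketch. It assumes a character-level recognizer has already produced a noisy string, and it snaps that string to the closest entry of a hypothetical amount-line lexicon using string similarity from Python's standard difflib module. Real systems typically combine recognizer confidence scores with the lexicon rather than comparing plain strings, so this stands in for the idea and not for any particular product's method.

```python
import difflib

# Hypothetical lexicon for the amount line of a cheque (illustrative, not exhaustive).
AMOUNT_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "twenty", "thirty", "forty", "fifty", "hundred", "thousand",
    "and", "dollars", "only",
]

def snap_to_lexicon(raw_word, lexicon=AMOUNT_WORDS, cutoff=0.6):
    """Return the lexicon entry most similar to a noisy recognizer output.

    Returns None when no entry reaches the similarity cutoff, so that
    implausible input is rejected and not forced onto a word.
    """
    matches = difflib.get_close_matches(raw_word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Each noisy word contains one misread character.
noisy = ["thausand", "hvndred", "fiue"]
print([snap_to_lexicon(word) for word in noisy])
```

The smaller the lexicon, the fewer near neighbours each entry has, so a single misread character is less likely to push a word onto the wrong entry. This is the reason a restricted vocabulary such as written-out numbers helps.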
For more complex recognition problems, intelligent character recognition systems are generally used, as artificial neural networks can be made indifferent to both affine and non-linear transformations.[1]
Music OCR
Early research into recognition of printed sheet music was performed in the mid-1970s at MIT and other institutions. Successive efforts were made to localize and remove musical staff lines, leaving symbols to be recognized and parsed. The first proprietary music-scanning program, MIDISCAN, was released in 1991. Three proprietary products are currently available. At this time, OCR software does not recognize handwritten scores.
Magnetic ink character recognition
One area where the accuracy and speed of computer input of character information exceed those of humans is magnetic ink character recognition, where error rates are around one read error for every 20,000 to 30,000 checks.
Optical Character Recognition in Unicode
In Unicode, Optical Character Recognition symbol characters are placed in the hexadecimal range 0x2440–0x245F (see also Unicode Symbols).
OCR software
- ABBYY FineReader OCR
- GOCR
- Falcon32
- IPStudio
- Microsoft Office Document Imaging
- NovoDynamics VERUS
- Ocrad
- Ocropus
- OmniPage
- Readiris
- ReadSoft
- SmartScore
- Tesseract (software)
- TopSoft TopOCR
See also
- Automatic number plate recognition
- CAPTCHA
- Computational linguistics
- Computer vision
- Machine learning
- Optical mark recognition
- Raster to vector
- Raymond Kurzweil
- SmartPen - optical character recognition technology system used in clinical trials
- Speech recognition
References
External links
- ICDAR, a comprehensive conference on all aspects of document recognition
- Linux OCR: A review of free optical character recognition software
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.