AWK (programming language)
| Paradigm: | scripting language, procedural, event-driven |
|---|---|
| Appeared in: | 1977, last revised 1985, current POSIX edition is IEEE Std 1003.1-2004 |
| Designed by: | Alfred Aho, Peter Weinberger, and Brian Kernighan |
| Typing discipline: | none; can handle strings, integers and floating point numbers; regular expressions |
| Major implementations: | awk, GNU Awk, mawk, nawk, MKS AWK, Thompson AWK (compiler), Awka (compiler) |
| Dialects: | old awk (oawk, 1977), new awk (nawk, 1985), GNU Awk |
| Influenced by: | C, SNOBOL4, Bourne shell |
| Influenced: | Perl, Korn Shell (ksh93, dtksh, tksh), JavaScript |
| OS: | Cross-platform |
AWK is a general-purpose programming language designed for processing text-based data, either in files or in data streams. The name AWK is derived from the surnames of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is usually pronounced not as a string of separate letters but like the name of the bird, auk, which serves as an emblem of the language (for example, on the cover of the book The AWK Programming Language). Written in all lowercase letters, awk refers to the Unix or Plan 9 program that runs programs written in the AWK programming language.
AWK is an example of a programming language that extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. The power, terseness, and limitations of AWK programs and sed scripts inspired Larry Wall to write Perl. Because of their dense notation, all these languages are often used for writing one-liner programs.
AWK is one of the early tools to appear in Version 7 Unix and gained popularity as a way to add computational features to a Unix pipeline. A version of the AWK language is a standard feature of nearly every modern Unix-like operating system available today. AWK is mentioned in the Single UNIX Specification as one of the mandatory utilities of a Unix operating system. Besides the Bourne shell, AWK is the only other scripting language available in a standard Unix environment. Implementations of AWK exist as installed software for almost all other operating systems.
Structure of AWK programs
An AWK program is a series of pattern-action pairs, written as
pattern { action }
where pattern is typically an expression and action is a series of commands. Each line of input is tested against all the patterns in turn, and the action is executed for each pattern whose expression is true. Either the pattern or the action may be omitted. The pattern defaults to matching every line of input. The default action is to print the line of input.
In addition to a simple AWK expression, the pattern can be BEGIN or END causing the action to be executed before or after all lines of input have been read, or pattern1, pattern2 which matches the range of lines of input starting with a line that matches pattern1 up to and including the line that matches pattern2 before again trying to match against pattern1 on future lines.
In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, ~, which matches a regular expression against a string. As a handy default, /regexp/ without the tilde operator matches against the current line of input.
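The pattern forms described above can be tried from a shell. The following is a minimal sketch, assuming a POSIX awk is available on the PATH; the sample input and the shell variable names (sample, bare, tilde, framed) are made up for illustration.

```sh
# Hypothetical sample input: three lines of "name score".
sample='alice 12
bob 7
carol 15'

# A bare /regexp/ pattern is tested against the whole current line;
# with no action given, matching lines are printed.
bare=$(printf '%s\n' "$sample" | awk '/^b/')

# The ~ operator tests a regular expression against any string,
# here the first field only.
tilde=$(printf '%s\n' "$sample" | awk '$1 ~ /a/ { print $2 }')

# BEGIN runs before any input is read, END after all of it.
framed=$(printf '%s\n' "$sample" |
    awk 'BEGIN { print "start" } $2 > 10 { n++ } END { print n, "high scores" }')

printf '%s\n' "$bare" "$tilde" "$framed"
```

Each command keeps the program in single quotes so that the shell does not interpret $1, $2 or the braces before awk sees them.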
AWK commands
AWK commands are the statements that are substituted for action in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of dynamically linked libraries, which can also provide more functions. For brevity, the enclosing curly braces ( { } ) will be omitted from these examples.
The print command
The print command is used to output text. The output text is always terminated with a predefined string called the output record separator (ORS), whose default value is a newline. The simplest form of this command is:
print
This displays the contents of the current line. In AWK, lines are broken down into fields, and these can be displayed separately:
- print $1: displays the first field of the current line.
- print $1, $3: displays the first and third fields of the current line, separated by a predefined string called the output field separator (OFS), whose default value is a single space character.
Although these fields ($X) may bear resemblance to variables (the $ symbol indicates variables in Perl), they actually refer to the fields of the current line. A special case, $0, refers to the entire line. In fact, the commands "print" and "print $0" are identical in functionality.
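The effect of OFS and ORS on print can be seen with a short experiment. This is an illustrative sketch, assuming a POSIX awk on the PATH; the input line and the shell variable names are invented for the example.

```sh
line='alpha beta gamma'

# Default separators: fields joined by a space, record ended by a newline.
default_out=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')

# Setting OFS in a BEGIN action changes what the comma in print emits.
custom_out=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { print $1, $3 }')

# Setting ORS changes what terminates each print.
joined=$(printf 'a 1\nb 2\n' | awk 'BEGIN { ORS = ";" } { print $1 }')

# print with no arguments is the same as print $0.
whole=$(printf '%s\n' "$line" | awk '{ print }')

printf '%s\n' "$default_out" "$custom_out" "$joined" "$whole"
```

Note that the comma matters: print $1 $3 (without a comma) concatenates the two fields and does not insert OFS.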
The print command can also display the results of calculations and/or function calls:
print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
Output may be sent to a file:
print "expression" > "file name"
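As a small illustration of redirecting print to a file, the sketch below writes numbered lines to a temporary file. It assumes a POSIX awk and the common mktemp utility; the awk variable name out and the shell variable names are hypothetical. The > redirection truncates the file when awk first opens it, and later print statements in the same run append to it.

```sh
tmp=$(mktemp)

# Pass the file name in as the awk variable "out" and redirect print to it.
printf 'one\ntwo\n' | awk -v out="$tmp" '{ print NR ": " $0 > out }'

saved=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$saved"
```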
Variables and Syntax
Variable names can use any of the characters [A-Za-z0-9_], except that a name may not begin with a digit or be a language keyword. The operators + - * / are addition, subtraction, multiplication, and division, respectively. For string concatenation, simply place two variables (or string constants) next to each other, optionally with a space in between. String constants are delimited by double quotes. Statements need not end with semicolons. Finally, comments can be added to programs with #; everything from the # to the end of the line is ignored.
User-defined functions
In a format similar to C, function definitions consist of the keyword function, the function name, argument names and the function body. Here is an example of a function:
function add_three(number,    temp) {
    temp = number + 3
    return temp
}
This function can be invoked as follows:
print add_three(36) # Outputs 39
Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. By convention, some whitespace is added in the argument list before the local variables, in order to indicate where the parameters end and the local variables begin.
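The local-variable convention can be sketched as follows. The function max3 and the variable best are hypothetical names chosen for this example, and a POSIX awk on the PATH is assumed.

```sh
result=$(awk '
# max3 takes three parameters; "best", placed after the extra spaces, is a
# local variable, declared by listing it as an additional, unpassed argument.
function max3(a, b, c,    best) {
    best = a
    if (b > best) best = b
    if (c > best) best = c
    return best
}
BEGIN {
    print max3(4, 9, 2)
    # The local did not leak: "best" is still unset at global scope.
    print (best == "" ? "best is unset here" : "best leaked")
}')

printf '%s\n' "$result"
```

Because the program consists only of a BEGIN action, awk runs it without reading any input.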
Sample applications
Hello World
Here is the ubiquitous "Hello, world!" program written in AWK:
BEGIN { print "Hello, world!" }
Print lines longer than 80 characters
Print all lines longer than 80 characters. Note that the default action is to print the current line.
length > 80
Print a count of words
Count words in the input, and print lines, words, and characters (like wc):
{
    w += NF
    c += length + 1
}
END { print NR, w, c }
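A quick way to check the counting program is to feed it a known input. This sketch assumes a POSIX awk on the PATH; the two input lines are made up. The first line has 19 characters and the second 10, and one is added per line for the newline, giving 31.

```sh
counts=$(printf 'the quick brown fox\njumps over\n' |
    awk '{ w += NF; c += length + 1 } END { print NR, w, c }')

# Prints lines, words and characters, in that order.
printf '%s\n' "$counts"
```

For plain ASCII input such as this, the three numbers agree with those reported by wc; length counts characters, so results may differ from a byte count on multibyte input.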
Sum last word
{ s += $NF }
END { print s + 0 }
As there is no pattern for the first line of the program, every line of input matches by default, so the s += $NF action is executed. s is incremented by the numeric value of $NF, which is the last word on the line as defined by AWK's field separator, by default white-space. NF is the number of fields in the current line, e.g. 4. Since $4 is the value of the fourth field, $NF is the value of the last field in the line regardless of how many fields the line has, or whether it has more or fewer fields than surrounding lines.
(If the line has no fields then NF is 0, so $NF is $0, the whole line, which in this case is empty apart from possible white-space and so has the numeric value 0.)
At the end of the input the END pattern matches, so s is printed. However, since there may have been no lines of input at all, in which case s has never been assigned to, it will by default be an empty string. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. (Concatenating an empty string coerces a number to a string, e.g. s "". Note that there is no explicit operator for concatenating strings; they are simply placed adjacently.) With the coercion the program prints 0 on empty input; without it, an empty line is printed.
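The difference the coercion makes can be observed directly. In this sketch (assuming a POSIX awk on the PATH; the variable names and sample input are illustrative), the program is run once on empty input with and without the + 0, and once on ordinary input.

```sh
# No input lines at all: s is never assigned.
with_coercion=$(awk '{ s += $NF } END { print s + 0 }' < /dev/null)
without_coercion=$(awk '{ s += $NF } END { print s }' < /dev/null)

# Ordinary input: the last field of each line is summed,
# however many fields each line has.
total=$(printf 'a 1\nb c 2\nd 3.5\n' | awk '{ s += $NF } END { print s + 0 }')

printf '[%s] [%s] [%s]\n' "$with_coercion" "$without_coercion" "$total"
```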
Match a range of input lines
$ yes Wikipedia | cat -n | awk 'NR % 4 == 1, NR % 4 == 3' | head -7
1 Wikipedia
2 Wikipedia
3 Wikipedia
5 Wikipedia
6 Wikipedia
7 Wikipedia
9 Wikipedia
The yes and cat commands generate a series of numbered lines as example input. NR is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input. % is the modulo operator. NR % 4 == 1 is true for the first, fifth, ninth, etc., lines of input. Likewise, NR % 4 == 3 is true for the third, seventh, eleventh, etc., lines of input. The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3. It then stays false until the first part matches again on line 5.
The first part of a range pattern being constantly true, e.g. 1, can be used to start the range at the beginning of input. Similarly, if the second part is constantly false, e.g. 0, the range continues until the end of input.
/^--cut here--$/, 0
Prints lines of input from the first line matching the regular expression ^--cut here--$ to the end.
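Both behaviours of range patterns can be checked on small made-up inputs. The sketch below assumes a POSIX awk on the PATH; the marker words START and STOP are invented for the second example.

```sh
# A constantly false second part (0) keeps the range open to the end of input.
tail_part=$(printf 'header\n--cut here--\nbody 1\nbody 2\n' |
    awk '/^--cut here--$/, 0')

# A range closes on its second pattern and can reopen later; if the input
# ends while the range is open, everything up to the end is printed.
ranges=$(printf 'a\nSTART\nb\nSTOP\nc\nSTART\nd\n' | awk '/START/, /STOP/')

printf '%s\n' "$tail_part" "$ranges"
```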
Calculate word frequencies
Word frequency, using associative arrays:
BEGIN { FS="[^a-zA-Z]+" }
{ for (i=1; i<=NF; i++) words[tolower($i)]++ }
END { for (i in words) print i, words[i] }
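The following sketch runs a slight variation of the word-frequency program on two made-up sentences, assuming a POSIX awk and sort on the PATH. Two things differ from the program above and are additions for this example: empty fields (which appear when a line starts or ends with punctuation) are skipped, and the output is piped through sort because the order in which for (w in words) visits keys is unspecified.

```sh
freq=$(printf 'The cat saw the dog.\nThe dog ran!\n' | awk '
BEGIN { FS = "[^a-zA-Z]+" }
{
    for (i = 1; i <= NF; i++)
        if ($i != "")              # skip empty fields left by punctuation at line edges
            words[tolower($i)]++
}
END { for (w in words) print w, words[w] }' | sort)

printf '%s\n' "$freq"
```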
Self-contained AWK scripts
As with many other programming languages, a self-contained AWK script can be constructed using the so-called "shebang" syntax. For example, a UNIX command called hello.awk that prints the string "Hello, world!" may be built by creating a file named hello.awk containing the following lines:
#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }
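Creating and running such a script might look like the sketch below. It assumes mktemp and a POSIX awk on the PATH; the temporary directory is used only to keep the example self-contained, and the script is run through awk -f so that the example does not depend on awk being installed at /usr/bin/awk.

```sh
dir=$(mktemp -d)

# Write the two-line script: the shebang line, then the AWK program.
cat > "$dir/hello.awk" <<'EOF'
#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }
EOF
chmod +x "$dir/hello.awk"

# Running through the interpreter explicitly works wherever awk is on the PATH;
# executing "$dir/hello.awk" directly additionally requires awk at /usr/bin/awk.
greeting=$(awk -f "$dir/hello.awk")
rm -rf "$dir"

printf '%s\n' "$greeting"
```

When the file is executed directly, the kernel reads the first line and runs /usr/bin/awk -f with the script's path as the argument, which is why the -f option is part of the shebang line.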
AWK versions and implementations
AWK was originally written in 1977 and distributed with Version 7 Unix. In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book The AWK Programming Language, published in 1988, and its implementation was made available in releases of UNIX System V. To avoid confusion with the incompatible older version, this version was sometimes known as "new awk" or nawk. This implementation was released under a free software license in 1996 and is still maintained by Brian Kernighan (see the external links below).
BWK awk refers to the version by Brian W. Kernighan. It has been dubbed the "One True AWK" because of the use of the term in association with the book[1] that originally described the language, and because Kernighan was one of the original authors of awk. FreeBSD refers to this version as one-true-awk[2].
gawk (GNU awk) is another free software implementation. It was written before the original implementation became freely available, and is still widely used. Many Linux distributions come with a recent version of gawk, and gawk is widely recognized as the de facto standard implementation in the Linux world. gawk version 3.0 was included as awk in FreeBSD prior to version 5.0. Subsequent versions of FreeBSD use BWK awk in order to avoid[3] the GPL, a license more restrictive than the BSD license in the sense that GPL-licensed code cannot be modified to become proprietary software.[4]
xgawk is a SourceForge project[5] based on gawk. It extends gawk with dynamically loadable libraries.
mawk is a very fast AWK implementation by Mike Brennan based on a byte code interpreter. This is the default AWK that comes with Debian and Ubuntu.
awka (whose front end is written on top of the mawk program) is a translator of awk scripts into C code. When compiled, statically including the author's libawka.a, the resulting executables are considerably sped up and according to the author's tests compare very well with other versions of awk, perl or tcl. Small scripts will turn into programs of 160-170 kB. [2]
Downloads and further information about these versions are available from the sites listed below.
Thompson AWK or TAWK is an AWK compiler for DOS and Windows, previously sold by Thompson Automation Software (which has ceased its activities).
Jawk is a SourceForge project[6] to implement AWK in Java. Extensions to the language are added to provide access to Java features within AWK scripts (i.e., Java threads, sockets, Collections, etc).
BusyBox includes a sparsely documented Awk implementation that appears to be complete, written by Dmitry Zakharov. This implementation is the smallest Awk implementation out there, suitable for embedded systems.
Books
- Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger (1988). The AWK Programming Language. Addison-Wesley. ISBN 0-201-07981-X. The book's webpage includes downloads of the current implementation of Awk and links to others.
- Arnold Robbins. Effective awk Programming, Edition 3. Arnold Robbins maintained the GNU Awk implementation of AWK for more than 10 years. The free GNU Awk manual was also published by O'Reilly in May 2001. Free download of this manual is possible through the following book references.
- Arnold Robbins. GAWK: Effective AWK Programming: A User's Guide for GNU Awk, Edition 3.
- Dale Dougherty, Arnold Robbins (March 1997). sed & awk, Second Edition. O'Reilly Media. ISBN 1-56592-225-5.
References
1. ^ The AWK Programming Language, ISBN 0-201-07981-X.
2. ^ FreeBSD's work log for importing BWK awk into FreeBSD's core, dated 2005-05-16, downloaded 2006-09-20
3. ^ FreeBSD's view of GPL Advantages and Disadvantages
4. ^ FreeBSD 5.0 release notes with notice of BWK awk in the base distribution
5. ^ xgawk at SourceForge
6. ^ Jawk at SourceForge
External links
- awk: pattern scanning and processing language, Commands & Utilities Reference, The Single UNIX Specification, Issue 6 from The Open Group
- awk maintained by Brian Kernighan.
- comp.lang.awk is a Usenet newsgroup dedicated to AWK.
- GAWK (GNU awk) webpage
- mawk download site
- DJGPP port of Gawk 3.11b as a downloadable 768KB zipfile
- xgawk download site
- Awka Open Source, AWK to C Conversion Tool
- TAWK Compiler
- Jawk Open Source, an implementation of AWK in Java with extensions
- gnulamp awk tutorial
- AWK annoyances; This page includes a Linux port of the MKS version of AWK.
This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.