Information about Obfuscated Code

Obfuscated code is source code that is (usually intentionally) very hard to read and understand. Some languages are more prone to obfuscation than others. C, C++ and Perl are most often cited as easily obfuscatable languages. Macro preprocessors are often used to create hard to read code by masking the standard language syntax and grammar from the main body of code. The term shrouded code has also been used.

There are also programs known as obfuscators that may operate on source code, object code, or both, for the purpose of deterring reverse engineering.

Obfuscating code to prevent reverse engineering is typically done to manage risks that stem from unauthorized access to source code. These risks include loss of intellectual property, ease of probing for application vulnerabilities and loss of revenue that can result when applications are reverse engineered, modified to circumvent metering or usage control and then recompiled. Obfuscating code is, therefore, also a compensating control to manage these risks. The risk is greater in computing environments such as Java and Microsoft's .NET which take advantage of just-in-time (JIT) compilation technology that allow developers to deploy an application as intermediate code rather than code which has been compiled into machine language before being deployed.

Another use of obfuscating code involves reducing the size of Java class files. When building MIDlets using Java ME, an obfuscator can be used to reduce the size of the JAR file.

Recreational obfuscation

Code is sometimes obfuscated deliberately for recreational purposes. There are programming contests which reward the most creatively obfuscated code: the International Obfuscated C Code Contest, Obfuscated Perl Contest, International Obfuscated Ruby Code Contest and Obfuscated PostScript Contest.

There are many varieties of interesting obfuscations ranging from simple keyword substitution, use/non-use of whitespace to create artistic effects, clever self-generating or heavily compressed programs, or programs that are valid and operate similarly in multiple programming languages.

Short obfuscated Perl programs printing "Just another Perl hacker" or something similar are often found in signatures of Perl programmers. These are informally known as JAPHs, and the origin of this practice is generally credited to Randal Schwartz.

Examples

Take this infamous example from Internet lore:
>
#include 
main(t,_,a)char *a;{return!0
Although unintelligible at first glance, it is a legal C program that when compiled and run will generate the 12 verses of The 12 Days of Christmas. It actually contains all the strings required for the poem in an encoded form inlined in the code. The code then iterates through the 12 days displaying what it needs to.


Another example is a program's source listing that was formatted to resemble an empty tic-tac-toe board. Each pass through the program modified the sourcecode to show a turn in the game, to be executed for the next move.

Yet another example is this short program that generates mazes of arbitrary length:
>
char*M,A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf(M="%d",&C);
--            E;             J[              E]             =T
[E   ]=  E)   printf("._");  for(;(A-=Z=!Z)  ||  (printf("\n|"
)    ,   A    =              39              ,C             --
)    ;   Z    ||    printf   (M   ))M[Z]=Z[A-(E   =A[J-Z])&&!C
&    A   ==             T[                                  A]
|6<<27
Note the shape of the corridors in the program. Modern C compilers don't allow constant strings to be overwritten,
which can be avoided by changing the first line to
>
char M[2],A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf("%d",&C);


Versions of gcc (the GNU Compiler for C) prior to 4.0 can compile the program in its original form if the -fwritable-strings flag is used.

A famous example for Perl is

>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print


This slowly displays the text "Just another Perl / Unix hacker", multiple characters at a time, with delays. An explanation can be found here.

Some Python examples can be found in the official Python programming FAQ.

Yet another famous C example is:

>
_(__,___,____){___/__<=1?_(__,___+1,____):!(___%__)?_(__,___+1,0):___%__==___/
__&&!____?(printf("%d\t",___/__),_(__,___+1,0)):___%__>1&&___%__<___/__?_(__,1+
___,____+!(___/__%(___%__))):___<__*__?_(__,___+1,____):0;}main(){_(100,0,0);}


This program prints out the prime numbers less than 100. While this program is extremely difficult to follow, it is the result of several simple transformations from the following elementary program:

>
void primes(int cap) {
  int i, j, composite;
  for(i = 2; i < cap; i++) {
    composite = 0;
    for(j = 2; j < i; j++) 
      composite += !(i % j);
    if(!composite)
      printf("%d\t", i);
  }
}
int main() { 
  primes(100);
}


The first set of transformations aim to reduce the primes function into a single statement: (1) the meat of the function is rewritten as a single while loop with a sequence of cascading if-else structures inside of it; (2) the while loop is then rewritten as recursion; (3) the cascading if-else statements is rewritten as a single conditional statement. These transformations serve to obfuscate the code considerably. The remaining transformations purposefully hide the details of the function by: (4) removing all intermediate variables; (5) renaming the few variables left to _, ___, etc.; and, finally, (6) removing all whitespace and unnecessary parentheses.

The first transformation takes advantage of the well-known property that any simple function can be transformed into a single while loop followed by a series of cascading if-else statements. The program above is first transformed by rewriting the primes function in this form:

>
void primes(int cap) { 
  int i, j, composite, t = 0;
  while(t < cap * cap) {
    i = t / cap;
    j = t++ % cap;
    if(i <= 1);
    else if(j == 0)
      composite = 0;
    else if(j == i && !composite)
      printf("%d\t",i);
    else if(j > 1 && j < i)
      composite += !(i % j);  
  }
}
int main() {
  primes(100);
}


Another well-known property is that any while loop can be replaced by a series of recursive calls. The program is further transformed by replacing the while loop with recursion. Note that this requires adding two parameters to the primes function's header. Also note the consolidation of two statements into a list in the third if block. Finally note that one additional if-else block must be added to capture the situation in the non-recursive version where program flow fails all of the if-conditions and the while loop would simply continue with t incremented by one:

>
void primes(int cap, int t, int composite) {
  int i,j;
  i = t / cap;
  j = t % cap;
  if(i <= 1)
    primes(cap,t+1,composite);
  else if(j == 0)
    primes(cap,t+1,0);
  else if(j == i && !composite)
    (printf("%d\t",i), primes(cap,t+1,0));
  else if(j > 1 && j < i)
    primes(cap,t+1, composite + !(i % j));
  else if(t < cap * cap)
    primes(cap,t+1,composite);
}
int main() {
  primes(100,0,0);
}


The program is next transformed by changing the variable names to single letters and replacing the if-else structure with conditional statements (e.g. if(A) B else if(C) D else E becomes A ? B : C ? D : E):

>
void primes(int m, int t, int c) {
  int i,j;
  i = t / m;
  j = t % m;
  (i <= 1) ? primes(m,t+1,c) : (j == 0) ? primes(m,t+1,0) : (j == i && !c) ? 
  (printf("%d\t",i), primes(m,t+1,0)) : (j > 1 && j < i) ? 
  primes(m,t+1,c + !(i % j)) : (t < m * m) ? primes(m,t+1,c) : 0;
}
int main() {
  primes(100,0,0);
}


Next the program is transformed by removing the intermediate variables i and j and replacing them with (t / m) and (t % m), respectively:

>
void primes(int m, int t, int c) {
  ((t / m) <= 1) ? primes(m,t+1,c) : !(t % m) ? primes(m,t+1,0) : 
  ((t % m)==(t / m) && !c) ? (printf("%d\t",(t / m)), primes(m,t+1,0)) : 
  ((t % m)> 1 && (t % m) < (t / m)) ? primes(m,t+1,c + !((t / m) % (t % m))) : 
  (t < m * m) ? primes(m,t+1,c) : 0;
}
int main() {
  primes(100,0,0); 
}


Next the program is transformed by renaming the function primes and the variables m, t, and c to _, __, ___, and ____, respectively:

>
void _(int __, int ___, int ____) {
  ((___ / __) <= 1) ? _(__,___+1,____) : !(___ % __) ? _(__,___+1,0) : 
  ((___ % __)==(___ / __) && !____) ? (printf("%d\t",(___ / __)), 
  _(__,___+1,0)) : ((___ % __) > 1 && (___ % __) < (___ / __)) ? 
  _(__,___+1,____ + !((___ / __) % (___ % __))) : (___ < __ * __) ? 
  _(__,___+1,____) : 0;
} 
int main() {
  _(100,0,0); 
}


Finally, the program is transformed by removing white space, type declarations, and unambiguous parentheses to come up with the fully obfuscated version:

>
_(__,___,____){___/__<=1?_(__,___+1,____):!(___%__)?_(__,___+1,0):___%__==___/
__&&!____?(printf("%d\t",___/__),_(__,___+1,0)):___%__>1&&___%__<___/__?_(__,1+
___,____+!(___/__%(___%__))):___<__*__?_(__,___+1,____):0;}main(){_(100,0,0);}


This program runs on most systems. The limiting factor generally will be stack overflow because setting the equivalent of the cap parameter to 100 generally results in 100^2 recursive calls. This sort of obfuscation by program transformation is relatively easy to apply and can be performed on many simple programs.

Obfuscation by code morphing

Obfuscation by code morphing refers to obfuscating machine language or object code rather than obfuscating the source code.

This is achieved by completely replacing a section of the compiled code with an entirely new block that expects the same machine state when it begins execution as the previous section, and will leave with the same machine state after execution as the original. However, a number of additional operations will be completed as well as some operations with an equivalent effect.

Code morphing makes disassembly of a distributed program more difficult. However, by adding unnecessarily complicated operations and hindering compiler-made optimizations, the execution time of the program is increased. For that reason, code morphing should be limited to critical portions of a program and not be used on an entire application.

Code morphing is often used in obfuscating the copy protection or other checks that a program makes to determine whether it is a valid, authentic installation, or a pirated copy.

Obfuscation in malicious software

Spammers frequently use obfuscated JavaScript or HTML code in spam messages. The obfuscated message, when displayed by an HTML-capable e-mail client, appears as a reasonably normal message -- albeit with obnoxious JavaScript behaviors such as spawning pop-up windows. However, when the source is viewed, the obfuscations make it far more difficult for investigators to discern where the links go, or what the JavaScript code does. For this same reason JavaScript obfuscation is also often used by malware authors to conceal parts of code that run browser exploits, or that redirect to pages containing exploits.

Automated JavaScript obfuscators are currently available on the market. Although the effectiveness of these tools may vary, and the channels through which they are exchanged range from the legal to the manifestly malicious, they are all created for the purpose of confounding casual viewers or inexperienced investigators.

The techniques use JavaScript's dynamic nature -- a piece of code is stored as an encrypted string, which is decrypted and evaluated. This may be done several times. Other techniques include insertion of dummy code, as well as dummy HTML links to legitimate pages.

Disadvantages of obfuscation

When used alone

No obfuscator known today provides any guarantees on the difficulty of reverse engineering, and this seems to be an inherent issue (see for example, "Can We Obfuscate Programs?" by Boaz Barak). Thus, obfuscators do not provide security of a level similar to modern encryption schemes, and should be used with other measures in tandem, in cases where security is of high importance.

Additionally, the most common software reverse engineering attacks target copy protection schemes. These schemes generally rely heavily on existing operating system procedure calls, making basic code obfuscation easily bypassed using the same tools used with unobfuscated code.

Debugging

Obfuscated code is extremely difficult to debug. Variable names will no longer make sense, and the structure of the code itself will likely be modified beyond recognition. This fact generally forces developers to maintain two builds: One with the original, unobfuscated source code that can be easily debugged, and another for release. While both builds should be tested to make sure they perform identically, the second build is generally reliably constructed from the first by an obfuscator.

Obviously this limitation does not apply to intermediate language (Java, C#, etc.) obfuscators, which generally work on compiled assemblies rather than on source code.

Portability

Obfuscated code often depends on the particular characteristics of the platform and compiler, making it difficult to manage if either change.

Conflicts with Reflection APIs

Reflection is a set of APIs in various languages that allow an object to be examined or created just by knowing its classname at run-time. Many obfuscators allow specified classes to be exempt from renaming; and it is also possible to let a class be renamed and call it by its new name. However, the former option places limits on the dynamism of code, while the latter adds a great deal of complexity and inconvenience to the system.

Obfuscating software

A vast variety of tools exists to perform or assist with code obfuscation. These include experimental research tools created by academics, hobbyist tools, commercial products written by professionals, and Open-source software. There are also deobfuscators, which do the reverse.

Software obfuscation tools include specialized obfuscators to demonstrate a relatively limited technique, more general obfuscators which attempt a more thorough obfuscation, and combined-function tools which obfuscate code as part of a larger goal such as software licensing enforcement.

Explanation

Programs written in languages such as C++ or Pascal are compiled into the machine language of a given computer before they become a program. Programmers write "source code" while computers run "machine code" so this conversion is necessary. There is (generally) a one way transformation from source code to machine code. Machine code is not encrypted and is easy for anyone to see, but the format is so tedious for humans that reverse-engineering efforts are slow and painful.

Java and .NET languages (C#, Visual Basic, etc) take a different approach to compilation. They are far easier to reverse engineer because they do not compile to machine code, they compile into intermediate code.

Microsoft recommends using the Script Encoder to obfuscate the ASP files, so in case the web server is compromised, the hacker (or cracker) would be unable to find out how your ASP applications work. The Script Encoder works also on JScript and VBScript files. Note that the encoded JScript is only functional in Internet Explorer. However, in the documentation, it states that "Note that this encoding only prevents casual viewing of your code; it will not prevent the determined hacker from seeing what you've done and how.". In fact, it is indeed possible, as shown by the Windows Script Decoder, created on August 1, 2000 by Mr. Brownstone.

The Code Morphing is multilevel technology containing hundreds of unique code transformation patterns. In addition this technology includes the special layer that transforms some commands into Virtual Machine commands (like P-Code). Code Morphing turns binary code into an undecipherable mess that is not similar to normal compiled code, and completely hides execution logic of the protected code.

See also

External links

source code (commonly just source or code) is any sequence of statements and/or declarations written in some human-readable computer programming language.
..... Click the link for more information.
C

The C Programming Language, Brian Kernighan and Dennis Ritchie, the original edition that served for many years as an informal specification of the language.
..... Click the link for more information.
C++
Paradigm: Multi-paradigm
Appeared in: 1983
Designed by: Bjarne Stroustrup
Typing discipline: Static, unsafe, nominative
Major implementations: G++, Microsoft Visual C++, Borland C++ Builder
Dialects: ISO/IEC C++ 1998, ISO/IEC C++ 2003
..... Click the link for more information.
Perl

Paradigm: Multi-paradigm
Appeared in: 1987
Designed by: Larry Wall
Latest release: 5.8.8/ January 31 2006
Typing discipline: Dynamic
Influenced by: AWK, BASIC, BASIC-PLUS, C, C++, Lisp, Pascal, Python, sed, Unix shell
..... Click the link for more information.
In computer science, object code, or an object file, is the representation of code that a compiler or assembler generates by processing a source code file. Object files contain compact code, often called "binaries".
..... Click the link for more information.
Reverse engineering (RE) is the process of discovering the technological principles of a device or object or system through analysis of its structure, function and operation. It often involves taking something (e.g.
..... Click the link for more information.
JIT may refer to:
  • Various meanings of Just In Time:
  • Just-in-time compilation - a technique for improving the performance of Virtual Machines in computing.

..... Click the link for more information.
Bytecode is a binary representation of an executable program designed to be executed by a virtual machine rather than by dedicated hardware. Since it is processed by software, it is usually more abstract than machine code.
..... Click the link for more information.
A MIDlet is a Java program for embedded devices, more specifically the Java ME virtual machine. Generally, these are games and applications that run on a cell phone.

Mobile Information Device toolkit

A MIDlet requires a device that implements Java ME, MIDP to run.
..... Click the link for more information.
In computing, the Java Platform, Micro Edition or Java ME (previously known as Java 2 Platform, Micro Edition or J2ME) is a specification of a subset of the Java platform aimed at providing a certified collection of Java APIs for the development of software for
..... Click the link for more information.
Java Archive

File extension: .jar
MIME type: application/java-archive
Uniform Type Identifier: com.sun.java-archive
Developed by: Sun Microsystems
Type of format: file archive, data compression
..... Click the link for more information.
The International Obfuscated C Code Contest (abbr. IOCCC) is a programming contest for the most creatively obfuscated C code held annually since 1984 (with the exception of 1997, 1999, 2002, 2003, and 2006).
..... Click the link for more information.
The Obfuscated Perl contest was a competition for programmers of Perl which was held annually between 1996 and 2000.

Entrants to the competition aim to write "devious, inhuman, disgusting, amusing, amazing, and bizarre Perl code" [1] (link dead as of 24/04/2007)
..... Click the link for more information.
Perl

Paradigm: Multi-paradigm
Appeared in: 1987
Designed by: Larry Wall
Latest release: 5.8.8/ January 31 2006
Typing discipline: Dynamic
Influenced by: AWK, BASIC, BASIC-PLUS, C, C++, Lisp, Pascal, Python, sed, Unix shell
..... Click the link for more information.
Just another Perl hacker, or JAPH, typically refers to a Perl program which prints "Just another Perl hacker," (the comma is canonical but is occasionally omitted). Short JAPH programs are often used as signatures in online forums, or as T-shirt designs.
..... Click the link for more information.
signature (from Latin signare, "sign") is a handwritten (and sometimes stylized) depiction of someone's name, nickname or even a simple "X" that a person writes on documents as a proof of identity and intent. The writer of a signature is a signatory.
..... Click the link for more information.
Randal L. Schwartz (born November 22, 1961) is an American author, system administrator and programming consultant. Schwartz is the co-author of several widely used books about Perl, a programming language for computers, and has written regular columns about Perl for
..... Click the link for more information.
Lore may refer to:
  • Lore, that which is learned, a teaching. Learning of a special or out-of-the-way kind
  • Folklore, acquired knowledge or traditional beliefs
  • Lore, region between the eye and bill on the side of a bird's head; plural is lores, adjective is

..... Click the link for more information.
C

The C Programming Language, Brian Kernighan and Dennis Ritchie, the original edition that served for many years as an informal specification of the language.
..... Click the link for more information.
"The Twelve Days of Christmas" is an English Christmas carol which enumerates a series of grandiose gifts given on each of the twelve days of Christmas. It is a cumulative song, meaning that each verse is built on top of the previous verses.
..... Click the link for more information.
Tic-tac-toe, also called noughts and crosses, hugs and kisses, and many other names, is a pencil-and-paper game for two players, O and X, who take turns to mark the spaces in a 3×3 grid.
..... Click the link for more information.
The GNU Compiler Collection (usually shortened to GCC) is a set of compilers produced for various programming languages by the GNU Project. GCC is a key component of the GNU toolchain, and as well as being the official compiler of the GNU system, GCC has been adopted as the
..... Click the link for more information.
Python

Paradigm: Multi-paradigm
Appeared in: 1991
Designed by: Guido van Rossum
Developer: Python Software Foundation
Latest release: 2.5.1/ April 18 2007
Latest unstable release: 3.
..... Click the link for more information.
C

The C Programming Language, Brian Kernighan and Dennis Ritchie, the original edition that served for many years as an informal specification of the language.
..... Click the link for more information.
This article may contain improper references to .
The source in dispute is: article.
Please help [ improve this article] by removing . A self-published source may only be cited as a primary source in an article about the author or source itself and not as an authority.
..... Click the link for more information.
Machine code or machine language is a system of instructions and data directly executed by a computer's central processing unit. Machine code is the lowest-level of abstraction for representing a computer program.
..... Click the link for more information.
In computer science, object code, or an object file, is the representation of code that a compiler or assembler generates by processing a source code file. Object files contain compact code, often called "binaries".
..... Click the link for more information.
A disassembler is a computer program that translates machine language into assembly language — the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly language.
..... Click the link for more information.
compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language).
..... Click the link for more information.
Copy protection, also known as copy prevention or copy restriction, is a kind of hardware or storage media oriented method for technologically preventing unauthorized reproduction of copyrighted software, movies, music, and other media1.
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter