What is Obfuscation?

Last Updated : 30 Jun, 2020

Obfuscation is a well-known term in software engineering. It is the deliberate concealment of written code by the programmer. It is done mainly for security: making the code obscure helps to prevent tampering, hide implicit values, or conceal the logic used. Obfuscated code can be converted back into a readable form with the help of language-specific deobfuscators.

For example:

  • Below is a piece of obfuscated C code:




    int i;main(){for(i=0;i["]<i;++i){--i;}"];
    read('-'-'-',i+++"hell\
    o, world!\n",'/'/'/'));}read(j,i,p){
    write(j/p+p,i---j,i/i);} 

    
    

  • Here is the deobfuscated version, which a person can understand:




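    #include <stdio.h>

    int i;

    /* Print a single character to standard output. */
    void write_char(char ch)
    {
        printf("%c", ch);
    }

    int main(void)
    {
        /* "hello, world!\n" has 14 characters; index 14 is the terminating '\0'. */
        for (i = 0; i < 14; i++) {
            write_char("hello, world!\n"[i]);
        }
        return 0;
    }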
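
The obfuscated program relies on a few legal but unusual C idioms. The sketch below isolates three of them. The helper names are made up for illustration and do not appear in the original program.

```c
#include <stddef.h>

/* a[b] is defined as *(a + b), so the index and the array can be swapped:
   i["text"] reads the same character as "text"[i]. */
char swapped_index(size_t i)
{
    return i["hello, world!\n"];
}

/* Character constants are small integers, so '-' - '-' is simply 0. */
int disguised_zero(void)
{
    return '-' - '-';
}

/* Likewise, '/' / '/' divides a value by itself and yields 1. */
int disguised_one(void)
{
    return '/' / '/';
}
```

Each helper returns the same value as its plain counterpart; only the spelling is harder to read.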

    
    

How to obfuscate code in apps?
In Java and Android apps, obfuscation is normally done by the build tooling, which renames classes, fields, and methods to short, meaningless names. Two related steps run alongside this renaming in a release build:

  1. Shrinking: It detects and safely removes unused classes, fields, methods, and attributes from the app’s release build.
  2. Optimization: It inspects and rewrites the code to reduce its size further. For example, if an optimizer detects an if-else statement in which the else branch can never be taken, the code for that branch is removed.

ProGuard (for both Java and Android) and R8 (for Android) are examples of code shrinkers and optimizers.
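
As a rough illustration of the optimization step, the sketch below shows a function before and after a branch that can never run is removed. It is written in C for consistency with the earlier example, whereas ProGuard and R8 operate on Java bytecode. The flag and function names are hypothetical.

```c
/* Hypothetical build-time flag that is always 0 in a release build. */
enum { VERBOSE = 0 };

/* Before optimization: the else branch can never run. */
int scale_before(int value)
{
    if (VERBOSE == 0) {
        return value * 3;
    } else {
        return -1;          /* unreachable in this build */
    }
}

/* After optimization: the dead branch is gone and behaviour is unchanged. */
int scale_after(int value)
{
    return value * 3;
}
```

Both versions return the same result for every input; the second is simply smaller.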

How to determine quality of an obfuscation method?
The quality of an obfuscation method is determined by the combination of its potency, resilience, stealth and cost.

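  1. Stealth: Stealth describes how well the transformed code blends in with the rest of the program, so that what has been hidden (for example, the real flow of control) is not easy to spot.
  2. Cost: Cost is the overhead that the technique adds. It must stay low enough for the technique to be applied on a large scale across several similar applications.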
  3. Potency: Potency defines to what degree the transformed code is more obscure than the original. Software complexity metrics define various complexity measures for software, such as the number of predicates it contains, depth of its inheritance tree, nesting levels, etc. While the goal of good software design is to minimize complexity based on these parameters, the goal of obfuscation is to maximize it.
  4. Resilience: Resilience defines how well the transformed code can resist automated deobfuscation attacks. It is a combination of the programmer effort to create a deobfuscator and the time and space required by the deobfuscator. The highest degree of resilience is a one-way transformation that cannot be undone by a deobfuscator. An example is when the obfuscation removes information such as source code formatting.
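
To make potency more concrete, the sketch below adds an opaque predicate: a condition whose outcome is fixed but not obvious to the reader. It raises the predicate count and the nesting level without changing the result. The function names are illustrative, and this is only one of many possible transformations.

```c
/* Original: one predicate, no nesting. */
unsigned max_plain(unsigned a, unsigned b)
{
    return (a > b) ? a : b;
}

/* Obfuscated: x * (x + 1) is a product of two consecutive numbers, so it
   is always even, even when unsigned arithmetic wraps around. The outer
   test is therefore always true and the final return is a decoy. */
unsigned max_opaque(unsigned a, unsigned b)
{
    unsigned x = a ^ b;
    if ((x * (x + 1u)) % 2u == 0u) {
        if (a > b) {
            return a;
        }
        return b;
    }
    return a + b;   /* never reached */
}
```

A human reader or a simple tool has to prove that the outer condition is always true before the decoy can be discarded, which is the kind of extra effort that potency and resilience try to measure.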

Advantages of Obfuscation:

  • A well-known method of obfuscation is iterative code obfuscation. Used in many applications, it is a procedure in which one or more obfuscation algorithms are applied to the code repeatedly, with the output of one algorithm providing the input to the next. This can be seen as a way of adding layers of security to the code.
  • If you release valuable software (especially Java, Android, .NET and iOS) outside your immediate control and the source code is not distributed, obfuscation should probably be part of the application development process. Obfuscation makes it much more difficult for attackers to review the code and analyze the application. It may also make it harder for attackers to debug and tamper with the application. The end goal is to make it difficult to extract or discover useful information, such as trade secrets (intellectual property), credentials, or security vulnerabilities, from an application.
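
The chaining idea behind iterative obfuscation can be sketched as a small pipeline. This is a toy under a loud assumption: a byte buffer stands in for the program, and the two passes are simple placeholders, whereas real obfuscation passes rewrite code. All names are hypothetical.

```c
#include <stddef.h>

/* A pass rewrites the buffer in place. The buffer is only a stand-in for
   the program being transformed. */
typedef void (*pass_fn)(unsigned char *buf, size_t len);

/* Toy pass 1: XOR every byte with a fixed mask. */
void xor_pass(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(buf[i] ^ 0x5Au);
    }
}

/* Toy pass 2: reverse the order of the bytes. */
void reverse_pass(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        unsigned char tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}

/* Apply each pass in order: the output of one pass is the input of the next. */
void run_passes(unsigned char *buf, size_t len,
                const pass_fn *passes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        passes[i](buf, len);
    }
}
```

Calling `run_passes` with the array `{ xor_pass, reverse_pass }` applies both layers in sequence. Adding another layer means adding another entry to the array.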

Disadvantages of Obfuscation:
Obfuscation is also used by cybercriminals. Let’s see how to protect ourselves from them.

  • Obfuscation is widely used by malware writers to evade antivirus scanners. It is essential to analyze how these obfuscation techniques are used in malware.
  • Dead-Code Insertion: This is a simple technique that adds ineffective instructions to a program to change its appearance without altering its behaviour. To combat dead-code insertion, signature-based antivirus scanners should be able to delete the ineffective instructions before analysis.
  • Instruction Substitution: This technique changes the original code by replacing some instructions with other instructions that are equivalent to the original ones.
  • Code Transposition: Code transposition reorders the sequence of instructions in the original code without any visible impact on the code’s behaviour. There are two ways to apply this technique. The first shuffles the instructions at random and then restores the original execution order by inserting unconditional branches or jumps. A way to combat this type of obfuscation is to restore the original program by removing the unconditional branches or jumps. The second method creates new generations by choosing and reordering independent instructions that have no impact on one another. Finding such independent instructions is a complex problem, so this method is hard to implement, but it can also make the cost of detection high.
  • Code Integration: Code integration was first introduced by the Win95/Zmist malware, also known as Zmist. The Zmist malware binds itself to the code of its target program. To do so, Zmist must first decompile its target program into small manageable objects, slot itself between them, and then reassemble the integrated code into a new generation. This is by far one of the most sophisticated obfuscation techniques, and it can make detection and recovery very difficult as well as costly.
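
The first three techniques in the list above can be illustrated at source level. In the sketch below, C statements stand in for the machine instructions that malware actually rewrites, and the function names are hypothetical. All four functions compute the same value, 2 * a + b + 1.

```c
/* Original routine. */
unsigned f_original(unsigned a, unsigned b)
{
    unsigned r = a * 2u;
    r = r + b;
    r = r + 1u;
    return r;
}

/* Dead-code insertion: the extra statements never affect the result. */
unsigned f_dead_code(unsigned a, unsigned b)
{
    unsigned junk = a ^ b;      /* computed, but not used for the result */
    unsigned r = a * 2u;
    junk = junk + 0u;           /* no-op */
    r = r + b;
    r = r ^ 0u;                 /* no-op: x ^ 0 == x */
    r = r + 1u;
    (void)junk;
    return r;
}

/* Instruction substitution: each step is replaced by an equivalent one. */
unsigned f_substituted(unsigned a, unsigned b)
{
    unsigned r = a << 1;        /* a * 2 becomes a << 1 */
    r = r - (0u - b);           /* r + b becomes r - (-b) in wrapping math */
    r = r - ~0u;                /* r + 1 becomes r - (all ones) */
    return r;
}

/* Code transposition (first method): the statements are shuffled and the
   original execution order is restored with unconditional jumps. */
unsigned f_transposed(unsigned a, unsigned b)
{
    unsigned r = 0u;
    goto step1;
step3:
    r = r + 1u;
    goto done;
step2:
    r = r + b;
    goto step3;
step1:
    r = a * 2u;
    goto step2;
done:
    return r;
}
```

A scanner that matches the exact instruction sequence of `f_original` would miss all three variants. This is why the countermeasures described above normalize the code first, by removing no-ops and following or removing the jumps.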

