Python | Making program run faster

Last Updated : 17 Sep, 2020

As we know, Python programming language is a bit slow and the target is to speed it up without the assistance of more extreme solutions, such as C extensions or a just-in-time (JIT) compiler.
While the first rule of optimization might be to “not do it”, the second rule is almost certainly “don’t optimize the unimportant.” To that end, if the program is running slow, one might start by profiling the code. More often than not, one finds that the program spends its time in a few hotspots, such as inner data processing loops. Once those locations are identified, the no-nonsense techniques can be used to make the program run faster.
A lot of programmers start using Python as a language for writing simple scripts. When writing scripts, it is easy to fall into a practice of simply writing code with very little structure.
Code #1: Taking this code into consideration.

Python3

# abc.py
import sys
import csv
 
with open(sys.argv[1]) as f:
    for row in csv.reader(f):
        # Some kind of processing

A little-known fact is that code defined in the global scope like this runs slower than code defined in a function. The speed difference has to do with the implementation of local versus global variables (operations involving locals are faster). So, simply put the scripting statements in a function to make the program run faster.

Code #2 :

Python3

# abc.py
import sys
import csv
 
def main(filename):
    with open(filename) as f:
        for row in csv.reader(f):
        # Some kind of processing
 
main(sys.argv[1])

The speed difference depends heavily on the processing being performed, but the speedups of 15-30% are not uncommon.

Selectively eliminate attribute access –

Every use of the dot (.) operator to access attributes comes with a cost. Under the covers, this triggers special methods, such as __getattribute__() and __getattr__(), which often lead to dictionary lookups.
One can often avoid attribute lookups by using the from module import name form of import as well as making selected use of bound methods as shown in the code fragment given below –
Code #3 :

Python3

import math
 
def compute_roots(nums):
    result = []
    for n in nums:
        result.append(math.sqrt(n))
    return result
 
# Test
nums = range(1000000)
for n in range(100):
    r = compute_roots(nums)

Output :

This program runs in about 40 seconds when running on the machine.

Code #4 : Change the compute_roots() function

Python3

from math import sqrt
 
def compute_roots(nums):
    result = []
    result_append = result.append
    for n in nums:
        result_append(sqrt(n))
    return result

Output :

This program runs in about 29 seconds when running on the machine.

The only difference between the two versions of code is the elimination of attribute access. Instead of using math.sqrt(), the code uses sqrt(). The result.append() method is additionally placed into a local variable re
sult_append and reused in the inner loop.
However, it must be emphasized that these changes only make sense in frequently executed code, such as loops. So, this optimization really only makes sense in carefully selected places.

Understand locality of variables –

As previously noted, local variables are faster than global variables. For frequently accessed names, speedups can be obtained by making those names as local as possible.
Code #5 : Modified version of the compute_roots() function

Python3

import math
 
def compute_roots(nums):
    sqrt = math.sqrt
    result = []
    result_append = result.append
    for n in nums:
        result_append(sqrt(n))
    return result

In this version, sqrt has been lifted from the math module and placed into a local variable. This code will run about 25 seconds (an improvement over the previous version, which took 29 seconds). That additional speedup is due to a local lookup of sqrt being a bit faster than a global lookup of sqrt.
Locality arguments also apply when working in classes. In general, looking up a value such as self.name will be considerably slower than accessing a local variable. In inner loops, it might pay to lift commonly accessed attributes into a local variable as shown in the code given below.
Code #6 :

Python3

# Slower
class SomeClass:
    ...
    def method(self):
        for x in s:
            op(self.value)
# Faster
class SomeClass:
    ...
    def method(self):
        value = self.value
        for x in s:
            op(value)

Dynamic Typing:

The reason Python is slow is because it’s dynamically typed now we’re going to talk about this more in detail but I want to give a comparison to a language like Java. Now in Java, everything is statically typed and this language is actually compiled before it runs, unlike Python that’s compiled at runtime through an interpreter. Now what happens in Java is when you write code, you need to define what type each of your variables is going to be, what type your methods and functions are going to be returning and you pretty much have to define exactly what everything’s going to be throughout your code. Now although this leads to much longer development times and takes a much longer time to write your code but what it does is increase efficiency when you are compiling, now the reason this actually works and the reason it works so much faster than Python code is because if you know the type that a specific variable or object is going to be, you can perform a ton of different optimizations and avoid performing a ton of different checks while you’re actually running the code because these checks are performed at compile time in Java essentially you can’t compile any Java code that hasn’t actual or even just like typed errors while you’re writing that code you are going to try to compile it and it would say like this type isn’t accurate, you can’t do this, you can’t compile it because it knows that when it comes to runtime that’s not going to work so essentially all of these checks that actually needs to be performed in Python when the code is running are performed beforehand and there’s just a ton of optimization done because of this statically typed length. Now one may ask a question like, Why doesn’t Python do this? Answer to this would be Python is dynamically typed which simply means that any variable can change its type and can change it’s value at any point in the program while it’s running which means that we can’t actually compile the entire program beforehand because we can’t do all of these checks at once because we don’t know what type these variables are going to be, they are going to change at runtime, different things are going to happen and because of that we can’t get all these optimization that we might have in a lower level language like Java, C or C++ and that is kind of the fundamental reason the language is slow, this dynamic typing and any fast language is going to have a compiler that’s going to run through, it’s going to make sure that everything is good, it’s going to do all these checks before it actually ends up running the code at runtime where what happens in Python is all of your code is actually compiled and checked at runtime so rather than compiling it before and taking all that time beforehand while you’re running the code , many different checks are happening to make sure that say this object is correct, these types are proper, everything is working the same.

Concurrency:

Now the next thing to talk about is obviously the lack of concurrency in Python. This is going to be the major kind of factor on speed, if you’re writing an application in Java, C, you can spread everything out throughout multiple threads which allows you to utilize all the cores of your CPU so to break this down in modern-day computing most of us have four core CPUs or higher and that allows us to actually run four tasks at the exact same time concurrently now with Python this isn’t possible. Python says, well for each interpreter we can have at most one thread running at a time and a thread is just some kind of operation that’s happening on the CPU core so that means even if we create many threads in our Python program we can only be using one CPU core while in a Java program or a C program could be using all eight or be using all four which will obviously lead to 4X or 8X increase in speed, now we can get around this in Python by using multiprocessing, but there are some issues with that.

Suggest improvement

How to make a Python program wait?

Share your thoughts in the comments

Python | Making program run faster

Python3

Python3

Selectively eliminate attribute access –

Python3

Python3

Understand locality of variables –

Python3

Python3

Dynamic Typing:

Concurrency:

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?