
Python Database Optimization with Psycopg2 and Multiprocessing

This article gives a concise overview of the basic use of the Psycopg2 module and Python's Multiprocessing module: how Psycopg2 works on its own, and how it behaves when combined with multiprocessing.

Psycopg2 and Its Role in Database Operations

According to PyPI’s documentation for Psycopg2, it is designed to be used with a PostgreSQL database and fully implements Python’s DB API 2.0 specification. Notably, Psycopg2 connections are thread-safe, allowing a single database connection to be shared across multiple threads instead of creating a new connection for each thread. The official documentation also emphasizes that Psycopg2 is well-suited for heavily multi-threaded applications, as we will explore in the subsequent sections of this article.
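The connection-sharing pattern described above can be sketched as follows. This is a minimal, hedged illustration: it uses the standard library's sqlite3 as a stand-in so it runs without a PostgreSQL server. With psycopg2 the structure is the same, except that the connection itself is thread-safe, so the explicit lock around each statement would not be needed.

```python
import sqlite3
import threading

# sqlite3 stands in for psycopg2 here so the sketch is runnable
# without a database server; check_same_thread=False plus a lock
# lets several threads share this one connection safely.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE test (num INTEGER)")
lock = threading.Lock()

def insert(start, end):
    # Each thread reuses the single shared connection, but opens
    # its own short-lived cursor for its statements.
    for i in range(start, end):
        with lock:
            cur = conn.cursor()
            cur.execute("INSERT INTO test (num) VALUES (?)", (i,))
            cur.close()

threads = [threading.Thread(target=insert, args=(i * 8, (i + 1) * 8))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(count)  # 32: all four threads inserted over one connection
conn.close()
```

The key point is that the threads never open their own connections; they only create and discard cursors, which is exactly the usage Psycopg2 is optimized for.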



Example: This Python script, utilizing psycopg2, connects to a PostgreSQL database and drops the “TestDB” database by executing a SQL query. It establishes a connection, creates a cursor for database operations, executes the query, and then closes the cursor and connection. In essence, it deletes a PostgreSQL database named “TestDB.”




# Import the psycopg2 module
import psycopg2

# Connect to the default maintenance database; a database
# cannot be dropped while we are connected to it
conn = psycopg2.connect(
    dbname='postgres', user='user', password='pass'
)

# DROP DATABASE cannot run inside a transaction block,
# so switch the connection to autocommit mode
conn.autocommit = True

# Open a cursor to perform operations
cur = conn.cursor()

# Assemble a SQL query
query = 'DROP DATABASE "TestDB";'

# Execute the query
cur.execute(query)

# Close the cursor and connection
cur.close()
conn.close()

script.py



This Python script uses the psycopg2 module to connect to a PostgreSQL database, create a table named “test” with specified columns, insert data into the table, and commit the changes to the database. The code establishes a connection, opens a cursor for database operations, executes SQL commands for table creation and data insertion, and then closes the cursor and connection, ensuring the persistence of changes made to the PostgreSQL database.




import psycopg2
 
# Connect to an existing database
conn = psycopg2.connect("dbname=test user=postgres")
 
# Open a cursor to perform database operations
cur = conn.cursor()
 
# Execute a command: this creates a new table
cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data text);")
 
# Pass data to fill a query placeholders and let Psycopg perform
# the correct conversion (no more SQL injections!)
cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (100, "abcdef"))
 
# Make the changes to the database persistent
conn.commit()
 
# Close communication with the database
cur.close()
conn.close()


Psycopg2 Operation with Multiprocessing

Using Psycopg2 with multiprocessing requires care about the difference between multi-threading and multiprocessing in database operations. Psycopg2 is designed for heavily multi-threaded applications: cursors are lightweight and cheap to create and discard during execution, and because connections are thread-safe, several threads can share a single connection. Multiprocessing is different. A connection object cannot be shared across processes, so each process must open its own connection, which raises the question of why employ multiprocessing at all when separate connections are necessary.
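The reason connections cannot cross process boundaries is that multiprocessing must pickle any object it hands to a child process, and a database connection wraps a live socket and server-side state that cannot be serialized. The sketch below demonstrates this with the standard library's sqlite3 as a stand-in (assumed to behave like psycopg2 in this respect, since both refuse pickling), so it runs without a PostgreSQL server.

```python
import pickle
import sqlite3

# sqlite3 stands in for psycopg2 so this runs without a server.
# Attempting to pickle a live connection fails, which is why
# multiprocessing cannot ship one to a child process.
conn = sqlite3.connect(":memory:")
try:
    pickle.dumps(conn)
    picklable = True
except (TypeError, pickle.PicklingError) as exc:
    picklable = False
    print("cannot pickle connection:", exc)
conn.close()
```

Since the connection cannot travel, the practical pattern (used in the script later in this article) is to pass plain connection parameters to each worker and let every process call `psycopg2.connect()` itself.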

Why Opt for Multiprocess Database Operations?

Spreading work across multiple processes can reduce total operation time significantly, especially for code handling thousands of operations per second. Separate processes also help when queries arrive from different sources: when multiple clients send SQL queries, giving each its own process aids organization and troubleshooting, since a problematic query, such as a suspected SQL injection, can be traced back to the process that issued it. More reasons exist for using multiprocessing in database operations, but these are the key considerations.

Varied Origins for Improved Organization

In applications that serve multiple clients concurrently, using a separate process for each client improves organization and isolation. This is especially beneficial when troubleshooting, since issues arising from a particular client’s SQL query can be pinpointed to the process that ran it.

Example: This Python code uses psycopg2 to connect to a PostgreSQL database, create a table named “test,” and insert 128 entries. The loop iterates over a range, inserting values into the “test” table with the “num” column holding the iteration index and the “data” column holding its string representation. After committing the changes, the database connection is closed so the changes persist.




import psycopg2
 
# Connect to an existing database
conn = psycopg2.connect("dbname=test user=postgres")
 
# Open a cursor to perform database operations
cur = conn.cursor()
 
# Execute a command: this creates a new table
cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data text);")
 
# Pass data to fill a query placeholders and let Psycopg perform
# the correct conversion (no more SQL injections!)
for i in range(128):
    cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (i, str(i)))
 
# Make the changes to the database persistent
conn.commit()
 
# Close communication with the database
cur.close()
conn.close()

scripts.py : This Python code uses multiprocessing to concurrently insert data into a PostgreSQL database. It defines a `worker` function for database operations within each process. The main block establishes a database connection, creates a table, and spawns four processes, each inserting a range of values into the “test” table using the `worker` function. The code ensures all processes complete their tasks before proceeding, improving efficiency through parallelized database insertion.




import multiprocessing
import psycopg2
 
 
# The worker function; each process calls this function
def worker(start, end):
    # Each process must open its own connection: psycopg2
    # connections cannot be shared across processes
    conn = psycopg2.connect("dbname=test user=postgres password=admin")
    cur = conn.cursor()
    for i in range(start, end):
        cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)",
                    (i, chr(i)))
 
    # Commit the changes, then close the cursor and connection
    conn.commit()
    cur.close()
    conn.close()
 
 
if __name__ == '__main__':
    # Create a connection to the database
    conn = psycopg2.connect("dbname=test user=postgres password=admin")
 
    # Create a preliminary cursor to create a table in the database
    cur = conn.cursor()
 
    # Create a database table
    cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data text);")
 
    # Commit the table creation, then close the cursor and connection
    conn.commit()
    cur.close()
    conn.close()
 
    # Create our 4 processes, each covering a sub-range of values
    p1 = multiprocessing.Process(target=worker, args=(1, 32))
    p2 = multiprocessing.Process(target=worker, args=(32, 64))
    p3 = multiprocessing.Process(target=worker, args=(64, 96))
    p4 = multiprocessing.Process(target=worker, args=(96, 128))
 
    # Start all the processes
    for p in (p1, p2, p3, p4):
        p.start()
 
    # Wait until all processes finish
    for p in (p1, p2, p3, p4):
        p.join()

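The hard-coded ranges `(1, 32)`, `(32, 64)`, and so on can be generalized with a small helper that splits any half-open interval into contiguous chunks, so the number of worker processes can be changed in one place. `make_ranges` is a hypothetical helper name introduced here for illustration, not part of psycopg2 or multiprocessing.

```python
def make_ranges(start, end, n):
    """Split [start, end) into n contiguous half-open sub-ranges."""
    size, extra = divmod(end - start, n)
    ranges = []
    lo = start
    for i in range(n):
        # The first `extra` chunks absorb one leftover item each,
        # so the sub-ranges differ in length by at most one.
        hi = lo + size + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges

print(make_ranges(0, 128, 4))  # [(0, 32), (32, 64), (64, 96), (96, 128)]
```

With this helper, the four `multiprocessing.Process(...)` lines could be replaced by a loop over `make_ranges(1, 128, 4)`, and scaling to eight workers would be a one-argument change.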


Conclusion

Multiprocessing with Psycopg2 is extraordinarily useful for automating database operations when multiple clients access a database concurrently, and it adds a great deal of versatility to any Python application that manages a database. There are shortcomings, however: if your application uses only a single connection to the database, multiprocessing is less viable than other methods of parallelization. Overall, multiprocessing is a versatile and useful method for increasing the efficiency of database operations through psycopg2.

