
How to Boost Speed of Queries in Stored Procedures in PostgreSQL?

Last Updated : 07 Feb, 2024

Large stored procedures in PostgreSQL often take far longer than expected (for example, a task that should normally finish in 5 minutes may run for more than 30 minutes).

The major cause is that when a procedure contains many INSERT and UPDATE statements touching a large amount of data, Postgres does not automatically refresh its planner statistics for the affected tables. This hurts most when the same tables that were modified earlier in the procedure are queried again later in the same procedure.
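As a hypothetical illustration (the orders and customers tables, the load_and_report procedure, and all column names below are made up for this sketch), a procedure like the following bulk-inserts into a table and then immediately queries it; without refreshing statistics in between, the planner may still treat the freshly loaded table as nearly empty:

-- Hypothetical tables for the sketch
CREATE TABLE IF NOT EXISTS customers (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE IF NOT EXISTS orders (id SERIAL PRIMARY KEY, customer_id INT, amount NUMERIC);

-- Anti-pattern: a big insert followed immediately by a query on the same table
CREATE OR REPLACE PROCEDURE load_and_report()
AS $$
BEGIN
    INSERT INTO orders (customer_id, amount)
    SELECT (random() * 1000)::int, random() * 500
    FROM generate_series(1, 200000);

    -- Planner statistics for "orders" are still stale at this point,
    -- so this join may get a poor execution plan.
    PERFORM o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
END;
$$ LANGUAGE plpgsql;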

Query Optimization Techniques

The way to deal with this situation is to use

ANALYZE

For example, suppose a query inside the procedure inserts 200,000 records into the student table. Right after the insert, we can simply run:

ANALYZE student;

This makes Postgres gather fresh statistics for the student table, so the planner can build accurate execution plans for all subsequent queries that involve it.

  • ANALYZE itself is very cheap to run, which makes it practical to call inside the procedure.
  • It is recommended to run ANALYZE right after large inserts or updates whenever the affected table is used again later in the procedure, as the sketch below shows.
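To see what ANALYZE changes, you can compare the planner's row estimates with EXPLAIN before and after running it. A minimal sketch in a plain psql session (the name and id columns of student are assumed here):

-- Assumed table: student(id, name)
INSERT INTO student (name)
SELECT 'Student' || g FROM generate_series(1, 200000) g;

EXPLAIN SELECT * FROM student WHERE id > 100000;
-- The row estimate may still be based on stale statistics here.

ANALYZE student;

EXPLAIN SELECT * FROM student WHERE id > 100000;
-- The row estimate now reflects the ~200,000 newly inserted rows.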

Vacuum Tables for Performance

Another way to speed up queries on these tables, run outside the procedure (directly on the server), is to VACUUM them. This removes dead row versions from the table and its indexes and, combined with ANALYZE, refreshes the planner statistics.

 VACUUM VERBOSE ANALYZE student;

Remember that VACUUM cannot be run inside a stored procedure (or inside any transaction block).
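Since VACUUM cannot run inside the procedure, the usual pattern is to call the procedure first and then run the maintenance command from the calling session or a scheduled job. A minimal sketch (the procedure name here is hypothetical):

-- From psql or a maintenance script, after the procedure has finished:
CALL process_student_data();      -- hypothetical procedure name
VACUUM VERBOSE ANALYZE student;   -- remove dead row versions and refresh statistics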

Example

-- Table holding the bulk-loaded data
CREATE TABLE IF NOT EXISTS employee (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    salary INT
);


CREATE OR REPLACE PROCEDURE process_employee_data()
AS $$
BEGIN

    -- Bulk insert 100,000 rows, then refresh the planner statistics
    INSERT INTO employee (name, salary)
    SELECT 'Employee' || generate_series(1, 100000), random() * 100000;
    ANALYZE employee;

    -- Large update, so refresh the statistics again
    UPDATE employee SET salary = salary + 5000;
    ANALYZE employee;

    -- New table created inside the procedure
    CREATE TABLE IF NOT EXISTS department (
        id SERIAL PRIMARY KEY,
        department_name VARCHAR(100)
    );

    INSERT INTO department (department_name) VALUES ('IT');
    ANALYZE department;

    -- PERFORM runs the query and discards the result; a bare SELECT
    -- with no destination is not allowed in PL/pgSQL.
    PERFORM e.id, e.name, e.salary, d.department_name
    FROM employee e
    JOIN department d ON e.id = d.id;

END;
$$ LANGUAGE plpgsql;


CALL process_employee_data();

In the above example, we:

  1. Create a table named employee with columns id, name, and salary. The id column is a serial (auto-incrementing) primary key.
  2. Create a stored procedure named process_employee_data using the PL/pgSQL language.
  3. Insert a large amount of data into the employee table. The generate_series function generates a series of numbers from 1 to 100,000, and random salaries are assigned to each employee.
  4. Gather statistics for the employee table using ANALYZE, helping the query planner make informed decisions.
  5. Increase the salary of all employees by 5000 using the UPDATE query.
  6. Again gather updated statistics after the modification using ANALYZE.
  7. Create a new table named department with columns id and department_name to join the employee table.
  8. Insert data into department. You might wonder why we ANALYZE department after inserting only one record: the point here is not the insert itself but the fact that the table was just created inside the procedure, and running ANALYZE gives the planner statistics for the brand-new table.
  9. Finally, run a query (via PERFORM, since a bare SELECT needs a destination in PL/pgSQL) that joins both tables; the planner can now build an efficient execution plan because it already has up-to-date statistics for the inserted and updated data.
  10. No ANALYZE is needed after this final query because it does not insert or modify any data. A quick way to verify the refreshed statistics follows below.
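To verify that the procedure refreshed the statistics, you can inspect pg_stat_user_tables after the CALL (a small verification sketch; exact values will vary on your system):

-- last_analyze records when the table was last manually analyzed;
-- n_live_tup should be close to the 100,000 rows inserted by the procedure.
SELECT relname, n_live_tup, n_dead_tup, last_analyze
FROM pg_stat_user_tables
WHERE relname IN ('employee', 'department');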

Conclusion

  • Postgres does not automatically refresh planner statistics for data inserted or updated inside a procedure, which can lead to very long query times later in that procedure.
  • Running ANALYZE makes Postgres gather fresh statistics so the planner can build better execution plans.
  • Outside the procedure, VACUUM combined with ANALYZE is recommended; it also removes dead row versions from the table and its indexes.
  • If the procedure fails with an exception after inserting a huge amount of data, that uncommitted data is left behind as dead row versions until it is vacuumed (see the cleanup sketch below).
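As a rough cleanup sketch for that last point, you can check how many dead row versions a table is carrying and then vacuum it (autovacuum will normally get to it eventually as well):

-- Check how many dead row versions the table is carrying
SELECT relname, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'employee';

-- Reclaim the space and refresh the statistics
VACUUM VERBOSE ANALYZE employee;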
