PostgreSQL – Backup Database
All commercial firms and corporations are never 100% free of data threats. Being wary of scenarios like data corruption, host or network failures or even deliberate storage facility destruction is extremely important. Therefore, backing up data is a critical activity that every firm depends on to ensure its existence in the market. The idea behind backups of servers extends beyond just the ability to contend in the market. Here are a few reasons why backups are done so often:
- Data-loss prevention: In cases of dire situations such as cyber threats or catastrophes, data must be kept alive in its authentic dress.
- Business Health: all businesses must review their records in order to stay reliant and aware on all levels of database security.
- Foster responsibility: backing up servers is the most fluent manner by which a firm gathers trust from the consumers regarding the safekeeping of information.
PostgreSQL has a noteworthy share in the market, setting aside MySQL as the undisputed leader. As is customary for any Database Management System of such a high standard, PostgreSQL provides a set of both logical and physical backups. For those not aware of the difference:
- Logical backups contain info regarding the database, such as the tables and schemas while physical backups contain the files and directories of the information.
- Logical backups are less size when compared to physical backups.
Conceptually, there are several levels of backup (differential, full, incremental etc.) but that is besides the objective of this article. We shall be dealing with the approaches that PostgreSQL offers to its users. Here we shall focus on 3 of them –
- File System Level Backups
- SQL Dumps
- Continuous or Repetitive Archiving
File System Level Backup
A file system level backup is a physical backup. The idea is to make a copy of all files that created the database. This means we are looking at database clusters and directories that are used by PostgreSQL to write the data into the database. One way of interpreting this is by visualizing yourself as the scientist publishing a research paper and using loads of previously written work to give definition to your work. The works you refer to form a framework and layout which guide you throughout your writing process. The files that PostgreSQL attempts to copy are the framework and layout along with guidelines containing the data (specifically a bunch of .dat files).
The copying statement:
tar -cf backup.tar /%p $PATH TO DATA FILE/%f $FILENAME OR rsync -v /var/lib/pgsql/13/backups/ /Directory 2/
Notice that a .tar file is created i.e., a tape archive. Meanwhile, rsync is a command that SYNCHRONIZES two directories. So, if Directory 2 does not have the same contents as the first directory (/var/lib/….), then it would after execution – a cleaver form of copying.
However, we must enable archiving prior to execution. Obviously, simply copying data is considered as being careless; ongoing transactions will affect the immediate state of files being copied, leading to a questionable status of the backup and its reliability.
Note: This is a deliberate overlap with the last approach discussed in this article; it helps with improving the quality of the backup.
Therefore, we begin by first enabling WAL archiving. WAL archives are files that maintain all modifications made by transactions in order to keep data in the database and the instance memory consistent.
WAL files are ‘Write-Ahead Log’ files. All transactions performed are first written to the WAL file before those changes are applied to the actual disk space. WAL archives are copies of the WAL files and exist in a separate directory to avoid plausible crashes.
Step 1: Start the backup by first connecting to the database as a privileged user and then running the following. /db is the name of the database you wish to back up.
SELECT pg_start_backup('label',false,false); tar -cf -z backup_name.tar /db $the name of the database you wish to backup
- -cf : this indicates a ‘custom format’. By including this, the user states that the archive file will have a custom format. This introduces flexibility when re-importing the file.
- -z : this attribute to the command indicates to postgreSQL to compress the output file.
Step 2: Tell the database to terminate backup mode.
SELECT * FROM pg_stop_backup(false, true);
What this command also does is that it creates a .backup file that helps identify the first and last archive logs. This is important when restoring the database.
In the very likely case that the user forgets to enable WAL archiving, this is displayed on the shell:
Set the following in the postgresql.conf file:
-wal_level = replica -archive_mode = on -archive_command = 'cp %p /%f'
This enables WAL archiving. %p is the path name and %f is the name of the file. Setting parameters in the configuration file is only possible via command line.
Demerits of File System Level Backups:
- They require for the entire database to be backed-up. Backing up schemas or only specific tables is not an option. The same applies for the restoration process as well i.e., the entire database must be restored.
- In consequence, they take up more space on storage by default.
- The server must be stopped to obtain a usable backup. This results in unnecessary overhead and disrupts continuity in business transactions.
Another method of issuing a backup is by using pg_basebackup for the following reasons:
- Recovery from the backup is faster and safer
- It is installation version specific
- Backups are done over replication protocols
When it comes to smaller Databases, it is much more convenient to execute an SQL Dump. SQL Dumps are logical backups. This implies that the method would backup all instructions that were used to create the schemas and the tables. Therefore, the file that is exported after the backup is literally a file full of DDL and DML commands.
PostgreSQL offers two methods by which an SQL dump may be performed.
pg_dump is a simple command that creates a copy of one of the databases on the server. Think of it as ‘dumping the object files of only ONE database into a new file. At the end of this process, a new file is created which is human-interpretable (filled with SQL commands). pg_dump works with the following options:
- -Fc : exports the file in custom format – explained above for File System Level backups
- -Z : this is used along with the command to compress the data. As one would expect, this is used with large databases.
- -j : used to enable parallel backups using pg_dump again
- -a : this option specifies that only data will be backed up.
- -s : used when only schemas are required to be backed up.
- -t : used when only tables of the database required to be backed up.
- Here is the sample format by which a user may create an SQL dump using pg_dump:
>> C:\Program Files\PostgreSQL\13\bin >> pg_dump -U username -W -F p database_name > path_to_output_file.sql >> Password: >>(For example) >> pg_dump -U dbuser04 -W -F p My_University > University_backup.sql
The user must first navigate to the \bin folder of the PostgreSQL version installed ON THE TERMINAL or CMD. Once that is done, the command – as stated – may be performed. It is suggested that the output file be created in a known location prior to execution (don’t rely on PostgreSQL to create it for you). In the command above, -W is an option which enables verification of the user via the password. -F is used to specify the format of the output file in the following character (remember to change the extension of the output file). Therefore:
- p = plain text file
- t = tar file
- d = directory-format archive
- c = custom format archive
The apparent caveat is how pg_dump is restricted to creating a copy of a single database at a time. For situations where multiple databases in the server must be backed up, PostgreSQL makes available the pg_dumpall command. For most part, its syntax stays the same and so do the options with the exception of –W (honestly, no one really would want to type the password for every single database they backup).
>> C:\Program Files\PostgreSQL\13\bin >> pg_dumpall -U username -F p > path_to_output_file.sql
In many cases, firms might prefer to only make copies of the schemas or certain object definitions and not really the data. This is very much accounted for.
>> pg_dumpall --tablespaces-only > path_to_output_file.sql $for only tablespaces >> pg_dumpall --schema-only > path_to_output_file.sql $for only schemas
Demerits of SQL Dump:
Based on multiple IT factions of corporations utilizing PostgreSQL servers for day-to-day procedures, the shortcomings of SQL Dumps easily provide incentive to use other approaches for Backups and Recovery.
- As we have seen, SQL dumps literally recreate every variable based on the instructions stored in the output file. In fact, it works on rebuilding all indexes as well. The cap on restoring speeds is obvious as a result.
- A major consequence of the above is the unwanted overhead which businesses never welcome.
- Technically, with multiple dumps occuring simultaneously, syntax reversal is a major issue.
- Being a logical export, SQL dumps may not be 100% portable.
Consider a database ‘crimedb’ which stores a record of all criminals in the police registry. We shall attempt a regular backup of all tables.
This is a rather complex procedure when compared to the previous approaches discussed. However, the benefits prevail.
Continuous or repetitive archiving combines the ideas of issuing a File System Level Backup and a backup of the WAL files (technically, enabling WAL archiving separately). It may seem like File System Level Backups all over again, but with the added emphasis on retaining and not just archiving the WAL files while the backup is underway. This enables us to take ‘simultaneous snapshots’ of the database when offline. Another difference is with the restoring procedure – recovery of Continuous Archiving backups is much different.
This section deals with Point-in-Time-Recovery Backups. This is a form of continuous archiving where the user may back up and restore data to a certain point in time in the past. The wordplay is based on the latest WAL archive created.
In short, the procedure is as follows:
- Enable WAL Archiving
- Create a way by which the WAL files are retained.
- Make a File System Level backup (we will use pg_basebackup this time).
- Retain the backup so that the DB is replayed to a point in time.
First, let’s connect to the server and execute the following in the psql shell:
>> show archive_mode;
If archiving is disabled then the following is displayed:
The first step (enabling archiving) varies from system to system based on the OS and sometimes the version of Postgres installed. All remaining steps follow a common procedure.
Step 1 (Windows):
First things first, stop the Postgres service from the Windows service manager.
Now, head to the Postgres.conf file in the \data directory. For example C:\Program Files\PostgreSQL\13\data and make the following parameter changes:
>> archive_mode = on >> archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' >> wal_level = replica
Replace %p with the path of the file we wish to archive which is pg_wal and replace %f with the filename. Be sure to use ‘\\’ for the directories to avoid any issues.
Now, Start the Postgres services.
Step 1 (Linux):
Follow the procedure dictated in the code below:
$ mkdir db_archive $ sudo chown postgres:postgres db_archive
Now, we have a directory created to which we shall write the archives. Also, we have given permission to the Postgres user to write to this directory. We must configure our parameters in the postgresql.conf file. Find the file in the data directory or the main directory of the PostgreSQL version you have installed, manually. Then make the following changes to the file (open the file and edit its contents):
# uncomment archive parameter and set it to the following: archive_mode = on archive_command = 'test ! -f /path/to/db_archive/%f && cp %p /path/to/db_archive/%f' # change wal_level as well wal_level = replica
Save and exit the file. You must restart the database server now to apply these changes to the setup. The Linux equivalent of this is:
$ sudo systemctl restart postgresql.service
Steps 2 – 4 (common for both platforms):
Open the SQL prompt or SQL Shell (psql) on your system. Connect to the server. Now execute the following query to make a backup.
postgres=# SELECT pg_start_backup('label',false,false) postgres=# pg_basebackup -Ft -D directory_you_wish_to_backup_to postgres=# SELECT pg_stop_backup(true,false)
pg_start_backup( ) and pg_stop_backup( ) may be omitted since, pg_basebackup( ) is capable of handling control on its own. Once the prompt is returned, users may check the directory to which the backup was created and be reassured to find a .tar backup file. It is up to the user to specify the kind of file format preferred (p or t or c).
Now that we have a file system level backup and the WAL archives, we can execute a Point in Time Recovery. Let us assume that our Database crashed and as a result, our data was corrupted or damaged, or destroyed.
Stop the services of the database once the backup has been created and delete all the data within the data directory since the data is no longer validated on account of the DB crash.
$ sudo systemctl stop postgres.services $ or use the windows services panel (if you use Windows)
This is the extent to which continuous archiving can be dealt with (we are only creating a backup, not restoring it yet). However, the process is rather simple and straightforward since all we require to do is import the backups back into the original folders or the default folders/directories that PostgreSQL used initially upon installation. This is followed by a minor change in the postgresql.conf file and that’s about all there is to it!