PostgreSQL – System Architecture
PostgreSQL is an open-source object-relational database management system. It is the successor to POSTGRES, one of the earliest systems of its kind, and it is one of the most widely used open-source database management systems.
PostgreSQL uses a client-server architecture. In the simplest terms, a PostgreSQL session involves two cooperating processes:
- Server-side process: This is the “postgres” program, which manages the database files, accepts connections from client applications, and performs database operations on behalf of the clients.
- Client-side process (front-end application): This is the application that users use to interact with the database. It typically offers a simple UI and communicates with the database through an API.
Client-side Process :
When a user runs queries on PostgreSQL, the client application connects to the PostgreSQL server (the postmaster daemon process) and submits queries through one of the many database client application programming interfaces that PostgreSQL supports, such as JDBC, Perl DBD, and ODBC, each of which is provided as a client-side library. Within the client process, the client application communicates with the client library through the library's API (see the accompanying figure).
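To make the role of the client library more concrete, here is a minimal sketch of the first message such a library sends after opening a connection in version 3.0 of the PostgreSQL wire protocol: a length, a protocol version number, and a list of null-terminated parameter pairs. The function name `build_startup_message` is an illustrative assumption, and real libraries send more parameters and then handle authentication, which is omitted here.

```python
import struct

# Protocol version 3.0: major version in the high 16 bits, minor in the low 16 bits.
PROTOCOL_VERSION_3_0 = 3 << 16

def build_startup_message(user: str, database: str) -> bytes:
    """Build a startup packet (hypothetical helper, not a real library API)."""
    params = b""
    for key, value in (("user", user), ("database", database)):
        # Each name and each value is a null-terminated string.
        params += key.encode() + b"\x00" + value.encode() + b"\x00"
    params += b"\x00"  # a lone zero byte ends the parameter list
    body = struct.pack("!i", PROTOCOL_VERSION_3_0) + params
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!i", len(body) + 4) + body
```

An application never builds this packet by hand; it calls the library API, and the library takes care of the protocol details.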
1. Postmaster Daemon Process :
The system architecture of PostgreSQL is based on a process-per-connection model (a client/server model in which each client connection is served by its own server process). A running PostgreSQL site is managed by the postmaster, a central coordinating process that is also referred to as the main server process.
The postmaster daemon process is responsible for:
- Initializing the server
- Shutting down the server
- Handling connection requests from new clients
- Performing recovery
- Running background processes
Shared Memory: Shared memory is memory that multiple processes can access simultaneously, which lets them share data quickly without keeping redundant copies. In PostgreSQL this memory is reserved for database caching and transaction log caching. PostgreSQL uses a shared disk buffer and shared tables, which work as follows:
Shared Disk Buffer: The purpose of the shared disk buffer is to minimize disk input/output. Without it, every data access would have to go to disk, which takes more time and makes the system inefficient. The advantages of using a shared buffer are:
- Reduces access time.
- Keeps frequently used data quickly accessible, even when the database is very large.
- Minimizes contention for the disk when multiple users access data at the same time.
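The effect of a shared buffer on disk I/O can be sketched with a toy page cache. This is an illustration only: the class name `BufferPool` and the `read_from_disk` callback are assumptions made for the example, and it evicts the least recently used page for brevity, whereas PostgreSQL's real buffer manager uses a clock-sweep strategy.

```python
from collections import OrderedDict

class BufferPool:
    """Toy page cache: serve repeated reads from memory instead of disk."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity              # maximum number of cached pages
        self.read_from_disk = read_from_disk  # hypothetical callback doing the slow read
        self.pages = OrderedDict()            # page_id -> page contents, oldest first
        self.hits = 0
        self.misses = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        page = self.read_from_disk(page_id)   # only a miss touches the disk
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return page
```

Every hit is a disk read that did not happen, which is the whole point of keeping the buffer in memory that all backends share.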
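Shared Tables: This approach uses the same set of tables to host data for multiple clients. (Within the server itself, shared memory also holds shared bookkeeping structures, such as the lock table, that all backends use to coordinate.) The main advantages of the shared-table approach are: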
- The Lowest Hardware Cost
- The Lowest Backup Cost
- It allows working with large data in a single database.
UNIX System: On a UNIX system, the kernel maintains its own disk buffers in memory and handles the physical storage of data on disk. In addition, PostgreSQL checks the syntax of each command and, if it is incorrect, returns an error message explaining what is wrong or missing.
2. Back-end process:
The postmaster is responsible for handling initial client connections. To do so, it constantly listens for new connections on a known port (5432 by default). When a connection request arrives, the postmaster starts a new backend server process to handle the new client, and the session begins once initialization steps such as user authentication have succeeded. From then on, the client interacts only with its backend server process, submitting queries and receiving query results. This shows that PostgreSQL uses a process-per-connection model.
The backend server is responsible for executing the queries submitted by its client. Each backend server handles only a single query at a time. Because multiple clients are usually connected to the system at once, multiple backend servers execute queries concurrently. The backend servers access data through the main-memory buffer pool, which is located in shared memory.
The backend process then returns the result to the client process.
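The one-backend-per-client idea can be sketched with `os.fork()`, the same Unix primitive a server can use to create child processes. This is a simplified illustration, not PostgreSQL's code: the names `accept_client` and `backend_main` are hypothetical, a socket pair stands in for an accepted TCP connection, and authentication and real query execution are left out. It runs on Unix-like systems only.

```python
import os
import socket

def backend_main(conn: socket.socket) -> None:
    """Runs in the child process: serve exactly one client until it disconnects."""
    with conn:
        while True:
            query = conn.recv(1024)
            if not query:  # the client closed its end of the connection
                break
            reply = f"backend {os.getpid()} ran: {query.decode()}"
            conn.sendall(reply.encode())

def accept_client() -> tuple[socket.socket, int]:
    """Hypothetical postmaster step: fork a dedicated backend for one new client."""
    client_end, backend_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child process: this becomes the backend
        client_end.close()
        try:
            backend_main(backend_end)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    backend_end.close()  # the parent keeps only the client's end
    return client_end, pid
```

Calling `accept_client()` twice yields two sockets served by two different process IDs, so each client talks only to its own backend while the parent remains free to accept further connections.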
Background processes:
- WAL (Write-Ahead Log) writer: This process writes and flushes the WAL data in the WAL buffer.
- Logging collector: This process is also called the logger. It writes error messages to the log file.
- Autovacuum launcher: When autovacuum is enabled, this process is responsible for having the autovacuum daemon carry out vacuum operations on bloated tables. It relies on the stats collector process for accurate table analysis.
- Archiver: When archiving is enabled, this process is responsible for copying WAL files to the specified directory.
- Stats collector: This process collects the statistics information exposed through views such as pg_stat_activity and pg_stat_all_tables.
- Checkpointer: When a checkpoint occurs, this process writes the dirty buffers to the data files.
- Writer: This process periodically writes dirty buffers to the data files.
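The write-ahead rule behind the WAL can be sketched in a few lines: a change is appended to the log and forced to disk before the data itself is modified, so the log can be replayed after a crash. The class `TinyWAL`, its JSON record format, and the `demo` helper are assumptions invented for this illustration and bear no relation to PostgreSQL's actual WAL format.

```python
import json
import os
import tempfile

class TinyWAL:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, path):
        self.path = path
        self.data = {}  # in-memory state, lost on a crash

    def set(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then modify the data itself.
        self.data[key] = value

    def recover(self):
        """Rebuild the in-memory state by replaying the log from the start."""
        self.data = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]
        return self.data

def demo():
    """Write three changes, simulate a crash, and recover from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        db = TinyWAL(path)
        db.set("a", 1)
        db.set("b", 2)
        db.set("a", 3)
        restarted = TinyWAL(path)  # fresh instance: in-memory state is gone
        return restarted.recover()
```

In this sketch the log grows forever; in a real system, checkpoints bound how much of the log has to be replayed during recovery.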
3. Shared Pool:
In Oracle, the shared pool is a RAM area within the RAM heap that is created at startup. It is a component of the SGA (System Global Area). If the shared pool is not available in RAM, or is not used effectively, the result is a high number of library cache reloads and row cache reloads.
Why doesn’t PostgreSQL use a Shared Pool?
PostgreSQL does not provide a shared pool, although in many database systems, such as Oracle, the shared pool is an important architectural component. Instead, PostgreSQL shares SQL information at the process level: each backend process keeps its own cache. Simply put, if a user executes the same prepared SQL statement several times in one process, it is hard-parsed only once. In a database system that uses a shared pool, by contrast, a statement is hard-parsed once and then loaded from the shared pool by every session. The trade-off in PostgreSQL is that when the same SQL statement is executed simultaneously in several processes, each process parses it separately, which causes more load.
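The difference between a per-process cache and a shared pool can be sketched as follows. The `Backend` class is a hypothetical stand-in for one backend process, and splitting the SQL text into words merely stands in for real parsing and planning.

```python
class Backend:
    """Toy stand-in for one backend process with a private statement cache."""

    def __init__(self):
        self._cache = {}      # SQL text -> parsed form, visible to this backend only
        self.hard_parses = 0  # how many times this backend had to parse from scratch

    def _hard_parse(self, sql):
        self.hard_parses += 1
        return tuple(sql.split())  # placeholder for a real parse tree or plan

    def execute(self, sql):
        # Reuse the cached parse if this backend has seen the statement before.
        if sql not in self._cache:
            self._cache[sql] = self._hard_parse(sql)
        return self._cache[sql]
```

Running the same statement three times on one `Backend` costs one hard parse, but two `Backend` objects running it each pay for their own, because nothing is shared between them.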
4. OID in PostgreSQL :
OID stands for Object Identifier. PostgreSQL uses OIDs as primary keys for various system tables. An OID is implemented as an unsigned four-byte integer. Older versions also offered the option of adding OIDs to user-defined tables with “WITH OIDS”, but this was discouraged because a four-byte value is not large enough to guarantee uniqueness in a large user-defined table. OIDs are best suited to system tables. In effect, an OID is a built-in ID for every row, stored in a system column.
As of PostgreSQL 12, the WITH OIDS option for user tables has been removed. An OID can still be used in a user table, but only as an explicitly declared column of type oid.
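The size limit is easy to see with a little arithmetic. The sketch below models an OID counter that wraps around once the four-byte range is exhausted, which is why uniqueness cannot be guaranteed in a very large table. The function `next_oid` is hypothetical, and the default restart value of 16384 (the start of the range for user-created objects) is an assumption about the server's behavior rather than something stated in this article.

```python
OID_MAX = 2**32 - 1  # largest value of an unsigned four-byte integer

def next_oid(current: int, restart_at: int = 16384) -> int:
    """Return the OID after `current`, wrapping around at the four-byte limit.

    `restart_at` is an assumed restart point, kept as a parameter because
    the exact value does not matter for the argument.
    """
    if not 0 <= current <= OID_MAX:
        raise ValueError("an OID must fit in an unsigned four-byte integer")
    return restart_at if current == OID_MAX else current + 1
```

With roughly 4.29 billion possible values, a counter shared across a busy database eventually starts handing out numbers that were already used.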
Merits of PostgreSQL:
- PostgreSQL is a highly fault-tolerant database and has low maintenance costs.
- It can serve as the database in a LAMP-style stack (Linux, Apache, MySQL, PHP), taking the place of MySQL, to run dynamic websites and web applications.
Demerits of PostgreSQL:
- It can be somewhat slower than some commercial databases.
- Fewer open-source applications support it than support MySQL.