Primary Indexing in Databases

Indexing is a technique used to reduce access cost or I/O cost, now the question arrives what is access cost? Access cost is defined as the number of secondary memory blocks which is transferred from secondary memory to main memory in order to access required data. In this article, we are going to discuss every point about primary indexing.

What is Primary Indexing?

A Primary Index is an ordered file whose records are of fixed length with two fields. The first field of the index is the primary key of the data file in an ordered manner, and the second field of the ordered file contains a block pointer that points to the data block where a record containing the key is available.

Primary Indexing

Working of Primary Indexing

In primary indexing, the data file is sorted or clustered based on the primary key as shown in the below figure.
An index file (also known as the index table) is created alongside the data file.
The index file contains pairs of primary key values and pointers to the corresponding data records.
Each entry in the index file corresponds to a block or page in the data file.

Primary Indexing

Types of Primary Indexing

Dense Indexing: In Dense Index has an index entry for every search key value in the data file. This approach ensures efficient data retrieval but requires more storage space.
No of Index Entry = No of DB Record

Sparse Indexing: Sparse indexing involves having fewer index entries than records. The index entries point to blocks of records rather than individual records. While it reduces storage overhead, it may require additional disk accesses during retrieval.
No of Index Entry = No of Block

Advantages of Primary Indexing

Primary indexing allowed fast retrieval of records based on their primary key values.
Primary indexing reduces the need for full-table scans or sequential lookups, due to which it reduce disk I/O operations.
The data file is organized based on the primary key, ensuring that records are stored in a logical order. This organization simplifies range queries and other operations.
Each block in the data file corresponds to an entry in the primary index. Therefore, the average number of blocks accessed using the primary index can be estimated as approximately log₂(B + 1), where B represents the number of index blocks.

Disadvantages of Primary Indexing

Primary indexing requires additional space to store the index field alongside the actual data records. This extra storage can impact overall system costs, especially for large datasets.
When records are added or deleted, the data file needs reorganization to maintain the order based on the primary key.
After record deletion, the space used by the deleted record must be released for reuse by other records. Managing this memory cleanup can be complex.

Question for Practice

Question: Consider a database of fixed-length records, stored as an ordered file. The database has 25,000 records, with each record being 100 bytes, of which the primary key occupies 15 bytes. The data file is block-aligned in that each data record is fully contained within a block. The database is indexed by a primary index file, which is also stored as a block-aligned ordered file. The figure below depicts this indexing scheme.

Primary Indexing

Suppose the block size of the file system is 1024 bytes, and a pointer to a block occupies 5 bytes. The system uses binary search on the index file to search for a record with a given key. You may assume that a binary search on an index file of b blocks takes ⌈log₂⁡b⌉ block accesses in the worst case. Given a key, the number of block accesses required to identify the block in the data file that may contain a record with the key, in the worst case, is _______. [ GATE CS/IT 2023 ]

Solution:

As it is given it is fixes length record,
 So one block should fully contain within block so here it means 
that they are saying that you should use un-spanned file organization.

=> No. of Records(Rn) = 25,000
=> Size of Record(Rs) = 100 Bytes
=> Size of Key(Ks) = 15 Bytes
=> Block Pointer Size(Ps) =  5 Bytes

So , Index Size(Is) = Size of Key + Block Pointer Size = 15 + 5 = 20 Bytes
Disk Block Size(Bs) = 1024 Bytes

=> No. of Record per Block =  ⌊(Disk Block Size/Record Size)⌋ 
                                                =  ⌊1024⌋ = ⌊10.24⌋ = 10
=> No. of Block Required for record = ⌈( no. of record / no of record per block)⌉
                                                                = ⌈ 25000/10 ⌉ = 2500

=> The database is indexed by a primary index file it means,
      that here we have to use sparse indexing.
=> No. of Index per Block =⌊(Disk Block Size/Index Size)⌋ 
                                       = ⌊ (1024/20) ⌋ 
                                       = 51 index per blocks
=> No. of Block Required for indexes = ⌈ ( No. of index /No. of index per block) ⌉ 
                                                            = ⌈ 2500/51 ⌉ 
                                                            = 50

=> It is given that Binary Search is used to search the desired block
     and we know that it takes (log n) time to search in worst case 
     where n is the no. of comparison required 

=> Here No. of comparison =  No. of Index Block 

=> Hence the number of block accesses required to identify the block in 
the data file that may contain a record with the key, 
in the worst case, is (log 50) = 6 Blocks

Frequently Asked Question on Primary Indexing – FAQs

Can we build two primary indexes on a file?

No we can’t build two primary index on a file, there is only one primary index on a file.

What is the difference between clustered and non-clustered primary index?

The records in the data file in clustered index are ordered physically based on primary key while in non-clustered primary index the records in the data file are not ordered physically.

How does primary indexing improve database performance?

Primary indexing improve the performance of the database by decreasing the time required to retrieve record from table.

Article Tags :

DBMS

GATE CS