Open In App

Primary Indexing in Databases

Last Updated : 06 Mar, 2024
Improve
Improve
Like Article
Like
Save
Share
Report

Indexing is a technique used to reduce access cost or I/O cost, now the question arrives what is access cost? Access cost is defined as the number of secondary memory blocks which is transferred from secondary memory to main memory in order to access required data. In this article, we are going to discuss every point about primary indexing.

What is Primary Indexing?

A Primary Index is an ordered file whose records are of fixed length with two fields. The first field of the index is the primary key of the data file in an ordered manner, and the second field of the ordered file contains a block pointer that points to the data block where a record containing the key is available.

Primary Indexing

Primary Indexing

Working of Primary Indexing

  • In primary indexing, the data file is sorted or clustered based on the primary key as shown in the below figure.
  • An index file (also known as the index table) is created alongside the data file.
  • The index file contains pairs of primary key values and pointers to the corresponding data records.
  • Each entry in the index file corresponds to a block or page in the data file.
Primmary-Index

Primary Indexing

Types of Primary Indexing

  • Dense Indexing: In Dense Index has an index entry for every search key value in the data file. This approach ensures efficient data retrieval but requires more storage space.
    No of Index Entry = No of DB Record
  • Sparse Indexing: Sparse indexing involves having fewer index entries than records. The index entries point to blocks of records rather than individual records. While it reduces storage overhead, it may require additional disk accesses during retrieval.
    No of Index Entry = No of Block

Advantages of Primary Indexing

  • Primary indexing allowed fast retrieval of records based on their primary key values.
  • Primary indexing reduces the need for full-table scans or sequential lookups, due to which it reduce disk I/O operations.
  • The data file is organized based on the primary key, ensuring that records are stored in a logical order. This organization simplifies range queries and other operations.
  • Each block in the data file corresponds to an entry in the primary index. Therefore, the average number of blocks accessed using the primary index can be estimated as approximately log₂(B + 1), where B represents the number of index blocks.

Disadvantages of Primary Indexing

  • Primary indexing requires additional space to store the index field alongside the actual data records. This extra storage can impact overall system costs, especially for large datasets.
  • When records are added or deleted, the data file needs reorganization to maintain the order based on the primary key.
  • After record deletion, the space used by the deleted record must be released for reuse by other records. Managing this memory cleanup can be complex.

Question for Practice

Question: Consider a database of fixed-length records, stored as an ordered file. The database has 25,000 records, with each record being 100 bytes, of which the primary key occupies 15 bytes. The data file is block-aligned in that each data record is fully contained within a block. The database is indexed by a primary index file, which is also stored as a block-aligned ordered file. The figure below depicts this indexing scheme.

Primary index

Primary Indexing

Suppose the block size of the file system is 1024 bytes, and a pointer to a block occupies 5 bytes. The system uses binary search on the index file to search for a record with a given key. You may assume that a binary search on an index file of b blocks takes ⌈log2⁡b⌉ block accesses in the worst case. Given a key, the number of block accesses required to identify the block in the data file that may contain a record with the key, in the worst case, is _______. [ GATE CS/IT 2023 ]

Solution:

As it is given it is fixes length record,
So one block should fully contain within block so here it means
that they are saying that you should use un-spanned file organization.

=> No. of Records(Rn) = 25,000
=> Size of Record(Rs) = 100 Bytes
=> Size of Key(Ks) = 15 Bytes
=> Block Pointer Size(Ps) = 5 Bytes

So , Index Size(Is) = Size of Key + Block Pointer Size = 15 + 5 = 20 Bytes
Disk Block Size(Bs) = 1024 Bytes

=> No. of Record per Block = ⌊(Disk Block Size/Record Size)⌋
= ⌊1024⌋ = ⌊10.24⌋ = 10
=> No. of Block Required for record = ⌈( no. of record / no of record per block)⌉
= ⌈ 25000/10 ⌉ = 2500

=> The database is indexed by a primary index file it means,
that here we have to use sparse indexing.
=> No. of Index per Block =⌊(Disk Block Size/Index Size)⌋ 
= ⌊ (1024/20) ⌋
= 51 index per blocks
=> No. of Block Required for indexes = ⌈ ( No. of index /No. of index per block) ⌉
= ⌈ 2500/51 ⌉
= 50

=> It is given that Binary Search is used to search the desired block
and we know that it takes (log n) time to search in worst case
where n is the no. of comparison required

=> Here No. of comparison = No. of Index Block

=> Hence the number of block accesses required to identify the block in
the data file that may contain a record with the key,
in the worst case, is (log 50) = 6 Blocks

Frequently Asked Question on Primary Indexing – FAQs

Can we build two primary indexes on a file?

No we can’t build two primary index on a file, there is only one primary index on a file.

What is the difference between clustered and non-clustered primary index?

The records in the data file in clustered index are ordered physically based on primary key while in non-clustered primary index the records in the data file are not ordered physically.

How does primary indexing improve database performance?

Primary indexing improve the performance of the database by decreasing the time required to retrieve record from table.



Similar Reads

Secondary Indexing in Databases
Pre-requisites: Primary Indexing in Databases Databases are a critical component of modern applications, storing vast amounts of data and serving as a source of information for various functions. One of the primary challenges in managing databases is providing efficient access to the stored data. To meet this challenge, database management systems
5 min read
Clustering Indexing in Databases
Pre-requisites: Primary Indexing in Databases, indexing Databases are a crucial component of modern computing, providing a structured way to store, manage, and retrieve vast amounts of data. As the size of databases increases, it becomes increasingly important to have an efficient indexing mechanism that can quickly search and retrieve data. Cluste
4 min read
Indexing in Databases - Set 1
Indexing improves database performance by minimizing the number of disc visits required to fulfill a query. It is a data structure technique used to locate and quickly access data in databases. Several database fields are used to generate indexes. The main key or candidate key of the table is duplicated in the first column, which is the Search key.
9 min read
Concept of indexing in Apache Cassandra
Prerequisite - Introduction to Apache Cassandra Index: As we can access data using attributes which having the partition key. For Example, if Emp_id is a column name for Employee table and if it is partition key of that table then we can filter or search data with the help of partition key. In this case we can used WHERE clause to define condition
5 min read
Local Indexing and Materialized views in Cassandra 3.0
Prerequisite - Concept of Indexing, Concept of Materialized Views In this article, we will see how we can do local indexing and how it works and how materialized views works internally. Let's discuss one by one. First, we need to create a table. let's consider a table Team_data in which id, name, address are the fields. Let's have a look. CREATE TA
2 min read
Difference between Indexing and Hashing in DBMS
1. Indexing :Indexing, as name suggests, is a technique or mechanism generally used to speed up access of data. Index is basically a type of data structure that is used to locate and access data in database table quickly. Indexes can easily be developed or created using one or more columns of database table.2. Hashing :Hashing, as name suggests, is
3 min read
Difference Between Indexing Techniques in DBMS
Database indexing plays a crucial role in improving the performance and efficiency of database systems. By utilizing indexing techniques, we can speed up data retrieval operations and enhance overall system responsiveness. This article will delve into various database indexing techniques, including B-tree, Hash Indexing, and Bitmap Indexing. We wil
3 min read
Indexing in MongoDB
MongoDB is leading NoSQL database written in C++. It is high scalable and provides high performance and availability. It works on the concept of collections and documents. Collection in MongoDB is group of related documents that are bound together. The collection does not follow any schema which is one of the remarkable feature of MongoDB. Indexing
2 min read
Bitmap Indexing in DBMS
Bitmap Indexing is a data indexing technique used in database management systems (DBMS) to improve the performance of read-only queries that involve large datasets. It involves creating a bitmap index, which is a data structure that represents the presence or absence of data values in a table or column. In a bitmap index, each distinct value in a c
8 min read
How do Keys Assist in Indexing Strategies?
Keys enhance indexing by enabling efficient query performance and data retrieval through structured database indexes.Keys significantly boost database indexing and query performance through several mechanisms: Unique Identification :Facilitate quick data retrieval by uniquely identifying each record, eliminating the need for exhaustive dataset scan
1 min read
Article Tags :