There is a need for conversion of relational query to algebraic expressions. This expression is operated on so that final output is received. The relational algebra operators that are present are Union, Select, Join, Project, etc. These are used to formulate relational query for adequate results. Grouping, partitioning and aggregation are all parts of relational query.
Selection of proper algorithms is necessary to optimize life and performance of disks, where data is stored. If lot of read and write operations are performed (data modification and updation) on disk, its life is reduced due to wear and tear.
- One Selection Condition –
In case we need to use select operator for selecting particular tuples of relation based on condition, following things can be done. If there is only one selection condition, then we need to check for presence of indexes. If indexes are present on attribute, we can easily find selection based on these indexes as indexes provide record and block pointers that can be used in searching. For any reason, if indexes are absent, then entire file needs to be searched for selection.
- Multiple Conjunctive Selection Condition –
In the case of multiple conjunctive selection condition, if none of attributes have any index, complete file scan needs to be done for selection of relevant tuples. However, if any of attributes do have index, then we need to go to relevant attribute and select tuples whose values for attribute are equivalent to given condition, and then in those records, other conjunctive conditions can be checked.
- Multiple Disjunctive Selection Condition –
In case attributes have multiple disjunctive selection condition, it is of no use if only one attribute has index, rather it is necessary that all attributes have index. Otherwise, complete file scan is necessary to get selections.
- Selection of Predicates –
If index is present for multiple attributes, one can choose highly selective attribute. This can be done by using Selectivity. The selectivity of any particular attribute is ratio of the number of records satisfying condition to total number of records. The selectivity can range anywhere between and including 0 and 1. The condition is rated as highly selective if it has low value of selectivity.
For the conjunction of conditions, attribute with least selectivity value is to be chosen. Tuples corresponding to condition are to be found and then other conditions are to be implemented on it. The selectivity values or their estimates can be recorded while database is in use. It can be placed in DB catalog. The DB catalog (database catalog) consists of metadata and relevant statistics about data, such as selectivity. These are used by Query optimizer which is part of RDBMS Architecture.
The formulas for estimating Selectivity are given below :
- In case of key attributes selectivity is roughly –
1 / (total db records)
- In case of non key attributes selectivity is roughly –
1 / distinct values of columns
There is also presence of Histograms in DBMS. The histogram represents distinct values of attributes or range of values of attributes in horizontal axis. Whereas, on vertical axis, it represents number of records corresponding to values in range. We can use histogram to estimate number of records that correspond to given condition in database. These estimations can be recorded offline as well.
Refer for – Relational Query Evaluation | Set 2
Attention reader! Don’t stop learning now. Get hold of all the important CS Theory concepts for SDE interviews with the CS Theory Course at a student-friendly price and become industry ready.