# Relational Query Evaluation | Set 2

Prerequisite – Relational Query Evaluation | Set 1
Data is stored on disks, which are accessed through read and write operations. The more such operations a disk performs, the sooner it is likely to fail. To extend the life of disks, we want to optimize how they are used by minimizing the number of read and write operations.

1. Projection Operation –
This operation selects a subset of attributes from each record, and it is a costly one. To carry out a projection, every record in the file must be scanned to build the result, so a full file scan is unavoidable. Duplicate records in the result then need to be eliminated using sort-based or hash-based methods. A more cost-friendly approach is to apply projection only after selection: fewer records remain once selection is done, so the projection involves lower cost, fewer read and write operations and therefore longer disk life.
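As a rough illustration of applying selection before projection, the sketch below filters records first and then projects the survivors, using a hash set for duplicate elimination. The function name `select_then_project`, the in-memory list of dictionaries and the sample `employees` data are assumptions made for this example; a real system would work block by block on disk.

```python
def select_then_project(records, predicate, attributes):
    """Filter records first, then project onto `attributes`, dropping duplicates."""
    seen = set()    # hash-based duplicate elimination
    result = []
    for record in records:                 # one scan over the file
        if not predicate(record):          # selection: discard non-matching records early
            continue
        projected = tuple(record[a] for a in attributes)  # projection
        if projected not in seen:          # keep only the first copy of each result row
            seen.add(projected)
            result.append(projected)
    return result


# Hypothetical sample data, used only for illustration.
employees = [
    {"name": "Asha", "dept": "HR", "city": "Pune"},
    {"name": "Ravi", "dept": "IT", "city": "Pune"},
    {"name": "Meena", "dept": "IT", "city": "Delhi"},
    {"name": "Kiran", "dept": "IT", "city": "Pune"},
]

rows = select_then_project(employees, lambda r: r["dept"] == "IT", ["dept", "city"])
print(rows)  # [('IT', 'Pune'), ('IT', 'Delhi')]
```

Only the three records that pass the selection are projected, and the duplicate `('IT', 'Pune')` row is dropped by the hash set.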

2. Sorting –
Sorting is a very commonly applied operation. It is used for removing duplicate elements, grouping records, joining, etc. External sorting deals with huge files that reside on disk. Records can be arranged in increasing or decreasing order. Merge sort is used for this, and it involves 2 phases: the sort phase and the merge phase. The sort phase creates sorted sub-files, also known as runs. The merge phase merges these sub-files to eventually produce one sorted file.

Assumption :
The assumption is that the data is very large, stored in n blocks, and that memory (m blocks) is much smaller than the data. The sort phase reads m blocks, sorts them in memory and writes them back to disk as a single file called a run.
These steps are repeated –

`(n / m) number of times `

Take the ceiling whenever the value is not an integer.

Complexity :

`(2*n) block accesses`, since each of the n blocks is read once and written once
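The sort phase and its block-access count can be sketched as follows. This is a minimal in-memory simulation, not a disk-based implementation: the function `sort_phase`, the representation of a block as a small Python list and the sample `blocks` are all assumptions made for illustration.

```python
import math


def sort_phase(blocks, m):
    """Split `blocks` into sorted runs, reading at most m blocks at a time.

    Returns (runs, block_accesses). Disk I/O is only simulated by counting.
    """
    runs = []
    accesses = 0
    for start in range(0, len(blocks), m):
        chunk = blocks[start:start + m]      # read up to m blocks into memory
        accesses += len(chunk)               # one access per block read
        records = sorted(key for block in chunk for key in block)  # in-memory sort
        runs.append(records)                 # write the sorted run back to disk
        accesses += len(chunk)               # one access per block written
    return runs, accesses


# Hypothetical data: n = 5 blocks of 2 records each, memory of m = 2 blocks.
blocks = [[9, 4], [7, 1], [8, 2], [6, 3], [5, 0]]
runs, accesses = sort_phase(blocks, m=2)

print(len(runs), math.ceil(len(blocks) / 2))  # 3 3
print(accesses)                               # 10
```

With n = 5 and m = 2 the loop runs 3 times (the ceiling of 5 / 2) and performs 2 * 5 = 10 block accesses.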

At this point “r” sorted sub-files (runs) have been created, where r is the ceiling of n / m. Since the buffer space is only m blocks, the merge phase can operate on only m blocks at a time: once the blocks in the buffer have been processed, the result is written out and the next blocks (at most m at a time) are brought in.
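One possible way to sketch the merge phase is shown below. It assumes the common arrangement in which one of the m buffer blocks is reserved for output, so at most m - 1 runs are merged at a time; the text above does not specify this split, and the function `merge_phase` and the sample runs are hypothetical. The runs are held in memory here, whereas a real external sort would stream them from disk one block at a time.

```python
import heapq


def merge_phase(runs, m):
    """Merge sorted runs, at most m - 1 at a time, until one sorted run remains.

    Returns (sorted_records, number_of_merge_passes).
    """
    fan_in = max(2, m - 1)   # assumption: one buffer block is kept for output
    passes = 0
    while len(runs) > 1:
        # Merge each group of up to `fan_in` runs into a single longer run.
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
        passes += 1
    return (runs[0] if runs else []), passes


# Hypothetical sorted runs, as a sort phase might produce them.
sorted_runs = [[1, 4, 7, 9], [2, 3, 6, 8], [0, 5]]
merged, passes = merge_phase(sorted_runs, m=3)

print(merged)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(passes)  # 2
```

With m = 3 only two runs can be merged at once, so three runs need two passes. A larger buffer raises the fan-in and reduces the number of passes.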
