SQL Query to Convert Date Range to Per Day Records
In this article, we are going to see how we can convert the date range to per day records, but before that, we need to have a clear idea of where it is going to be helpful for us.
Suppose, We have a database where we want to store all the project details records of the students of a particular class. There we can see how long it took a student to complete his/her project, but it is saved in a date range form. For example, Student-1 started doing the project from 2021-11-20 and completed it in 2021-11-24. So, how long did it take her to complete? 5 days, simple. However, the calculation would not seem that simple when we are working with millions of data. That is why we want to convert this from the date range to per day records. As we can easily apply aggregate functions on separate rows, and this would make our analysis simpler.
Here we going to use a “tally table” or “numbers table“.
Numbers table/Tally table: A numbers table is referred to as the “Swiss Army Knife” of SQL Server. It can be used in place of the loops with faster set-based operations, expand data sets, insert test data, and many more other things. It will have a single column, with consecutive numbers from either 0 or 1 to the highest number you could need.
--create a table CREATE TABLE Sample(SampleId int not null) DECLARE @NoOfRows int SET @NoOfRows = 1000000 --We can add as many rows as you want ; WITH P0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows P1 AS (SELECT 1 AS C FROM Pass0 AS A, P0 AS B), --4 rows P2 AS (SELECT 1 AS C FROM Pass1 AS A, P1 AS B), --16 rows P3 AS (SELECT 1 AS C FROM Pass2 AS A, P2 AS B), --256 rows P4 AS (SELECT 1 AS C FROM Pass3 AS A, P3 AS B), --65536 rows P5 AS (SELECT 1 AS C FROM Pass4 AS A, P4 AS B), --4,294,967,296 rows Tally AS (SELECT row_number() over(ORDER BY C) AS Number FROM P5) INSERT Sample (SampleId) SELECT Number FROM Tally WHERE Number <= @NoOfRows
Note: The following syntax is invented by Itzik Ben-Gan, a T-SQL guru
Since we are using a WITH clause, it can be worked into an INSERT and SELECT. In the last line, WHERE Number <= @NoOfRows can easily be adjusted and we can put whatever range.
Now, let us see how to SQL query convert date range to per day records.
Step 1: Create a database and Use it
Use the following command to create a database and use it.
CREATE DATABASE GFG_Demo; USE GFG_Demo;
Step 2: Table definition
We have the following work done table in the database.
CREATE TABLE WorkDone (ID VARCHAR(10) NOT NULL , StartDate DATE NOT NULL , EndDate DATE NOT NULL);
Step 3: Insert values
The following command is used to insert values into the table.
INSERT INTO WorkDone(ID,StartDate,EndDate) VALUES('S1', '2021-01-02','2021-01-06'), ('S2', '2021-02-03','2021-02-07'), ('S3', '2021-03-02','2021-03-09'), ('S4', '2021-01-05','2021-01-11'), ('S5', '2021-02-22','2021-02-26'), ('S6', '2021-03-12','2021-03-17'), ('S7', '2021-01-05','2021-01-13'), ('S8', '2021-02-05','2021-02-09'), ('S9', '2021-03-13','2021-03-18'), ('S10', '2021-01-07','2021-01-12'), ('S11', '2021-02-06','2021-02-10'), ('S12', '2021-03-12','2021-03-19'), ('S13', '2021-01-12','2021-01-15'), ('S14', '2021-02-20','2021-02-24'), ('S15', '2021-03-10','2021-03-15'), ('S16', '2021-01-21','2021-01-28'), ('S17', '2021-02-21','2021-02-26'), ('S18', '2021-03-11','2021-03-16'), ('S19', '2021-01-23','2021-01-28'), ('S20', '2021-02-12','2021-02-16'), ('S21', '2021-03-08','2021-03-10'), ('S20', '2021-01-08','2021-01-14'), ('S21', '2021-02-05','2021-02-09'), ('S22', '2021-03-06','2021-03-09'), ('S23', '2021-01-14','2021-01-16'), ('S24', '2021-02-22','2021-02-26'), ('S25', '2021-03-05','2021-03-13');
Step 5: View data of the table
SELECT * FROM WorkDone;
Step 6: Expanding the Date range
WITH S00(N) AS (SELECT 1 UNION ALL SELECT 1), S02(N) AS (SELECT 1 FROM S00 a, S00 b) , S04(N) AS (SELECT 1 FROM S02 a, S02 b) , S08(N) AS (SELECT 1 FROM S04 a, S04 b) , S16(N) AS (SELECT 1 FROM S08 a, S08 b) , S32(N) AS (SELECT 1 FROM S16 a, S16 b) , CteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM S32) , DateRange AS (SELECT SeparatedDate = DATEADD(DAY,N - 1,'2021-01-01') FROM CteTally WHERE N <= 365 ) SELECT * FROM WorkDone w JOIN DateRange d ON d.SeparatedDate >= w.[StartDate] AND d.SeparatedDate <= w.[EndDate];
Here, In the first CTE S00, two rows are joined together, that is why it gives us 2 rows. The second CTE S02 is cross joined with itself and it results in 4 rows. The same process is continued for the rest of the rows as well, resulting in 16, 256, 65536,….rows, and the last one will result in 2^32 rows.
Note: 2^32 is the highest number an integer can hold in SQL Server.
- The window function ROW_NUMBER is used to assign a number to each row.
- Now, what about the subquery that we are using in the ORDER BY in the OVER clause? It is used to trick the server into not sorting the dataset. Sorting these many rows could seriously deteriorate the performance of the query. That query is used to assign a unique sequential number for each row, starting with the number 1.
- Remember, it is used to generate numbers, not dates. That is why we are using the DATEADD function to transform the numbers into dates. This will generate all the dates of the year 2021.
- To explore our sample data, we have to join the tally table to the work done table using a range join.
Here, we see it is successfully expanded. However, they are not in proper order, we can easily arrange them in proper order by adding the ORDER BY function at the very end of the query. As we have mentioned before, tally tables are expansive. We can set our desired limit and it will not affect the performance that much.
Please Login to comment...