Implement your own tail (Read last n lines of a huge file)
Last Updated :
29 May, 2017
Given a huge file whose contents keep changing, write a program that reads the last n lines of the file at any point without reading the entire file. The problem is similar to the tail command in Linux, which displays the last few lines of a file. It is mostly used for viewing log files, because new entries are appended to the end of the log.
Source : Microsoft Interview
We strongly recommend that you minimize your browser and try this yourself first.
The problem mainly focuses on the following:
1. The program should not read the entire file.
2. The program should handle incoming dynamic data and return the last n lines at any point.
3. The program should not close the input stream before reading the last n lines.
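Points 2 and 3 come down to keeping one stream open and reading from it again as the file grows. The following is a minimal sketch of that idea, not part of the implementation that follows. It assumes the reader holds a std::ifstream opened in binary mode; the helper name read_appended and its offset parameter are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the complete lines appended since the
// previous call. `offset` remembers how far the caller has read and is
// updated in place. The stream is never closed between calls.
std::vector<std::string> read_appended(std::ifstream& in, std::streamoff& offset)
{
    assert(in.is_open());
    std::vector<std::string> lines;
    std::string line;

    in.clear();                       // drop the EOF flag left by the last call
    in.seekg(offset, std::ios::beg);  // resume where the last call stopped

    while (std::getline(in, line))
    {
        // EOF reached before a '\n': the line is still being written,
        // so leave it for the next call.
        if (in.eof())
            break;
        offset += static_cast<std::streamoff>(line.size()) + 1;  // +1 for '\n'
        lines.push_back(line);
    }
    return lines;
}
```

A caller would start with an offset of 0 and call read_appended() in a loop with a short pause between calls. Clearing the stream state before seeking is what lets the same stream pick up data written after it first hit end-of-file.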
Below is a C++ implementation:
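#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
using namespace std;
#define SIZE 100

// Busy-wait until about n seconds of processor time have elapsed
void sleep(unsigned int n)
{
    clock_t goal = n * CLOCKS_PER_SEC + clock();
    while (goal > clock());
}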
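// Print the last n lines of the stream without reading the whole file
void tail(FILE* in, int n)
{
    int count = 0;  // newlines seen while scanning backward
    long pos;       // ftell() returns long
    char str[2 * SIZE];

    if (fseek(in, 0, SEEK_END))
        perror("fseek() failed");
    else
    {
        pos = ftell(in);
        long start = 0;  // offset where the last n lines begin

        // A newline at the very end terminates the last line, so skip it
        if (pos > 0 && !fseek(in, pos - 1, SEEK_SET) && fgetc(in) == '\n')
            --pos;

        // Scan backward one byte at a time until n newlines have been seen
        while (pos > 0)
        {
            if (fseek(in, --pos, SEEK_SET))
            {
                perror("fseek() failed");
                break;
            }
            if (fgetc(in) == '\n' && ++count == n)
            {
                start = pos + 1;  // first byte after the n-th newline from the end
                break;
            }
        }

        // With fewer than n newlines, start stays 0 and the whole file is printed
        fseek(in, start, SEEK_SET);
        printf("Printing last %d lines -\n", n);
        while (fgets(str, sizeof(str), in))
            printf("%s", str);
    }
    printf("\n\n");
}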
int main()
{
    FILE* fp;
    char buffer[SIZE];

    // Create the file and open it for both writing and reading
    fp = fopen("input.txt", "wb+");
    if (fp == NULL)
    {
        printf("Error while opening file");
        exit(EXIT_FAILURE);
    }
    srand(time(NULL));

    // Append 10 lines, calling tail() after each write
    for (int index = 1; index <= 10; index++)
    {
        // Fill the buffer with SIZE - 1 random uppercase letters
        for (int i = 0; i < SIZE - 1; i++)
            buffer[i] = rand() % 26 + 65;
        buffer[SIZE - 1] = '\0';

        // Build a timestamp and drop the newline that asctime() appends
        time_t ltime = time(NULL);
        char* date = asctime(localtime(&ltime));
        date[strlen(date) - 1] = '\0';

        fprintf(fp, "\nLine #%d [%s] - %s", index, date, buffer);

        // Flush so that tail() sees the new line
        fflush(fp);
        tail(fp, index);
        sleep(3);
    }
    fclose(fp);
    return 0;
}
Some points to note:
1. This code won’t work on an online compiler, as it requires file creation permissions. When run on a local machine, it creates a sample input file “input.txt”, writes data to it 10 times, and calls the tail() function after each write.
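2. We should avoid using fseek() and ftell() for huge files (several GB in size), as they operate on file positions of type long int, which is only 32 bits wide on some platforms (Windows, for example). Use _fseeki64() and _ftelli64() with Microsoft's C runtime, or fseeko() and ftello() on POSIX systems, instead.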
3. unsigned long has a maximum value of 2^32 – 1 (assuming that unsigned long takes 4 bytes). It can hold positions in files smaller than 4 GB.
4. unsigned long long has a maximum value of 2^64 – 1 (assuming that unsigned long long takes 8 bytes). It can hold positions in files larger than 4 GB.
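The program above seeks and reads one byte at a time, and its file positions are limited by long int. As a possible variation, the sketch below scans backward in fixed-size blocks and uses std::ifstream, whose offsets are of type std::streamoff (64 bits wide on common platforms). It is an illustration under those assumptions rather than a replacement for the code above; the function name tail_blocks and the 4096-byte default block size are hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: returns the last n lines of the file at `path`
// as one string, scanning backward in blocks of `block` bytes.
std::string tail_blocks(const std::string& path, int n, std::size_t block = 4096)
{
    assert(n > 0 && block > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return "";

    in.seekg(0, std::ios::end);
    std::streamoff end = in.tellg();
    if (end <= 0)
        return "";  // empty file, or tellg() failed

    std::streamoff pos = end;
    std::streamoff start = 0;  // offset where the last n lines begin
    int count = 0;             // newlines seen so far
    bool found = false;
    std::string buf;

    while (pos > 0 && !found)
    {
        // Step back by one block, or by whatever is left before offset 0
        std::streamoff len = std::min<std::streamoff>(pos, block);
        pos -= len;
        buf.resize(static_cast<std::size_t>(len));
        in.seekg(pos, std::ios::beg);
        in.read(&buf[0], len);

        // Walk the block from its last byte to its first
        for (std::streamoff i = len - 1; i >= 0; --i)
        {
            if (buf[static_cast<std::size_t>(i)] != '\n')
                continue;
            if (pos + i == end - 1)
                continue;  // newline that terminates the last line
            if (++count == n)
            {
                start = pos + i + 1;
                found = true;
                break;
            }
        }
    }

    // Read everything from the start of the last n lines to the end
    std::string out(static_cast<std::size_t>(end - start), '\0');
    in.seekg(start, std::ios::beg);
    in.read(&out[0], end - start);
    return out;
}
```

For example, calling tail_blocks("input.txt", 3) would return the last three lines of the file written by the program above. If the file holds fewer than n lines, the whole file is returned.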