
How to Print the Longest Line(s) in a File in Linux

Last Updated : 02 Jan, 2023

Text files are frequently processed on the Linux command line. This article shows how to determine which lines in a file are the longest, using commands such as awk and grep. A typical use case is an enormous log file with hundreds of thousands of lines, where each line is a single JSON document rendered as one line of text. If some of these lines are unusually large, they may need to be processed through a proxy server before the file(s) can be properly rerouted to a target server, such as an Elasticsearch server. When the lines are extremely long, egrep may report that “the regex is too long”; that is where the awk command comes into play.

First, have a look at both of these commands. 

1. awk command:

awk is a scripting language that is handy on the command line and widely used for processing text. An awk script scans one or more files for lines that match given patterns and, for each match, carries out the action associated with that pattern.

Here we use the awk command for printing every line that fits a particular pattern.

Syntax:

$ awk options 'selection_criteria {action}' input-file > output-file
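As a minimal sketch of the pattern-action form above, the following prints only the lines longer than a threshold. The sample data, the `sample` variable, and the threshold of 10 are assumptions made for illustration.

```shell
# Create a throwaway sample file (hypothetical data for illustration).
sample=$(mktemp)
printf '%s\n' 'short' 'a somewhat longer line' 'mid-size' > "$sample"

# Pattern: length($0) > 10    Action: print the line number and the line.
long_lines=$(awk 'length($0) > 10 { print NR ": " $0 }' "$sample")
echo "$long_lines"

rm -f "$sample"
```

Only the second line is longer than 10 characters, so it is the only one printed.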

2. grep command:

grep (global regular expression print) is one of the most powerful and frequently used Linux command-line tools. Given a search pattern, it searches a file for matches and prints every line that matches the pattern.

Syntax:

$ grep "string" file_name
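A short sketch of this syntax in use, assuming a hypothetical log-like sample file; the file contents and variable names are made up for illustration.

```shell
# Create a throwaway sample file (hypothetical data for illustration).
sample=$(mktemp)
printf '%s\n' 'error: disk full' 'info: all good' 'error: timeout' > "$sample"

# Print every line that contains the string "error".
matches=$(grep "error" "$sample")
echo "$matches"

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c "error" "$sample")
echo "$count"

rm -f "$sample"
```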

Create the text file:

Run the command listed below to create a text file using the command line:

$ touch file_name.txt
Creating text file

 

Then add some text to the file using any text editor of your choice (we’ll be using the nano editor here).

$ nano file_name.txt
Inserting text into file

 

After saving the file, use the cat command with the file name to view its contents.

$ cat file_name.txt
Displaying contents of file

 

The file has now been created and contains our sample text.
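If you prefer to skip the interactive editor, the same setup can be sketched with a here-document. The three lines of text below are hypothetical sample content, not the content used in the screenshots, and the scratch directory is an assumption so that no existing file is overwritten.

```shell
# Use a scratch directory so that no existing file is overwritten.
workdir=$(mktemp -d)
target="$workdir/file_name.txt"

# Write hypothetical sample content to the file in one step.
cat > "$target" <<'EOF'
Linux is open source.
awk processes text line by line.
grep searches for patterns.
EOF

# Show the file and count its lines.
cat "$target"
line_count=$(wc -l < "$target")
echo "$line_count"
```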

Method 1: Find the longest line in a file using the awk command

To see which lines are the longest, let’s use an awk one-liner that prefixes each line with its length:

$ awk '{printf "%2d| %s\n",length,$0}' file_name.txt

Displaying length of each line using awk command

 

The longest line length is 52, as shown in the screenshot above.
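Rather than reading the maximum off the listing by eye, a small awk sketch can track the largest length seen and print only that number. The sample data and variable names here are assumptions for illustration.

```shell
# Create a throwaway sample file (hypothetical data for illustration).
sample=$(mktemp)
printf '%s\n' 'one' 'three3' 'five5' > "$sample"

# Keep the largest line length seen so far; print it once at the end.
# "max + 0" forces a numeric 0 if the file is empty.
max_len=$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$sample")
echo "$max_len"

rm -f "$sample"
```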

The Pitfall of Using the wc Command

  • We can print the maximum line length using the wc command’s -L (--max-line-length) option. However, if the input contains TAB characters, wc -L can catch us off guard.
  • The reason is that, despite the long option’s name, wc -L outputs the maximum display width rather than the maximum number of characters in a line.
  • The wc command advances each TAB to the next tab stop, and tab stops fall every 8 columns, so a single TAB can count as up to 8 columns. There is currently no option to change this.
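The difference can be seen with a one-line input that contains a TAB. This sketch assumes GNU wc (the -L option is not part of POSIX); the variable names are made up for illustration.

```shell
# One input line: 'a', a TAB, then 'b', which is 3 characters in total.
char_count=$(printf 'a\tb\n' | awk '{ print length($0) }')

# wc -L reports display width: the TAB jumps to the next tab stop.
display_width=$(printf 'a\tb\n' | wc -L | tr -d ' ')

echo "characters: $char_count, wc -L: $display_width"
```

With GNU wc the second number should be 9 rather than 3, because the TAB advances the position from column 1 to column 8 before 'b' is counted.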

Method 2: Combine the wc and grep Commands:

To locate all the longest lines, we can combine the wc -L and grep commands. wc -L reports the maximum line length, and grep’s regular expression then selects every line of exactly that length. In the example below, tr first converts each TAB to a single space so that wc -L counts it as one character.

$ grep -E "^.{$(tr '\t' ' ' <file_name.txt | wc -L)}$" file_name.txt

Combine wc and grep command

 

The output is the line (or lines) with the greatest length.
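To make the command substitution easier to follow, the same idea can be sketched in two steps: first compute the maximum, then pass it to grep. The sample data and the `max` and `longest` variable names are assumptions for illustration, and wc -L again assumes GNU wc.

```shell
# Hypothetical sample: three lines of 4 characters (one contains a TAB) and one of 2.
sample=$(mktemp)
printf 'aa\nbbbb\nc\tdd\neeee\n' > "$sample"

# Step 1: replace TABs with spaces so wc -L counts each as one column.
max=$(tr '\t' ' ' < "$sample" | wc -L | tr -d ' ')
echo "max length: $max"

# Step 2: select the lines that consist of exactly $max characters.
longest=$(grep -E "^.{${max}}\$" "$sample")
printf '%s\n' "$longest"

rm -f "$sample"
```

Because several lines can share the maximum length, grep prints all of them, including the line that contains the TAB.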

Benchmarking Performance:

With the help of the time command, we’ll compare how the wc & grep solution and the awk solution perform.

  • grep and wc command benchmark:

$ time grep -E "^.{$(tr '\t' ' ' <file_name.txt | wc -L)}$" file_name.txt > /dev/null

Evaluation of wc and grep command

 

  • awk command benchmark:

$ time awk '{ln=length} ln>max{delete result; max=ln}
ln==max{result[NR]=$0} END{for(i in result) print result[i]}' file_name.txt > /dev/null
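For readability, the awk program used in this benchmark can be laid out one rule per line with comments. This is a sketch on hypothetical sample data; the trailing sort is an addition, used only because awk does not guarantee the order of a for-in loop.

```shell
# Hypothetical sample: two lines share the maximum length of 4.
sample=$(mktemp)
printf 'aa\nbbbb\ncc\ndddd\n' > "$sample"

longest=$(awk '
    { ln = length($0) }                        # length of the current line
    ln > max  { delete result; max = ln }      # new maximum: discard earlier candidates
    ln == max { result[NR] = $0 }              # remember every line of maximal length
    END { for (i in result) print result[i] }  # for-in order is unspecified
' "$sample" | sort)
echo "$longest"

rm -f "$sample"
```

The file is read only once: each line is either ignored, added to the candidates, or replaces them when it sets a new maximum.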

 

Conclusion:

In this article, we discussed approaches for identifying the longest lines in an input file. We benchmarked both approaches and noted that the awk technique is substantially faster than the wc + grep strategy, which is consistent with awk reading the file once while the wc + grep pipeline reads it twice. In addition, we looked more closely at a pitfall of the wc command that we need to be careful of when using the -L option.


