
Extract Emails From a Text File Using Grep Command in Linux

Last Updated : 27 Jun, 2022

Suppose we have a text file that contains a lot of text, with some email IDs scattered through it, and we need to find all of them. We could search for the email IDs manually, but that is time-consuming and tedious. A better option is to use the grep command in Linux to extract every email ID from the file.

Grep command on Linux

The grep command in Linux searches a file (or its standard input) for a pattern and prints every line that contains a match, or only the matching substrings when the -o option is used. The pattern given to grep is a regular expression. The general syntax of the grep command is as follows:

$ grep <pattern> filepath/filename
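
As a quick illustration of this syntax, the following minimal sketch searches a small throwaway file for a fixed word. The file name notes.txt and its contents are made up for this example and are not part of the article's sample data.

```shell
# Create a small throwaway file (hypothetical name and contents).
printf 'first line\nsecond line with apple\nthird line\n' > notes.txt

# Print every line of notes.txt that contains the text "apple".
grep 'apple' notes.txt
```

Only the second line contains the pattern, so only that line is printed.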

The general format of Email IDs

To write the regular expression that we will pass to the grep command, we first need to understand the general format of email IDs.

The general form of email IDs is as follows:

<username>@<domain>.<address>

Email IDs have three main fields: username, domain, and address (the part after the final dot, such as com). Let’s write a regular expression for each field.

Regular expression for filtering Email ID

Now let’s write the regular expression for filtering email IDs, starting with the username. A username can contain capital (A-Z) and small (a-z) letters, digits (0-9), and special symbols such as the period (.), underscore (_), and hyphen (-). So the regular expression for a single username character is [a-zA-Z0-9._-]

The domain and address generally contain capital (A-Z) and small (a-z) letters. So the regular expression for a single character of the domain or address is [a-zA-Z]. Real domain names can also contain digits and hyphens, but this article keeps the simpler letters-only pattern.

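Now let’s combine the regular expressions for the individual fields into one regular expression for email IDs. Each bracket expression matches a single character, so we follow it with \+, which in GNU grep's basic regular expressions means "one or more of the preceding item". The @ between the username and the domain is matched literally, and the dot before the address is escaped as \. because an unescaped . matches any character. So the final regular expression will be:

[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+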
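
Before running the pattern on a whole file, it can help to try it on a few individual strings. The sketch below is one way to do that. The helper name is_email and the three test strings are hypothetical, and the pattern is written with an escaped dot (\.) so that it matches only a literal dot. It assumes GNU grep, where \+ is available in basic regular expressions.

```shell
# The email pattern, with the dot escaped so it matches a literal "." only.
pattern='[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+'

# is_email (hypothetical helper): succeeds if the argument contains a match.
is_email() {
    printf '%s\n' "$1" | grep -q -e "$pattern"
}

# Try a few made-up strings one at a time.
for candidate in 'user.name@example.com' 'no-at-sign.example.com' 'user@localhost'; do
    if is_email "$candidate"; then
        echo "match:    $candidate"
    else
        echo "no match: $candidate"
    fi
done
```

Only the first string matches. The last one, user@localhost, is rejected because it has no literal dot after the @. With an unescaped dot it would be accepted, since . would match one of the letters in localhost.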

Filtering Email IDs using the grep command

We now have a regular expression pattern that we can use to print all email IDs. Let’s take the following text file as an example.

This is sample text file.
This file contains email IDs.
example1@mail.com this is email ID of person 1.
example2@mail.com this is email ID of person 2.
example@gmail.com is email ID with Gmail domain.
These are the email IDs.

Name of File: emails_file.txt.
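
To follow along, the sample file can be created directly from the shell. The sketch below uses a here-document, which is one convenient way to do it. Any text editor works just as well.

```shell
# Write the sample text to emails_file.txt in the current directory.
# Quoting 'EOF' stops the shell from expanding anything inside the text.
cat > emails_file.txt <<'EOF'
This is sample text file.
This file contains email IDs.
example1@mail.com this is email ID of person 1.
example2@mail.com this is email ID of person 2.
example@gmail.com is email ID with Gmail domain.
These are the email IDs.
EOF
```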

Let’s use the grep command with the regular expression we created on this file and see the result.

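$ grep -e "[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+" emails_file.txt

The -e option specifies the pattern to search for. The pattern is enclosed in straight double quotes so that the shell passes it to grep unchanged, and the dot before the address is written as \. so that it matches only a literal dot.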

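Following is the result of the above grep command:

example1@mail.com this is email ID of person 1.
example2@mail.com this is email ID of person 2.
example@gmail.com is email ID with Gmail domain.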

In the result of the above command, the email IDs are printed, but so is the rest of the text on each line that contains an email ID.

The grep command provides the -o option, which prints only the part of each line that matches the pattern. We just have to add -o to the grep command to get only the matching strings.

$ grep -oe "[a-zA-Z0-9._-]\+@[a-zA-Z]\+\.[a-zA-Z]\+" emails_file.txt

The following is the result of the above command:

example1@mail.com
example2@mail.com
example@gmail.com

Now we can see that only email IDs are printed. This is the result we wanted.
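
In practice the same address often appears more than once in a file. As a possible follow-up, the sketch below pipes the grep output through sort -u to keep each address once and saves the result to a file. The file names emails_with_duplicates.txt and unique_emails.txt and the sample lines are made up for this example. It also uses grep's -E option (extended regular expressions), where + can be written without a backslash.

```shell
# Build a small sample with one repeated address (hypothetical contents).
printf '%s\n' \
    'example1@mail.com this is email ID of person 1.' \
    'example2@mail.com this is email ID of person 2.' \
    'example1@mail.com appears a second time.' > emails_with_duplicates.txt

# -o prints only the matches, -E enables extended regular expressions.
# sort -u sorts the addresses and drops repeated ones.
grep -oE '[a-zA-Z0-9._-]+@[a-zA-Z]+\.[a-zA-Z]+' emails_with_duplicates.txt \
    | sort -u > unique_emails.txt

# Show the de-duplicated list.
cat unique_emails.txt
```

Here unique_emails.txt ends up with two lines, example1@mail.com and example2@mail.com.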

