Perl | Reading a CSV File

Perl was originally developed for text processing, such as extracting the required information from a text file and converting the file into a different form. Reading a text file is therefore a very common task in Perl. For example, you will often need to read CSV (Comma-Separated Values) files to extract data.

A CSV file can be created with any text editor, such as Notepad or Notepad++. After adding content to a text file, save it with the .csv extension.
Example of a CSV File:

Store the above file as new.csv

A CSV file can be used to store records exported from the databases of a business or a company. These files open easily in Excel and can be manipulated with any suitable software. Perl also supports creating and manipulating CSV files: a program can extract values from the file, modify them and write them back. To extract every value from a specific line, we are going to use the split function.
 

Use of split() for data extraction

split() is a built-in function in Perl that separates a string into parts at each occurrence of a delimiter. The delimiter can be any character or pattern the user requires; for CSV data it is generally a comma.
split() is usually called with two parameters. The first is the delimiter, the second is the string that needs to be split.


Syntax:
split(_delimiter_, _string_);

Parameter:
_delimiter_ : Separator value between elements
_string_: From which values are to be extracted

Returns: an Array of string elements separated by _delimiter_

Example:

Input: $s = "Johny loves Sugar" 
Output: "Johny", "loves", "Sugar"
If the input string is passed to the split function as
@words = split(" ", $s);
then the array @words will be filled with 3 values: “Johny”, “loves” and “Sugar”.

Note:

If $words[2] is printed then result will be "Sugar" as array indexing starts from 0.
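
As a quick check of the behaviour described above, the following sketch splits the same sample sentence on whitespace and then splits a made-up comma-separated record on commas. The helper name split_record and the sample record are assumptions used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one comma-separated record into fields.
# The pattern /,\s*/ also swallows any spaces that follow a comma.
sub split_record {
    my ($line) = @_;
    return split /,\s*/, $line;
}

my $s     = "Johny loves Sugar";
my @words = split " ", $s;      # " " splits on runs of whitespace
print "$words[2]\n";            # prints "Sugar" (index 2 is the third element)

my @fields = split_record("Amit, 21, Delhi");   # made-up sample record
print scalar(@fields), " fields: @fields\n";    # prints "3 fields: Amit 21 Delhi"
```

Note that an empty pattern, as in split("", $s), would instead break the string into individual characters.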

The following steps are used to split the lines of a CSV file into parts using a delimiter:

Step 1: Read in the file line by line.
Step 2: For each line, store all the values in an array.
Step 3: Print out the values one by one to get the result.

Let’s look at an example to get a better understanding of the topic. The following code uses the split() function to separate the strings stored in the new.csv file at each delimiter:

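use strict;
use warnings;

# The CSV file name is taken from the command line
my $file = $ARGV[0] or die "Usage: perl $0 <file.csv>\n";
open(my $data, '<', $file) or die "Could not open '$file': $!\n";

while (my $line = <$data>)
{
    chomp $line;

    # Split the line and store it
    # inside the words array
    my @words = split ", ", $line;

    # Print the first three fields of the line
    for (my $i = 0; $i <= 2; $i++)
    {
        print "$words[$i] ";
    }
    print "\n";
}

close $data;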


Save the above code in a text file with a .pl extension. Here, we are going to save it as test.pl

Execute the above-saved file with the use of the following command:

perl test.pl new.csv

Output:

Escaping a comma character

Sometimes a field contains a comma that is part of the data; removing it would change the meaning of the data or make the record useless. Such fields are enclosed in quotes. If split() is used on such a line, it still separates the values at every comma, even inside the quotes, because split() does not care about quotes and does not understand anything about CSV. It just cuts wherever it finds the separator.

Following is a CSV file which has a comma within the quotes:

In the above CSV file, it can be seen that the first field has a comma within itself, hence closed within quotes. But if we run the split() function on this file then it won’t care for any such quotes. Following is the result of applying split() function on such a file:

In the above output, the split() function divided the quoted field into parts even though it was within quotes. Also, since our code prints only three fields, the third field of the last record is dropped from the output.
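
The effect can be reproduced without any file. The sketch below applies split() to a single made-up record whose first field is quoted because it contains a comma; the record itself is an assumption and is not taken from new.csv.

```perl
use strict;
use warnings;

# Made-up record: the first field is quoted because it contains a comma.
my $line = '"Sharma, Rahul",25,Delhi';

# split() knows nothing about quotes, so it also cuts inside the quoted field.
my @parts = split /,/, $line;

print scalar(@parts), " parts\n";   # 4 parts instead of the intended 3
print "<$_>\n" for @parts;          # the quote characters stay in the data
```

The record was meant to hold three fields, but split() returns four pieces and leaves the quote characters attached to the first two.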

To handle such situations, we need a parser that understands the quoting rules of CSV and does not divide fields that are within quotes.
We use Text::CSV, which provides a full CSV reader and writer. Text::CSV is a module available from CPAN that supports reading, parsing, and writing CSV files. The module can be loaded in a Perl program with the following statement:

use Text::CSV;

But first, the module needs to be downloaded and installed on your system.
Installation of Text::CSV:
For Windows:

perl -MCPAN -e shell
install Text::CSV

For a Debian/Ubuntu-based system:

$ sudo apt-get install libtext-csv-perl

For a RedHat/Centos/Fedora-based system:

$ sudo yum install perl-Text-CSV

Following is the code to be run on our new.csv file so that a comma within quotes is kept as part of the field:

use strict;
use warnings;

# Using Text::CSV to allow
# full CSV Reader and Writer
use Text::CSV;

# sep_char must be a single character
my $csv = Text::CSV->new({ sep_char => ',' });

my $file_to_be_read = $ARGV[0] or die "Usage: perl $0 <file.csv>\n";

# Reading the file
open(my $data_file, '<', $file_to_be_read)
    or die "Could not open '$file_to_be_read': $!\n";
while (my $line = <$data_file>)
{
  chomp $line;

  # Parsing the line
  if ($csv->parse($line))
  {
      # Extracting elements
      my @words = $csv->fields();
      for (my $i = 0; $i <= 2; $i++)
      {
          print "$words[$i] ";
      }

      print "\n";
  }
  else
  {
      # Warning to be displayed
      warn "Line could not be parsed: $line\n";
  }
}

close $data_file;


Output:

In the above example, it can be seen that the comma inside the first field is now kept as part of the field instead of being treated as a separator.

my $csv = Text::CSV->new({ sep_char => ',' });

The above line calls the constructor of the Text::CSV class and sets the separator to a comma (sep_char must be a single character). A method, including the constructor, is called using the arrow ->.

$csv->parse($line)

This call tries to parse the current line and split it into fields. It returns true or false depending on success or failure.
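
To see both outcomes of parse() without a file, here is a minimal sketch that parses two made-up strings. The helper parse_line and the sample records are assumptions for illustration; parse(), fields() and error_input() are methods of Text::CSV, which must be installed for the sketch to run.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: parse one CSV line and return its fields,
# or an empty list if the line cannot be parsed.
sub parse_line {
    my ($csv, $line) = @_;
    return $csv->parse($line) ? $csv->fields() : ();
}

my $csv = Text::CSV->new({ sep_char => ',' });

# Made-up record with a comma inside the quoted first field
my @ok = parse_line($csv, '"Sharma, Rahul",25,Delhi');
print scalar(@ok), " fields, first is: $ok[0]\n";   # 3 fields, first is: Sharma, Rahul

# The closing quote is missing here, so parse() returns false;
# error_input() gives back the text that could not be parsed.
my @bad = parse_line($csv, '"Sharma, Rahul,25,Delhi');
print "could not parse: ", $csv->error_input(), "\n" unless @bad;
```

Unlike split(), the parser also strips the surrounding quotes from the field value.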

Fields with embedded new-lines

In a CSV file, some fields can span multiple lines, that is, have a newline embedded between the words. Such fields are enclosed in quotes. Reading such a file line by line and passing each line to split() breaks the record apart, because every physical line is treated as a separate record.
Example:

Text::CSV provides a getline() method to handle such files.

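use strict;
use warnings;

# Using Text::CSV to allow
# full CSV Reader and Writer
use Text::CSV;

my $file = $ARGV[0] or die "Usage: perl $0 <file.csv>\n";

# binary => 1 allows embedded newlines inside quoted fields,
# auto_diag => 1 reports parse errors automatically
my $csv = Text::CSV->new (
{
    binary => 1,
    auto_diag => 1,
    sep_char => ','
});

# Reading the file
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file': $!\n";

# getline() reads one complete record, even if it spans several lines
while (my $words = $csv->getline($data))
{
    for (my $i = 0; $i < 3; $i++)
    {
        print "$words->[$i] ";
    }
    print "\n";
}

# Checking for End-of-file
if (not $csv->eof)
{
    $csv->error_diag();
}
close $data;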


Output:

In the above output, the embedded newline is handled by the getline() method: Text::CSV treats the quoted text as a single field, which is what the author of the file intended by putting it within quotes.
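
The same behaviour can be tried without creating a file. The sketch below reads made-up CSV data from an in-memory filehandle; the data and variable names are assumptions for illustration, and Text::CSV must be installed.

```perl
use strict;
use warnings;
use Text::CSV;

# Made-up data: the second field of the first record spans two lines.
my $data = qq{1,"first line\nsecond line",Delhi\n2,plain,Mumbai\n};

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Read from an in-memory filehandle so the sketch needs no file on disk.
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;       # each $row is a reference to an array of fields
}
close $fh;

print scalar(@rows), " records\n";                            # 2 records
print "fields in record 1: ", scalar(@{ $rows[0] }), "\n";    # 3 fields
```

The data contains three physical lines, but getline() returns two records because the quoted field keeps its newline.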


