
Perl | Reading a CSV File


Perl was originally developed for text processing: extracting the required information from a text file and converting it into a different form. Reading text files is therefore a very common task in Perl. For example, you will often need to read CSV (Comma-Separated Values) files to extract data.

A CSV file can be created with any text editor, such as Notepad or Notepad++. After adding content to a text file, save it with the .csv extension.

Example of a CSV File:  

Store the above file as new.csv

A CSV file can be used to store the records of a business or company database. Such files open easily in Excel and can be manipulated with any suitable software. Perl also supports creating and manipulating CSV files: values can be extracted from the file, modified, and written back. To extract every value from a given line, we are going to use the split function.

Use of split() for data extraction

split() is a built-in Perl function that separates a string into parts at each occurrence of a delimiter. The delimiter can be any string or regular expression the user requires; for CSV data it is generally a comma.
split() is usually called with two parameters: the first is the delimiter, the second is the string to be split. An optional third parameter limits the number of parts returned.

Syntax: split(_delimiter_, _string_);
Parameter: 
_delimiter_ : Separator value between elements 
_string_: From which values are to be extracted
Returns: a list of the substrings of _string_ that were separated by _delimiter_
 

Example:  

Input: $s = "Johny loves Sugar"
Output: "Johny", "loves", "Sugar"
If the input string is passed to the split function as
@words = split(" ", $s);
then the array @words will be filled with 3 values: “Johny”, “loves” and “Sugar”.

Note: 

If $words[2] is printed, the result will be "Sugar", because array indexing starts from 0.
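The behaviour described above can be tried out directly. The following is a minimal sketch, not part of the original example; the variable names @chars and @two and the sample strings "abc" and "a,b,c,d" are chosen here purely for illustration. It shows the whitespace split from the example, what an empty pattern does instead, and the optional limit parameter.

```perl
use strict;
use warnings;

my $s = "Johny loves Sugar";

# A single-space string is a special case: split on runs of
# whitespace, ignoring any leading whitespace.
my @words = split(" ", $s);
print scalar(@words), " words, third is $words[2]\n";

# An empty pattern splits the string into individual characters.
my @chars = split(//, "abc");
print join("|", @chars), "\n";

# The optional third argument caps the number of pieces returned.
my @two = split(/,/, "a,b,c,d", 2);
print join("|", @two), "\n";
```

Note that the delimiter matters: a single space yields words, while an empty pattern yields one element per character.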

The following steps split the lines of a CSV file into parts using a delimiter:
Step 1: Read in the file line by line.
Step 2: For each line, store all the values in an array.
Step 3: Print the values one by one to get the result.

Let’s look at an example for a better understanding. The following code uses the split() function to separate the values stored in new.csv with a delimiter:

Perl




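use strict;
use warnings;

# The CSV file name is taken from the command line
my $file = $ARGV[0] or die "Usage: perl $0 <file.csv>\n";
open(my $data, '<', $file) or die "Could not open '$file': $!\n";

while (my $line = <$data>)
{
    chomp $line;

    # Split the line on a comma plus any
    # following whitespace and store the
    # parts inside the words array
    my @words = split /,\s*/, $line;

    # Print the first three fields
    for (my $i = 0; $i <= 2; $i++)
    {
        print "$words[$i] ";
    }
    print "\n";
}
close($data);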


Save the above code in a text file with a .pl extension. Here, we are going to save it as test.pl

Execute the above-saved file with the use of the following command: 

perl test.pl new.csv

Output:  

Escaping a comma character

Sometimes a field itself contains a comma, and removing or splitting at that comma would change the meaning of the data or make the record useless. Such fields are enclosed in quotes, but split() still cuts the line at every comma, because it neither cares about quotes nor understands anything about CSV. It just cuts wherever it finds the separator.

Following is a CSV file which has a comma within the quotes:  

In the above CSV file, the first field contains a comma and is therefore enclosed in quotes. If we run split() on this file, however, it ignores the quotes. Following is the result of applying split() to such a file:

In the above output, split() divided the quoted field into two parts. Also, since our code prints only three fields, the third field of the last record is dropped from the output.
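The effect is easy to reproduce on a single line. The sketch below uses a hypothetical record (it is not taken from new.csv) whose first field is quoted and contains a comma, and shows that split() returns four pieces where the record only has three fields.

```perl
use strict;
use warnings;

# Hypothetical record: the first field contains a comma and is quoted.
my $line = '"Hello, World",23,Perl';

my @fields = split(/,/, $line);

# split sees three commas, so it returns four pieces instead of
# three, and the quote characters stay attached to the fragments.
print scalar(@fields), " pieces\n";
print "[$_]\n" for @fields;
```

The first two pieces are the two halves of the quoted field, which is exactly the damage described above.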

To handle such situations, we need a parser that understands CSV quoting rules and does not split fields inside quotes.
We use Text::CSV, a full CSV reader and writer. Text::CSV is a module available from CPAN that provides functionality for reading, parsing, and writing CSV files. It can be included in a Perl program with the following statement:

use Text::CSV;

But first, there is a need to download and install this module on your device to use its functionalities. 

Installation of Text::CSV:
Using the CPAN shell (for example, on Windows):

perl -MCPAN -e shell
install Text::CSV

For a Debian/Ubuntu-based system:  

$ sudo apt-get install libtext-csv-perl

For a RedHat/Centos/Fedora-based system: 

$ sudo yum install perl-Text-CSV
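After installation, it can be useful to confirm that Perl can actually find the module. The following is a small sketch of one way to do that; the variable name $installed is chosen here for illustration. It loads the module at run time inside eval, so the script reports a missing module instead of dying.

```perl
use strict;
use warnings;

# Try to load Text::CSV at run time; eval traps the failure
# if the module is missing.
my $installed = eval { require Text::CSV; 1 } ? 1 : 0;

if ($installed) {
    print "Text::CSV version ", Text::CSV->VERSION, " is available\n";
} else {
    print "Text::CSV is not installed\n";
}
```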

Following is the code to run on our new.csv file so that commas within quotes are kept as part of the field:

Perl




use strict;
use warnings;

# Using Text::CSV to allow a
# full CSV reader and writer
use Text::CSV;

# sep_char must be a single character
my $csv = Text::CSV->new({ sep_char => ',' });

my $file_to_be_read = $ARGV[0] or die "Usage: perl $0 <file.csv>\n";

# Reading the file
open(my $data_file, '<', $file_to_be_read)
    or die "Could not open '$file_to_be_read': $!\n";
while (my $line = <$data_file>)
{
  chomp $line;

  # Parsing the line
  if ($csv->parse($line))
  {
      # Extracting elements
      my @words = $csv->fields();
      for (my $i = 0; $i <= 2; $i++)
      {
          print "$words[$i] ";
      }

      print "\n";
  }
  else
  {
      # Warning to be displayed
      warn "Line could not be parsed: $line\n";
  }
}
close($data_file);


Output:

In the above output, the first field keeps its comma: Text::CSV treated the quoted text as a single field while parsing the CSV file.

my $csv = Text::CSV->new({ sep_char => ',' });

The above line calls the constructor new on the Text::CSV class using the arrow operator ->. The sep_char option states that fields are separated by “,”.

$csv->parse($line)

This call tries to parse the current line and split it into fields. It returns true on success and false on failure; after a successful parse, $csv->fields() returns the extracted values.
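To see parse() and fields() working together without a file, the following sketch parses a single hypothetical record (not taken from new.csv) held in a string. It assumes Text::CSV is installed; error_input() is used in the failure branch to show the text that could not be parsed.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ sep_char => ',' });

# Hypothetical record with a quoted, comma-containing first field.
my $line = '"Hello, World",23,Perl';

my @fields;
if ($csv->parse($line)) {
    # fields() returns the values of the most recently parsed line.
    @fields = $csv->fields();
    print scalar(@fields), " fields, first is: $fields[0]\n";
} else {
    # error_input() returns the text that failed to parse.
    warn "Could not parse: ", $csv->error_input(), "\n";
}
```

Unlike split(), the parser returns three fields here, and the quotes are removed from the first one.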
 

Fields with embedded new-lines

A CSV file can also contain fields that span multiple lines, that is, fields with a newline embedded between the words. Reading the file line by line and passing each line to split() or parse() breaks such a field apart, because the record no longer fits on a single line.
Example: 
 

Text::CSV provides a getline() method to handle such files. It reads a complete record from the filehandle, even when the record spans several lines.

Perl




use strict;
use warnings;

# Using Text::CSV to allow a
# full CSV reader and writer
use Text::CSV;

my $file = $ARGV[0] or die "Usage: perl $0 <file.csv>\n";

# binary => 1 allows embedded newlines
# inside quoted fields
my $csv = Text::CSV->new (
{
    binary => 1,
    auto_diag => 1,
    sep_char => ','
});

# Reading the file
open(my $data, '<:encoding(utf8)', $file)
    or die "Could not open '$file': $!\n";

while (my $words = $csv->getline($data))
{
    for (my $i = 0; $i < 3; $i++)
    {
        print "$words->[$i] ";
    }
    print "\n";
}

# Checking for End-of-file
if (not $csv->eof)
{
    $csv->error_diag();
}
close $data;


Output: 

In the above output, the embedded newline is handled by the getline() method: Text::CSV treats the quoted, multi-line text as a single field, as the programmer intended.
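The same behaviour can be checked without creating a file. The sketch below is an illustration under stated assumptions: the data is hypothetical, Text::CSV is assumed to be installed, and the string is opened as an in-memory filehandle so that getline() can read from it. It counts the records to show that the quoted, two-line field does not produce an extra record.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 is required for fields that contain newlines.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Hypothetical data: two records, the first with a newline
# inside its quoted first field.
my $data = qq{"first line\nsecond line",10,x\nplain,20,y\n};

# Open the string as an in-memory filehandle so no file is needed.
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;
}
close $fh;

# Count how many lines the first field of the first record spans.
my @parts = split /\n/, $rows[0][0];
print scalar(@rows), " records read\n";
print "first field of record 1 spans ", scalar(@parts), " lines\n";
```

Although the data contains three physical lines, getline() returns two records, with the newline kept inside the first field.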
 



Last Updated : 14 Dec, 2021