
R Read Text File to DataFrame

Last Updated : 26 Apr, 2024

In today’s data-driven world, collecting data from multiple sources and turning it into a structured form is a core task for data analysts and scientists. Text files are a common source of data, as they frequently hold useful information in plain text format. To be analyzed, this data must be loaded into a structured format such as a data frame, R’s two-dimensional tabular structure in which rows are observations and each column can hold a different data type.

Reading text files in R

Reading text files in the R programming language means taking data from plain text files and loading it into a structured format that is easy to edit and analyze. The most common types of text files are listed below.

1. CSV (Comma-Separated Values)

  • CSV files use commas to separate values in each row.
  • Example: data.csv

2. TSV (Tab-Separated Values):

  • TSV files use tabs as separators between values.
  • Example: data.tsv

3. Space-Separated Values:

  • Space-separated files use spaces to separate values in each row.
  • Example: data.txt

4. Fixed-Width Files:

  • Fixed-width files have columns aligned at specific positions, with no delimiters.
  • Example: data.dat
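
Fixed-width files are the one type in this list that the reading functions covered later do not handle directly. As a minimal sketch, base R's read.fwf() from the utils package can read them when the column widths are known. The file contents, column names, and widths below are hypothetical and chosen only for illustration.

```r
# Hypothetical fixed-width records: Name (6 chars), RollNo (3 chars), Marks (2 chars)
fwf_lines <- c("Asha  10188",
               "Ravi  10275",
               "Meera 10392")
fwf_path <- tempfile(fileext = ".dat")
writeLines(fwf_lines, fwf_path)

# widths lists the number of characters in each column, left to right.
# col.names and strip.white are passed through to read.table().
df_fwf <- read.fwf(fwf_path,
                   widths = c(6, 3, 2),
                   col.names = c("Name", "RollNo", "Marks"),
                   strip.white = TRUE)
print(df_fwf)
```

Because there is no delimiter, the widths must match the file layout exactly; a width that is off by one character shifts every later column.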

Common Functions for Reading Text Files

There are three main functions:

  1. Using read.csv() function
  2. Using read.delim() function
  3. Using read.table() function

The examples below read a CSV file, heart.csv, into a data frame named df.

The file contains 14 columns, including “age”, “sex”, “chol”, and “target”.

1. Using read.csv() function

CSV files are commonly used to store tabular data. Here’s how to read CSV files into a DataFrame using R:

  • Call the read.csv() function with the file path and any options you need, such as the delimiter.
  • Assign the result to a data frame variable.

To import your own dataset, replace the path in the code with the path to your file.

R
# Read the CSV file into a data frame
df <- read.csv('C:\\Users\\GFG19565\\Downloads\\heart.csv')
# Print the contents of the data frame
head(df)

Output:

  age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
1  52   1  0      125  212   0       1     168     0     1.0     2  2    3      0
2  53   1  0      140  203   1       0     155     1     3.1     0  0    3      0
3  70   1  0      145  174   0       1     125     1     2.6     0  0    3      0
4  61   1  0      148  203   0       1     161     0     0.0     2  1    3      0
5  62   0  0      138  294   1       1     106     0     1.9     1  3    2      0
6  58   0  0      100  248   0       0     122     0     1.0     1  0    2      1
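
The example above depends on a file at a specific local path. As a self-contained sketch of the same steps, the code below writes a small CSV to a temporary file and reads it back. The student records and column names are hypothetical, used only for illustration.

```r
# Hypothetical student records written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,RollNo,Marks",
             "Asha,101,88",
             "Ravi,102,75",
             "Meera,103,92"), csv_path)

# read.csv() assumes header = TRUE and sep = "," by default
students <- read.csv(csv_path)

str(students)                  # column names and inferred types
print(mean(students$Marks))    # numeric columns are ready for analysis
```

Calling str() right after reading is a quick way to confirm that each column was parsed with the type you expected.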

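2. Using read.delim() function

The read.delim() function reads delimited text files and uses the tab character (\t) as its default separator, which makes it a natural fit for TSV files such as “data.tsv”. To read a file that uses a different delimiter, pass that delimiter through the sep argument.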

R
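# Read the file into a data frame. read.delim() defaults to sep = '\t',
# so sep = ',' is set here because heart.csv is comma-separated.
df <- read.delim('C:\\Users\\GFG19565\\Downloads\\heart.csv', sep = ',')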
# Print the contents of the data frame
head(df)

Output:

  age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
1  52   1  0      125  212   0       1     168     0     1.0     2  2    3      0
2  53   1  0      140  203   1       0     155     1     3.1     0  0    3      0
3  70   1  0      145  174   0       1     125     1     2.6     0  0    3      0
4  61   1  0      148  203   0       1     161     0     0.0     2  1    3      0
5  62   0  0      138  294   1       1     106     0     1.9     1  3    2      0
6  58   0  0      100  248   0       0     122     0     1.0     1  0    2      1
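
The default tab separator matters in practice. The sketch below, which uses hypothetical temporary files, reads a genuine tab-separated file with read.delim() and then shows what happens when the same call is pointed at a comma-separated file without setting sep.

```r
# A hypothetical tab-separated file
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("Name\tRollNo\tMarks",
             "Asha\t101\t88",
             "Ravi\t102\t75"), tsv_path)

students_tsv <- read.delim(tsv_path)   # sep = "\t" by default
print(ncol(students_tsv))              # 3: one column per field

# The same call on a comma-separated file finds no tabs to split on
csv_path2 <- tempfile(fileext = ".csv")
writeLines(c("Name,RollNo,Marks",
             "Asha,101,88"), csv_path2)

wrong <- read.delim(csv_path2)
print(ncol(wrong))                     # 1: each whole line lands in a single column
```

A data frame with a single column is usually the first sign that the separator does not match the file.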

3. Using read.table() function

Tabular files store data in rows and columns. The read.table() function is the most general of the three and works with any delimiter. To read a tabular file into a data frame:

  • Call read.table() with the file path and the appropriate parameters, such as sep and header.
  • Assign the result to the df data frame and then print its contents.
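
R
# Read data from the text file. sep must match the file's delimiter;
# heart.csv is comma-separated, so sep = ',' is used here.
df <- read.table('C:\\Users\\GFG19565\\Downloads\\heart.csv', sep = ',', header = TRUE)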

# Display the data frame
head(df)

Output:

  age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
1  52   1  0      125  212   0       1     168     0     1.0     2  2    3      0
2  53   1  0      140  203   1       0     155     1     3.1     0  0    3      0
3  70   1  0      145  174   0       1     125     1     2.6     0  0    3      0
4  61   1  0      148  203   0       1     161     0     0.0     2  1    3      0
5  62   0  0      138  294   1       1     106     0     1.9     1  3    2      0
6  58   0  0      100  248   0       0     122     0     1.0     1  0    2      1
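
read.table() also covers the space-separated files listed earlier. As a small sketch with a hypothetical temporary file, the default sep = "" treats any run of whitespace as a single separator, so uneven spacing is not a problem. Note that header defaults to FALSE in read.table(), so it has to be set explicitly.

```r
# A hypothetical space-separated file with uneven spacing
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Name RollNo Marks",
             "Asha 101 88",
             "Ravi   102   75"), txt_path)

# Default sep = "" splits on any amount of whitespace;
# header = TRUE is needed because read.table() defaults to header = FALSE
students_txt <- read.table(txt_path, header = TRUE)
print(students_txt)
```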

Customizing the Reading Process

  1. sep: Sets the field separator character for read.table().
  2. header: Determines whether the first line of the file is treated as a header row of column names.
  3. na.strings: Specifies which strings should be treated as missing values.
  4. quote: Sets the quoting character for values that contain separators.
  5. fill: Determines whether rows with fewer fields than the others are padded with blank fields, which become NA in numeric columns.
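
These arguments can be combined in one call. The following is a minimal sketch on a deliberately messy, hypothetical file: it uses a semicolon separator, a quoted value that contains the separator, a custom missing-value marker ("missing"), and a final row that is one field short.

```r
# A hypothetical file that needs all five arguments
raw_path <- tempfile(fileext = ".txt")
writeLines(c("Name;City;Marks",
             "Asha;'Pune; India';88",   # quoted value contains the separator
             "Ravi;Delhi;missing",      # custom missing-value marker
             "Meera;Mumbai"),           # row is one field short
           raw_path)

df_custom <- read.table(raw_path,
                        sep = ";",               # fields are split on semicolons
                        header = TRUE,           # first line holds column names
                        na.strings = "missing",  # read "missing" as NA
                        quote = "'",             # single quotes protect embedded separators
                        fill = TRUE)             # pad the short row instead of failing
print(df_custom)
```

Without fill = TRUE, read.table() stops with an error on the short row; without quote = "'", the quoted city would be split into two fields.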

Handling Variations in Text Files

1. Missing Values

  • Use the na.strings argument to define which strings should be handled as missing values.
  • Example: read.csv("data.csv", na.strings = c("", "NA"))
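
As a sketch of why this matters, the hypothetical file below marks missing values in three different ways. Listing all of the markers in na.strings lets the Marks column be read as numeric rather than as text.

```r
# A hypothetical file with several missing-value markers
na_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Marks,Grade",
             "Asha,88,A",
             "Ravi,-,",
             "Meera,N/A,B"), na_path)

# Every string listed in na.strings is read as NA
df_na <- read.csv(na_path, na.strings = c("", "NA", "N/A", "-"))
print(df_na)

# Count the missing values in each column
print(colSums(is.na(df_na)))
```

If "-" and "N/A" were left out of na.strings, the Marks column would be read as character, and numeric functions such as mean() would not work on it.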

2. Different Separators

  • Specify the separator with the sep option in read.table().
  • Example: read.table("data.txt", sep = ""), where the empty string tells read.table() to split on any whitespace.
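
The same argument handles less common delimiters. The sketch below reads a hypothetical pipe-separated file, then uses count.fields() from the utils package to check that every line has the same number of fields under that separator.

```r
# A hypothetical pipe-separated file
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("Name|RollNo|Marks",
             "Asha|101|88",
             "Ravi|102|75"), pipe_path)

df_pipe <- read.table(pipe_path, sep = "|", header = TRUE)
print(df_pipe)

# count.fields() reports the number of fields on each line for a given separator
print(count.fields(pipe_path, sep = "|"))
```

When the separator of an unfamiliar file is unclear, trying count.fields() with a few candidates and looking for consistent counts is one way to narrow it down.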

3. Inconsistent Data

  • Use the quote argument to define the quoting character for values that contain separators.
  • Example: read.csv("data.csv", quote = "\"")

Conclusion

Reading text files into a data frame is an important first step in the data analysis process in R. With read.csv(), read.delim(), and read.table(), analysts can load data from a variety of text formats and then modify and analyse it. Knowing which function and arguments match a given file, and checking the result after import, helps ensure that the analysis is built on correctly parsed data.


