Often, the data which is to be read and worked upon is already stored in a file but is present outside the R environment. Hence, importing data into R is a mandatory task in such circumstances. The formats which are supported by R are CSV, JSON, Excel, Text, XML, etc. The majority of times, the data to be read into R is in tabular format. The functions used for reading such data, which is stored in the form of rows and columns, import the data and return data frame in R. Data frame is preferred in R because it is easier to extract data from rows and columns of a data frame for statistical computation tasks than other data structures in R. The most common functions which are used for reading tabular data into R are:- read.table(), read.csv(), fromJSON() and read.xlxs() .
Reading Data from Text File
Functions used for reading tabular data from a text file is
- file: Specifies the name of the file.
- header:The header is a logical flag indicating whether the first line is a header line contains data or not.
- nrows: Specifies number of rows in the dataset.
- skip: Helps in skipping of lines from the beginning.
- colClasses: It is a character vector which indicates class of each column of the data set.
- sep: It a string indicating the way the columns are separated that is by commas, spaces, colons, tabs etc.
For small or moderately sized data sets, we can call read.table() without any arguments. R will automatically figure out the number of rows, the number of columns, classes of different columns, skip lines that start with #(comment symbol), etc. If we do specify the arguments, it will make the execution faster and efficient but here, since the dataset is small so it would not make much of a difference as it is already fast and efficient.
Let there be a tabular data file GeeksforGeeks.txt saved in the current directory with data as follows:
Reading Data from a CSV File
read.csv() function is used to read .csv files which is a very common format followed by spreadsheet applications like Microsoft Excel. read.csv() is similar to read.table() except that the default separator for the read.csv() function is the comma, whereas the default separator for read.table() is the space. The other thing that read.csv() specifies is that it always specified header to be equal to true.
While reading huge files, a point to keep in mind is to calculate the approximate memory required for it to store the dataset. It is to ensure that it is not more than the RAM available in the device we are working in. The memory required can be calculated as shown below:
Consider the table has 2000000 rows and 200 columns considering if all the columns are
of class numeric. 2000000 x 200 x 8 bytes/numeric #each number requires 8 bytes to be stored =3200000000/ bytes/MB =3051.76/MB =2.98 GB Approximately twice this amount of RAM i.e.5.96 GB will be required.
Another measure that can be followed to reduce time consumption is including the colClasses argument so that R does not have to search the class of each column of the dataset. If the number of rows i.e. nrows are specified beforehand then it will be helpful in reducing memory usage.
Let there be a tabular data file GeeksforGeeks.csv saved in the current directory with data as follows:
Reading data from JSON File
fromJSON() function is used to convert JSON data to R objects. This function requires installation of rjson package. This can be done by the following command:
Let there be a .json file GeeksforGeeks.json saved in the current directory with content as follows:
Reading Excel Sheets
The R function
read.xlsx() is used to read the contents of an Excel worksheet into a R data frame. This function requires installation of xlsx package. This can be done by using the following command:
Let there be an Excel file gfg.xlsx saved in the current directory with content as follows:
Here, we can observe that the
read.xlsx() has returned a data.frame as output.
But if large data sets (with more than 100000 cells) are to be read, then using read.xlsx2() is preferred. read.xlsx2() works faster on large files than read.xlsx().The result of read.xlsx2() will be different from read.xlsx(), because internally read.xlsx2() uses readColumns() which is tailored for tabular data.
- Writing Data in Tabular form to Files in Julia
- Reading Files in R Programming
- Create a Tabular representation of Data in R Programming - table() Function
- Perl | Reading Excel Files
- Working with XML Files in R Programming
- Working with CSV files in R Programming
- Writing to Files in R Programming
- Working with JSON Files in R Programming
- Working with Binary Files in R Programming
- Working with Excel Files in R Programming
- Reading contents of a Text File in R Programming - read.table() Function
- Modify Data of a Data Frame with an Expression in R Programming - with() Function
- Data Wrangling in R Programming - Data Transformation
- Convert a Data Frame into a Numeric Matrix in R Programming - data.matrix() Function
- Convert an Object to Data Frame in R Programming - as.data.frame() Function
- Check if the Object is a Data Frame in R Programming - is.data.frame() Function
- Generate a set of Sample data from a Data set in R Programming - sample() Function
- Data Handling in R Programming
- Data Reshaping in R Programming
- Data Munging in R Programming
If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to firstname.lastname@example.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.
Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.