Reading Tabular Data from Files in Julia
Last Updated: 10 Jun, 2021
Julia is a high-level, high-performance, dynamic programming language that lets users load, save, and manipulate data stored in many kinds of files for data science, analysis, and machine learning. Tabular data is data organized into rows and columns, and it can be read from text, CSV, and Excel files, among others.
To work with such files conveniently, we install the Queryverse.jl package, a meta-package that bundles other useful packages such as Query.jl, FileIO.jl, and CSVFiles.jl.
Julia
using Pkg
Pkg.add("Queryverse")
Reading Tabular Data from Text Files
To read data from a text file, we first open it with the open() function. We can then read the tabular data line by line with the readline() function, as shown below:
Julia
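# Open the file; the do-block closes it automatically when finished.
open("geek.txt") do f
    line = 0                      # line counter
    while !eof(f)                 # loop until the end of the file
        s = readline(f)           # read the next line
        line += 1
        println("$(line-1). $s")  # numbering starts at 0 for the first line
    end
end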
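The loop above prints each line unchanged. To treat the lines as a table, each line can also be split into fields. The following is a minimal sketch using only Base Julia; it assumes a whitespace-separated file whose first line holds the column names. The sample contents and the variable names `header` and `records` are made up for illustration.

```julia
# Write a small sample file so the example is self-contained.
path = tempname()
write(path, "name marks\nAsha 91\nRavi 78\n")

# Read every line and split it on whitespace into fields.
rows = [split(line) for line in eachline(path)]

header = rows[1]        # first line holds the column names
records = rows[2:end]   # remaining lines hold the data

# Convert the second field of each record to an integer before using it.
for r in records
    println(r[1], " scored ", parse(Int, r[2]))
end
```

Splitting by hand like this is fine for simple files, but it does not handle quoted fields or missing values, which is one reason to prefer the package-based load() function described in the next section.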
Reading Tabular Data from CSV Files
DataFrames store data in tabular form, and a DataFrame can be created from a CSV or Excel file by using the Queryverse.jl package and the load() function. Behind the scenes, FileIO.jl passes the load() call on to CSVFiles.jl, which does the actual reading.
Julia
using DataFrames, Queryverse
df = load("marks.csv") |> DataFrame
Sometimes the values in a CSV file are separated by a different character, such as a semicolon.
The delimiter can be passed as the second argument to the load() function so that the columns are split correctly and the semicolons do not appear in the data.
Julia
df = load("marks_sc.csv", ';') |> DataFrame
By default, the first row of the file supplies the column names of the DataFrame. To treat the first row as data instead, set the header_exists keyword argument to false; the first row then becomes part of the table and the columns receive generated names.
Julia
df = load("marks.csv", header_exists = false) |> DataFrame
While loading the file, we can also set our own column names with the colnames keyword, as shown below:
Julia
df = load("marks.csv", colnames = ["class", "score"]) |> DataFrame
The skiplines_begin keyword skips the given number of lines at the beginning of a CSV file before reading starts.
Julia
df = load("marks.csv", skiplines_begin = 1) |> DataFrame
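To make the meaning of the delimiter, header_exists, and skiplines_begin options more concrete, here is a rough Base-only sketch of what they do to a file. The function `read_table` and the placeholder names `col1`, `col2` are hypothetical and written only for this illustration; this is not how CSVFiles.jl is implemented, and the names it generates may differ.

```julia
# Hypothetical helper (not part of any package): reads a delimited text file
# into a vector of column names and a vector of rows, using only Base Julia.
function read_table(path; delim=',', header_exists=true, skiplines_begin=0)
    lines = readlines(path)[skiplines_begin+1:end]   # drop the leading lines
    rows = [String.(split(line, delim)) for line in lines if !isempty(line)]
    if header_exists
        cols, data = rows[1], rows[2:end]            # first row is the header
    else
        # No header: invent placeholder names and keep every row as data.
        cols, data = ["col$i" for i in 1:length(rows[1])], rows
    end
    return cols, data
end

# Sample file: one stray line, then a semicolon-separated table.
path = tempname()
write(path, "exported by hand\nclass;score\nA;91\nB;78\n")

# Skip the stray line and use the next line as the header.
cols, data = read_table(path; delim=';', skiplines_begin=1)
println(cols)
println(data)

# Same file, but treating the header line as an ordinary data row.
cols2, data2 = read_table(path; delim=';', skiplines_begin=1, header_exists=false)
println(cols2)
println(length(data2))
```

In the second call the header line is counted as data, so the table has one more row than in the first call.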
Reading Tabular Data from Excel Files
Reading data from Excel sheets works the same way as for the CSV files discussed above, except that we pass a file with the ‘.xlsx’ extension instead of ‘.csv’ to the load() function and also name the sheet we want to read.
Julia
df = load("marks.xlsx", "Sheet1") |> DataFrame
We can also read only part of an Excel sheet by using the skipstartrows and skipstartcols keywords, which skip the given number of leading rows and columns, as shown below:
Julia
df = load("marks.xlsx", "Sheet1",
          skipstartrows = 1,
          skipstartcols = 1) |> DataFrame