Reading Tabular Data from Files in Julia

Julia is a high-level, high-performance, dynamic programming language that can load, save, and manipulate data stored in many file types for data science, analysis, and machine learning. Tabular data is data organized as a table of rows and columns, and it can be read from plain text, CSV, and Excel files, among others.

To make these operations easier, we add the Queryverse.jl package, a meta-package that bundles other useful packages such as Query.jl, FileIO.jl, and CSVFiles.jl.

# Adding the Queryverse package 
using Pkg 
Pkg.add("Queryverse")

Reading Tabular Data from Text Files

To read data from a text file, we first open it with the open() function. We then read the tabular data line by line with the readline() function, as shown below:

# read file contents, line by line
open("geek.txt") do f

  # line counter
  line = 0

  # read until the end of the file
  while !eof(f)

    # read the next line on every iteration
    s = readline(f)
    line += 1

    # print the zero-based line number followed by the line
    println("$(line-1). $s")
  end
end
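
Reading line by line only yields raw strings; to treat them as a table, each line still has to be split into fields. The following is a minimal sketch of that step using only Base Julia. The file contents, the column names, and the assumption that fields are separated by whitespace are all hypothetical and chosen just for illustration.

```julia
# Hypothetical sample file, written to a temporary directory
# so that the sketch is self-contained
path = joinpath(mktempdir(), "geek.txt")
write(path, "name score\nasha 91\nravi 78\n")

# Split every line on whitespace and collect the resulting rows
rows = [split(line) for line in eachline(path)]

header = rows[1]         # assumption: the first line holds the column names
records = rows[2:end]    # the remaining lines hold the data

# Fields are strings, so numeric columns must be parsed explicitly
for r in records
    println(r[1], " scored ", parse(Int, r[2]))
end
```

Splitting by hand like this ignores quoting and missing values, which is one reason to prefer a dedicated reader for CSV and Excel files.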

Reading Tabular Data from CSV Files

DataFrames store data in tabular form. A DataFrame can be read from a CSV or Excel file with the load() function, which is available through the Queryverse.jl package. For CSV files, FileIO.jl passes the call on to CSVFiles.jl, which does the actual reading.

# using necessary packages
using DataFrames, Queryverse
  
# reading dataframe
df = load("marks.csv") |> DataFrame

Sometimes the values in a CSV file are separated by a different character, such as a semicolon.

The delimiter can be passed to the load() function so that the data is read in normal tabular form, i.e. split into columns at the semicolons.

# reading semicolon-separated data
df = load("marks_sc.csv", ';') |> DataFrame

By default, the first row of the file supplies the column names of the DataFrame. To change this, set the header_exists keyword argument to false: the first row is then treated as data in the table rather than as column names.

# reading data without headers
df = load("marks.csv",
          header_exists = false) |> DataFrame

While loading the file, we can also set the column names with the colnames keyword, as shown below:

# reading data while changing the column names
df = load("marks.csv",
          colnames = ["class", "score"]) |> DataFrame

Leading lines of a CSV file can be skipped while loading by passing the number of lines to skip to the skiplines_begin keyword.

# reading data while skipping the first line of the file
df = load("marks.csv",
          skiplines_begin = 1) |> DataFrame
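
To make the effect of the delimiter, header_exists, and skiplines_begin options concrete, here is a rough plain-Julia sketch of what they mean. The helper read_delimited, the sample file, and the placeholder column names are hypothetical; this is not how CSVFiles.jl is implemented, it ignores quoting, and it assumes that leading lines are skipped before the header is looked for.

```julia
# Hypothetical helper that mimics the meaning of the options discussed above.
# It returns a tuple (column_names, data_rows), with every field kept as a String.
function read_delimited(path; delim = ',', header_exists = true, skiplines_begin = 0)
    # Assumption: leading lines are dropped before the header is looked for
    lines = collect(Iterators.drop(eachline(path), skiplines_begin))
    rows = [String.(split(line, delim)) for line in lines]
    isempty(rows) && return String[], Vector{String}[]
    if header_exists
        # The first remaining row supplies the column names
        return rows[1], rows[2:end]
    else
        # No header row: use placeholder names and keep every row as data
        return ["Column$i" for i in 1:length(rows[1])], rows
    end
end

# Hypothetical semicolon-separated sample file
path = joinpath(mktempdir(), "marks_sc.csv")
write(path, "class;score\n9;81\n10;77\n")

# Header taken from the first row
cols, data = read_delimited(path; delim = ';')
println(cols, " => ", data)

# First row kept as data
cols2, data2 = read_delimited(path; delim = ';', header_exists = false)
println(cols2, " => ", data2)

# First line skipped, remaining rows kept as data
cols3, data3 = read_delimited(path; delim = ';', header_exists = false, skiplines_begin = 1)
println(cols3, " => ", data3)
```

Note that skipping one line while still expecting a header would promote the first data row to column names in this sketch, so the two options need to be chosen together.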

Reading Tabular Data from Excel Files

Reading data from Excel sheets works the same way as reading CSV files, as discussed above. The differences are that the file passed to the load() function has the extension '.xlsx' instead of '.csv', and that we also name the specific sheet we want to read.

# reading the sheet named "Sheet1" from an Excel file
df = load("marks.xlsx", "Sheet1") |> DataFrame

We can also skip leading rows and columns of an Excel sheet with the skipstartrows and skipstartcols keywords, as shown below:

# reading while skipping leading rows and columns
df = load("marks.xlsx", "Sheet1",
          skipstartrows = 1,
          skipstartcols = 1) |> DataFrame
