Reading Tabular Data from Files in Julia

Last Updated: 10 Jun, 2021

Julia is a high-level, high-performance, dynamic programming language that lets users load, save, and manipulate data in various file types for data science, analysis, and machine learning. Tabular data is data organized as a table of rows and columns, and it can be read from text, CSV, Excel, and other files.

To perform such operations on data and files with ease, we add the Queryverse.jl package, which bundles other useful packages such as Query.jl, FileIO.jl, and CSVFiles.jl behind a single import.


# Adding the Queryverse package
using Pkg
Pkg.add("Queryverse")


Reading Tabular Data from Text Files


To read data from a text file, we first open it with the open() function. We can then read the tabular data in the file line by line with the readline() function, as shown below:



# read file contents, line by line
open("geek.txt") do f
  # line counter
  line = 0
  # read till end of file
  while !eof(f)
    # read the next line on every iteration
    s = readline(f)
    line += 1
    # print the line with its index, counting from 0
    println("$(line-1). $s")
  end
end
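
Reading line by line gives plain strings, so each line still has to be split into fields before it can be treated as a table row. The following is a minimal sketch using only Julia's standard library. It assumes a whitespace-separated file whose first line holds the column names; the file name and sample contents are hypothetical and are written to a temporary directory so the example is self-contained.

```julia
# Hypothetical sample data; a real file such as "geek.txt" would be read the same way.
path = joinpath(mktempdir(), "sample.txt")
write(path, "name score\nAsha 91\nBen 78\n")

# Split every line on whitespace, giving one vector of fields per line.
rows = [split(line) for line in eachline(path)]

header = rows[1]        # first line: column names
records = rows[2:end]   # remaining lines: data rows

# Print each data row with a 1-based index.
for (i, fields) in enumerate(records)
    println("$i. ", join(fields, " | "))
end
```

Here eachline() opens and closes the file itself, which avoids the explicit open() block when only a simple pass over the lines is needed.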


Reading Tabular Data from CSV Files


DataFrames store data in tabular form. They can be read from CSV or Excel files with the load() function made available by the Queryverse.jl package. Behind the scenes, Queryverse.jl lets the FileIO.jl package hand the file over to the CSVFiles.jl package, which does the actual reading.



# using necessary packages
using DataFrames, Queryverse
# reading dataframe
df = load("marks.csv") |> DataFrame



Sometimes the values in a CSV file are separated by a different character, such as a semicolon. In that case the delimiter can be passed to the load() function as its second argument, so that each line is split into columns correctly instead of being read as a single column containing semicolons.



# reading semicolon-delimited data
df = load("marks_sc.csv", ';') |> DataFrame
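
To try the delimiter argument without an existing file, the sketch below writes a small semicolon-separated file first and then loads it. The sample contents and column names are hypothetical, and the example assumes the Queryverse and DataFrames packages are already installed.

```julia
using DataFrames, Queryverse

# Hypothetical sample data written to a temporary directory.
# The ".csv" extension matters: FileIO uses it to pick the CSV reader.
path = joinpath(mktempdir(), "marks_sc.csv")
write(path, "name;score\nAsha;91\nBen;78\n")

# Pass ';' as the delimiter so each line is split into two columns.
df = load(path, ';') |> DataFrame

println(df)
```

Leaving out the ';' argument on the same file would make the reader fall back to the default comma delimiter, so the columns would not be separated as intended.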



By default, the first row of the file is used as the column names of the DataFrame. To change this, we can set the header_exists keyword argument to false, so that the first row is treated as ordinary data and the columns receive automatically generated names.



# reading data without headers
df = load("marks.csv",
           header_exists = false) |> DataFrame



While loading the file, we can also supply our own column names with the colnames keyword, as shown below:



# reading data by changing column names
df = load("marks.csv",
           colnames = ["class",
                          "score"]) |> DataFrame

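
The header_exists and colnames keywords can also be used together when a file has no header row at all. The sketch below assumes the Queryverse and DataFrames packages are installed; the file name and its contents are hypothetical sample data written to a temporary directory.

```julia
using DataFrames, Queryverse

# Hypothetical headerless sample file: every line is a data row.
path = joinpath(mktempdir(), "marks_noheader.csv")
write(path, "math,90\nphysics,85\n")

# Treat the first line as data and name the columns ourselves.
df = load(path,
          header_exists = false,
          colnames = ["class", "score"]) |> DataFrame

println(df)
```

Without header_exists = false, the first data row ("math,90") would be consumed as the header and lost from the table.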


We can also skip a given number of lines at the beginning of a CSV file with the skiplines_begin keyword, which is useful when the file starts with lines that are not part of the table.



# skipping the first line of the file
df = load("marks.csv",
           skiplines_begin = 1) |> DataFrame


Reading Tabular Data from Excel Files


The process for reading data from Excel sheets is the same as for CSV files, discussed above, except that we pass a file with the ‘.xlsx’ extension instead of ‘.csv’ to the load() function, along with the name of the sheet we want to read.



# reading sheet 1 of an excel file
df = load("marks.xlsx", "Sheet1") |> DataFrame



We can also read only part of a sheet by using the skipstartrows and skipstartcols keywords, which skip the given number of rows and columns at the start of the sheet, as shown below:



# reading by skipping specific rows and columns
df = load("marks.xlsx", "Sheet1",
               skipstartrows = 1,
               skipstartcols = 1) |> DataFrame
