
How to convert tab-separated file into a dataframe using Python

Last Updated : 27 Feb, 2024

In this article, we will learn how to convert a TSV file into a data frame using Python and the Pandas library.

A TSV (Tab-Separated Values) file is a plain text file where data is organized in rows and columns, with each column separated by a tab character.

  • It is a type of delimiter-separated file, similar to CSV (Comma-Separated Values).
  • Tab-separated files are commonly used in data manipulation and analysis, and loading them into a data frame makes the structured data easier to filter, aggregate, and transform.
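To make the format concrete, here is a minimal sketch that splits a small tab-separated string into rows and fields using plain Python. The sample text and its column names (user_id, item_id, rating) are assumptions made up for illustration, not the contents of any real file.

```python
# Hypothetical sample: one header line and two records, tab-delimited.
tsv_text = "user_id\titem_id\trating\n196\t242\t3\n186\t302\t3\n"

# Each line is a row; splitting on "\t" yields that row's column values.
rows = [line.split("\t") for line in tsv_text.splitlines()]

for row in rows:
    print(row)
```

The methods in the rest of the article do this splitting for us and place the result in a DataFrame.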

Methods to Convert Tab-Separated File into a Data Frame

Method 1: Using pandas ‘read_csv()’ with ‘sep’ parameter

In this method, we will use the pandas library to read a tab-separated file (file.tsv) into a DataFrame.

Look at the following code snippet.

  • We import the pandas library and define the path of the tab-separated file.
  • We then call ‘pd.read_csv()’ to read the contents of the file into a DataFrame, passing “sep='\t'” to specify that the fields are separated by tabs.
  • The ‘read_csv()’ function splits on commas by default, so the tab delimiter has to be given explicitly. By default it also treats the first line of the file as the column names.

Python




import pandas as pd

# Path to the tab-separated file
file_path = "file.tsv"
# sep='\t' makes read_csv split fields on tabs instead of commas
df = pd.read_csv(file_path, sep='\t')
df.head()


Output:

    0    50    5    881250949
0    0    172    5    881250949
1    0    133    1    881250949
2    196    242    3    881250949
3    186    302    3    891717742
4    22    377    1    878887116
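In the output above, the first line of the file (0, 50, 5, 881250949) has been used as the column names, because read_csv() treats the first row as a header by default. If the file has no header row, a possible fix is sketched below. It assumes the data looks like the rows shown above, reads them from an in-memory string rather than file.tsv, and uses column labels (col_a to col_d) that are purely hypothetical.

```python
import io
import pandas as pd

# Stand-in for a header-less file.tsv, built from rows shown in the output.
raw = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n0\t133\t1\t881250949\n"

# header=None keeps the first line as data; names= supplies column labels.
# The labels below are hypothetical and chosen only for illustration.
df_no_header = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=["col_a", "col_b", "col_c", "col_d"],
)
print(df_no_header)
```

With a real file, the same header and names arguments can be passed alongside the file path.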

Method 2: Using pandas ‘read_table()’ function

In the following code snippet, we again use the pandas library to read the contents of a tab-separated file named ‘file.tsv’ into a DataFrame named ‘df’. This time we call pd.read_table(), whose default separator is the tab character, so no ‘sep’ argument is needed.

Python




import pandas as pd

# read_table uses '\t' as its default separator
df = pd.read_table('file.tsv')
df.head()


Output:

    0    50    5    881250949
0    0    172    5    881250949
1    0    133    1    881250949
2    196    242    3    881250949
3    186    302    3    891717742
4    22    377    1    878887116
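Since this output matches Method 1, the two calls appear interchangeable for tab-separated input. The following sketch checks that on a tiny in-memory sample; the sample text and the column names a and b are made up for the example.

```python
import io
import pandas as pd

# Hypothetical two-column sample with a header line.
raw = "a\tb\n1\t2\n3\t4\n"

df_table = pd.read_table(io.StringIO(raw))        # tab is the default separator
df_csv = pd.read_csv(io.StringIO(raw), sep="\t")  # tab passed explicitly

# DataFrame.equals compares both the values and the dtypes.
same = df_table.equals(df_csv)
print(same)
```

Which one to use is mostly a matter of preference: read_table() is shorter, while read_csv() with an explicit ‘sep’ states the delimiter in the code.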

Method 3: Using csv module

The code example begins by importing the csv module, which provides functionality for reading and writing delimited text files.

  • It uses the open() function to open the file specified by file_path in read-only mode ('r'), inside a with statement so that the file is closed properly after reading.
  • It creates a reader object using csv.reader(file, delimiter='\t'), specifying that the values in the file are tab-separated, and passes the rows to pd.DataFrame(). No row is treated as a header, so the columns are labeled 0 to 3.

Python




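import csv
import pandas as pd

file_path = "file.tsv"
# newline='' is the setting the csv module recommends for files it reads
with open(file_path, 'r', newline='') as file:
    # Split each line into fields on tab characters
    reader = csv.reader(file, delimiter='\t')
    # Build the DataFrame while the file is still open
    df = pd.DataFrame(reader)
df.head()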


Output:

    0    1    2    3
0    0    50    5    881250949
1    0    172    5    881250949
2    0    133    1    881250949
3    196    242    3    881250949
4    186    302    3    891717742
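One thing the printed output does not show is that csv.reader returns every field as a string, so the DataFrame holds text rather than numbers. A minimal sketch of converting the columns afterwards, using two of the rows above as an in-memory stand-in for file.tsv:

```python
import csv
import io
import pandas as pd

# Stand-in for file.tsv, built from two rows shown in the output.
raw = "0\t50\t5\t881250949\n196\t242\t3\t881250949\n"

reader = csv.reader(io.StringIO(raw), delimiter="\t")
df_str = pd.DataFrame(list(reader))   # every cell is a Python str here

# Apply pd.to_numeric column by column to get numeric values.
df_num = df_str.apply(pd.to_numeric)
print(df_num.dtypes)
```

This extra step is not needed with read_csv() or read_table(), which infer column types while parsing.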

Method 4: Using ‘numpy’ to load the data and then convert it to a DataFrame

This code uses NumPy’s ‘genfromtxt()’ function to load the tab-separated data from ‘file.tsv’ into a NumPy array. The ‘delimiter’ argument sets the tab character as the field separator, ‘dtype=None’ lets NumPy infer the type of each column from its contents, and ‘encoding=None’ uses the system default encoding. The array is then converted into a pandas DataFrame for further analysis and manipulation.

Python




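import numpy as np
import pandas as pd

# dtype=None infers each column's type; encoding=None uses the system default
data = np.genfromtxt('file.tsv', delimiter='\t', dtype=None, encoding=None)
# Wrap the NumPy array in a DataFrame
df = pd.DataFrame(data)
df.head()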


Output:

    0    1    2    3
0    0    50    5    881250949
1    0    172    5    881250949
2    0    133    1    881250949
3    196    242    3    881250949
4    186    302    3    891717742
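A plain NumPy array carries no column labels, which is why the DataFrame above has the columns 0 to 3. The sketch below shows one way to attach labels during the conversion. It assumes every column is an integer, reads from an in-memory string instead of file.tsv, and uses hypothetical label names.

```python
import io
import numpy as np
import pandas as pd

# Stand-in for file.tsv, built from two rows shown in the output.
raw = "0\t50\t5\t881250949\n196\t242\t3\t881250949\n"

# dtype=int is an assumption that holds for this all-integer sample.
data = np.genfromtxt(io.StringIO(raw), delimiter="\t", dtype=int)

# The column labels are hypothetical, chosen only for illustration.
df_np = pd.DataFrame(data, columns=["col_a", "col_b", "col_c", "col_d"])
print(df_np)
```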


