
How to convert a tab-separated file into a DataFrame using Python

In this article, we will learn how to convert a TSV file into a data frame using Python and the Pandas library.

A TSV (Tab-Separated Values) file is a plain text file where data is organized in rows and columns, with each column separated by a tab character.
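To make the format concrete, here is a minimal sketch that writes a small TSV file and prints its first line, so the tab characters are visible. The two rows are copied from the outputs shown later in this article; the temporary directory and the variable names are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Two sample rows taken from the outputs shown in this article
rows = [[0, 50, 5, 881250949], [0, 172, 5, 881250949]]

# Write to a temporary location (an assumption; any writable path works)
path = os.path.join(tempfile.mkdtemp(), "file.tsv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Read the first line back; repr() shows each tab as \t
with open(path) as f:
    first_line = f.readline()
print(repr(first_line))
```

Each `\t` in the printed line marks a column boundary, and each line of the file is one row.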



Methods to Convert Tab-Separated File into a Data Frame

Method 1: Using pandas ‘read_csv()’ with ‘sep’ parameter

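In this method, we use the pandas read_csv() function with sep='\t' to read a tab-separated file (file.tsv) into a DataFrame. Note that read_csv() treats the first line of the file as the header by default.

Look at the following code snippet.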






import pandas as pd

file_path = "file.tsv"
# sep='\t' tells read_csv() to split columns on tab characters
df = pd.read_csv(file_path, sep='\t')
df.head()

Output:

    0    50    5    881250949
0    0    172    5    881250949
1    0    133    1    881250949
2    196    242    3    881250949
3    186    302    3    891717742
4    22    377    1    878887116
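In the output above, the first row of the file (0, 50, 5, 881250949) has become the column labels, which suggests the file has no header row. A minimal sketch of one way to handle that, assuming the file really is headerless: pass header=None and supply names. The in-memory text stands in for file.tsv, and the column names are hypothetical labels chosen for illustration.

```python
import io
import pandas as pd

# Stand-in for a headerless file.tsv; rows mirror the output shown above
tsv_text = "0\t50\t5\t881250949\n0\t172\t5\t881250949\n0\t133\t1\t881250949\n"

# header=None tells pandas that the first line is data, not column names.
# The names below are hypothetical; use whatever fits your data.
df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    header=None,
    names=["user_id", "item_id", "rating", "timestamp"],
)
print(df.shape)  # all three lines are kept as data rows
```

With header=None, no data row is lost to the header, and the DataFrame has meaningful column labels instead of values from the first record.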

Method 2: Using pandas ‘read_table()’ function

In the following code snippet, we again use the pandas library to read the contents of a tab-separated file named ‘file.tsv’ into a DataFrame named ‘df’. The pd.read_table() function does not infer the separator; it works here because its default value for sep is the tab character ('\t'), so no separator needs to be passed.




import pandas as pd
df = pd.read_table('file.tsv')
df.head()

Output:

    0    50    5    881250949
0    0    172    5    881250949
1    0    133    1    881250949
2    196    242    3    881250949
3    186    302    3    891717742
4    22    377    1    878887116
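As a quick check of how the two functions relate, the sketch below reads the same tab-separated text with read_table() and with read_csv(sep='\t') and compares the results. The sample text and column names are made up for illustration.

```python
import io
import pandas as pd

# Small made-up TSV with a header row
tsv_text = "a\tb\n1\t2\n3\t4\n"

# read_table() uses sep='\t' by default
df_table = pd.read_table(io.StringIO(tsv_text))
# read_csv() needs the separator spelled out
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both calls should produce the same DataFrame
same = df_table.equals(df_csv)
print(same)
```

Which one to use is largely a matter of preference; read_table() is simply shorter for tab-separated input.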

Method 3: Using csv module

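This example imports the csv module, which provides functionality for reading and writing delimited files. csv.reader() with delimiter='\t' splits each line on tabs, and the resulting rows are passed to pd.DataFrame(). Because no header row is read, the columns are labeled 0, 1, 2, 3, and every value is stored as a string.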




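import csv
import pandas as pd  # needed for pd.DataFrame below

file_path = "file.tsv"
# newline='' is the setting the csv module recommends when opening files
with open(file_path, 'r', newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    # Each row is a list of strings; pandas assigns integer column labels
    df = pd.DataFrame(reader)
df.head()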

Output:

    0    1    2    3
0    0    50    5    881250949
1    0    172    5    881250949
2    0    133    1    881250949
3    196    242    3    881250949
4    186    302    3    891717742
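The csv module returns every field as a string, so the DataFrame built this way holds text rather than numbers even though the output looks numeric. A minimal sketch of one way to convert the columns, assuming every column holds numeric data; the in-memory text stands in for file.tsv.

```python
import csv
import io
import pandas as pd

# Stand-in for file.tsv; rows mirror the output shown above
tsv_text = "0\t50\t5\t881250949\n196\t242\t3\t881250949\n"

reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
df = pd.DataFrame(list(reader))  # every cell is a string at this point

# Convert each column to a numeric dtype (assumes all values are numeric)
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

If a column contains non-numeric text, pd.to_numeric() raises an error by default, so convert only the columns that are expected to be numeric in that case.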

Method 4: Use ‘numpy’ to load the data and then convert to a DataFrame

This code uses NumPy’s ‘genfromtxt()’ function to load tab-separated data from ‘file.tsv’ into a NumPy array. The delimiter='\t' argument sets the tab separator, and dtype=None lets NumPy infer the data type of each column from its contents. The array is then passed to pd.DataFrame() to obtain a pandas DataFrame.




import numpy as np
import pandas as pd
data = np.genfromtxt('file.tsv', delimiter='\t', dtype=None, encoding=None)
df = pd.DataFrame(data)
df.head()

Output:

     0    1  2          3
0    0   50  5  881250949
1    0  172  5  881250949
2    0  133  1  881250949
3  196  242  3  881250949
4  186  302  3  891717742
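The example above works with a purely numeric file. When columns have different types, genfromtxt() with dtype=None returns a structured array instead of a plain 2-D array, and pandas turns its fields into columns. The sketch below illustrates this with made-up data; encoding="utf-8" is an assumption about the input text.

```python
import io
import numpy as np
import pandas as pd

# Made-up TSV with a text column, an integer column, and a float column
tsv_text = "alice\t3\t1.5\nbob\t4\t2.5\n"

# dtype=None infers a type per column; mixed types give a structured array
data = np.genfromtxt(
    io.StringIO(tsv_text), delimiter="\t", dtype=None, encoding="utf-8"
)

# Field names default to f0, f1, f2 and become the DataFrame columns
df = pd.DataFrame(data)
print(df.columns.tolist())
```

For files with mixed column types, read_csv() or read_table() is usually the more direct route, since pandas infers a dtype per column on its own.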

