
Convert pair to value using map() in Pyspark

Last Updated : 05 Feb, 2023

In this article, we are going to learn how to use map() to extract only the keys or only the values from (key, value) pairs using PySpark in Python.

PySpark is the Python API for Apache Spark. It lets you interact with a Spark cluster from Python and provides a simple, easy-to-use interface for distributed data processing, machine learning, and graph processing.

map() function

The map() function is one of the core operations in PySpark. It is a transformation that applies a function to each element of an RDD (Resilient Distributed Dataset) and returns a new RDD containing the results. The function passed as an argument to map() takes a single argument, an element of the RDD, and returns a new element.
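For instance, here is a minimal sketch (assuming a running SparkContext named sc, as created in the examples below) that uses map() to square every element of an RDD:

Python3

# Square each element of an RDD with map()
nums = sc.parallelize([1, 2, 3, 4])
squares = nums.map(lambda x: x * x)

print(squares.collect())

Output:

[1, 4, 9, 16]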

Example 1

In this example, we convert key-value pairs into values only. First, we import the required module and create a pair RDD kv_rdd with the elements (1, 'a'), (2, 'b'), and (3, 'c'). We then apply map() to kv_rdd: the lambda function passed to map() takes a single argument x, a key-value pair, and returns only the value x[1]. This creates a new RDD containing only the values of the original RDD. Finally, the collect() method retrieves the values of the new RDD, and we store them in the variable values.

Python3
# Import required module
from pyspark import SparkContext

sc = SparkContext()

# Create a key-value pair RDD
kv_rdd = sc.parallelize([(1, 'a'),
                         (2, 'b'),
                         (3, 'c')])

# Use map() to convert the RDD to an
# RDD containing only the values
value_rdd = kv_rdd.map(lambda x: x[1])

# Collect the values and print them
values = value_rdd.collect()
print(values)


Output:

['a', 'b', 'c']
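As a side note, pair RDDs also provide a built-in values() transformation, so the same result can be obtained without writing the lambda yourself; a sketch using the kv_rdd defined above:

Python3

# Built-in alternative: values() returns only
# the value of each pair
value_rdd = kv_rdd.values()

print(value_rdd.collect())

Output:

['a', 'b', 'c']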

Example 2

In this example, we convert key-value pairs to keys only. The procedure is the same as in the first example, except that the lambda expression passed to map() returns the key x[0] of each pair instead of the value.

Python3
# Import required module
from pyspark import SparkContext

sc = SparkContext()

# Create a key-value pair RDD
kv_rdd = sc.parallelize([(1, 'a'),
                         (2, 'b'),
                         (3, 'c')])

# Use map() to convert the RDD to an
# RDD containing only the keys
keys_rdd = kv_rdd.map(lambda x: x[0])

# Collect the keys and print them
keys = keys_rdd.collect()
print(keys)


Output:

[1, 2, 3]
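Similarly, pair RDDs offer a built-in keys() transformation that is equivalent to the map() call above; a sketch using the kv_rdd defined above:

Python3

# Built-in alternative: keys() returns only
# the key of each pair
keys_rdd = kv_rdd.keys()

print(keys_rdd.collect())

Output:

[1, 2, 3]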

Example 3

In this example, we flatten the key-value pairs into a single list of alternating keys and values. The lambda function passed to map() takes a single argument x, a key-value pair, and returns it unchanged. After calling collect(), a list comprehension flattens the resulting pairs into one list containing both the keys and the values.

Python3
# Import required module
from pyspark import SparkContext

sc = SparkContext()

# Create a key-value pair RDD
kv_rdd = sc.parallelize([(1, 'a'),
                         (2, 'b'),
                         (3, 'c')])

# Use map() to return each
# key-value pair unchanged
value_rdd = kv_rdd.map(lambda x: x)

# Collect the pairs and flatten them
# into a single list of keys and values
values = [item for t in value_rdd.collect() for item in t]

print(values)


Output:

[1, 'a', 2, 'b', 3, 'c']
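The same flattening can also be done in a single step with flatMap(), which applies the function to each element and then flattens each returned iterable; a sketch using the kv_rdd defined above:

Python3

# flatMap() flattens each returned pair, so no
# list comprehension is needed after collect()
flat_rdd = kv_rdd.flatMap(lambda x: x)

print(flat_rdd.collect())

Output:

[1, 'a', 2, 'b', 3, 'c']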

