Extracting Tweets containing a particular Hashtag using Python

Twitter is one of the most popular social media platforms. The Twitter API provides the tools you need to contribute to, engage with, and analyze the conversation happening on Twitter, which makes it widely used in fields like data analytics and artificial intelligence. This article shows how to extract tweets containing a particular hashtag, starting from a given date.

Requirements:

Tweepy is a Python package that provides convenient access to the Twitter API. Almost all the functionality provided by the Twitter API can be used through Tweepy. To install it, type the command below in the terminal.

pip install Tweepy

Pandas is a powerful library for data analysis in Python. It is built on the NumPy package, and its key data structure is the DataFrame, which lets you manipulate tabular data. To install it, type the command below in the terminal.

pip install pandas

Prerequisites:

Create a Twitter Developer account and obtain your consumer key, consumer secret, access token and access token secret.
Install the Tweepy and Pandas modules on your system by running the pip commands above in the terminal.
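
The credentials do not have to be typed into the script itself. As a minimal sketch, assuming they are stored in environment variables, they could be read like this. The variable names and the `load_credentials` helper are hypothetical and not part of Tweepy.

```python
import os

# Hypothetical variable names; use whatever names your own setup defines.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to Tweepy in place of the hardcoded placeholder strings, which keeps secrets out of the source file.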

Step-by-step Approach:

Import the required modules.
Define a function to display the data of each tweet.
Define another function to scrape data for a given hashtag using the tweepy module.
In the driver code, assign the Twitter Developer account credentials along with the hashtag, initial date and number of tweets.
Finally, call the scraping function with the hashtag, initial date and number of tweets as arguments.
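
One way to combine the hashtag and the initial date is to put both into the search query text. The sketch below assumes the standard search operator `since:yyyy-mm-dd` is available for the query. The `build_query` helper is a hypothetical name introduced only for illustration.

```python
from datetime import datetime


def build_query(hashtag, date_since):
    """Combine a hashtag and a start date into one search query string."""
    # strptime raises ValueError if the text cannot be read as year-month-day.
    start = datetime.strptime(date_since, "%Y-%m-%d").date()
    tag = hashtag.strip()
    if not tag.startswith("#"):
        tag = "#" + tag
    # isoformat() writes the date back zero-padded, e.g. 2021-01-05.
    return f"{tag} since:{start.isoformat()}"
```

Validating the date before any request is sent means a mistyped date fails immediately with a clear error instead of producing an empty or misleading result set.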

Below is the complete program based on the above approach:

Python

# Python script to extract tweets containing a
# particular hashtag using Tweepy and Pandas

# import modules
import pandas as pd
import tweepy
 
# function to display data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{ith_tweet[0]}")
    print(f"Description:{ith_tweet[1]}")
    print(f"Location:{ith_tweet[2]}")
    print(f"Following Count:{ith_tweet[3]}")
    print(f"Follower Count:{ith_tweet[4]}")
    print(f"Total Tweets:{ith_tweet[5]}")
    print(f"Retweet Count:{ith_tweet[6]}")
    print(f"Tweet Text:{ith_tweet[7]}")
    print(f"Hashtags Used:{ith_tweet[8]}")
 
# function to perform data extraction
def scrape(words, date_since, numtweet):

    # Creating DataFrame using pandas
    db = pd.DataFrame(columns=['username',
                               'description',
                               'location',
                               'following',
                               'followers',
                               'totaltweets',
                               'retweetcount',
                               'text',
                               'hashtags'])

    # We are using .Cursor() to search
    # through Twitter for the required tweets.
    # The number of tweets can be
    # restricted using .items(number of tweets).
    # since_id expects a tweet ID, not a date, so the
    # start date is passed inside the query text instead
    # (assumption: the standard search operator
    # since:yyyy-mm-dd is supported for this query).
    # `api` is the global tweepy.API object created in
    # the driver code; search_tweets is the Tweepy v4 name
    # of this method.
    tweets = tweepy.Cursor(api.search_tweets,
                           q=f"{words} since:{date_since}",
                           lang="en",
                           tweet_mode='extended').items(numtweet)

    # .Cursor() returns an iterable object. Each item in
    # the iterator has various attributes
    # that you can access to
    # get information about each tweet
    list_tweets = list(tweets)

    # we will iterate over each tweet in the
    # list for extracting information about each tweet;
    # enumerate() maintains the tweet count
    for i, tweet in enumerate(list_tweets, start=1):
        username = tweet.user.screen_name
        description = tweet.user.description
        location = tweet.user.location
        following = tweet.user.friends_count
        followers = tweet.user.followers_count
        totaltweets = tweet.user.statuses_count
        retweetcount = tweet.retweet_count
        hashtags = tweet.entities['hashtags']

        # Retweets can be distinguished by
        # a retweeted_status attribute;
        # if the attribute is missing, the tweet is not
        # a retweet and the except block will be executed
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.full_text

        # keep only the text of each hashtag entity
        hashtext = [tag['text'] for tag in hashtags]

        # Here we are appending all the
        # extracted information in the DataFrame
        ith_tweet = [username, description,
                     location, following,
                     followers, totaltweets,
                     retweetcount, text, hashtext]
        db.loc[len(db)] = ith_tweet

        # Function call to print tweet data on screen
        printtweetdata(i, ith_tweet)

    filename = 'scraped_tweets.csv'

    # we will save our database as a CSV file.
    db.to_csv(filename)
 
if __name__ == '__main__':

    # Enter your own credentials obtained
    # from your developer account
    consumer_key = "XXXXXXXXXXXXXXXXXXXXX"
    consumer_secret = "XXXXXXXXXXXXXXXXXXXXX"
    access_key = "XXXXXXXXXXXXXXXXXXXXX"
    access_secret = "XXXXXXXXXXXXXXXXXXXXX"

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # Enter hashtag and initial date
    print("Enter the Twitter hashtag to search for")
    words = input()
    print("Enter the date since which tweets are required, in yyyy-mm-dd format")
    date_since = input()

    # number of tweets you want to extract in one run
    numtweet = 100
    scrape(words, date_since, numtweet)
    print('Scraping has completed!')
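
The per-tweet extraction step can also be tried without network access. The following is a minimal sketch that uses a hand-made stand-in object instead of a real Tweepy status. The `tweet_to_row` helper and every value in `fake_tweet` are invented for illustration; only the attribute names mirror the ones used in the program above.

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Flatten one tweet-like object into the nine column values."""
    # Retweets carry the original text on retweeted_status.
    original = getattr(tweet, "retweeted_status", None)
    text = original.full_text if original is not None else tweet.full_text
    # Keep only the text of each hashtag entity.
    hashtags = [tag["text"] for tag in tweet.entities["hashtags"]]
    user = tweet.user
    return [user.screen_name, user.description, user.location,
            user.friends_count, user.followers_count, user.statuses_count,
            tweet.retweet_count, text, hashtags]


# Hand-made stand-in for a Tweepy status object (all values invented).
fake_tweet = SimpleNamespace(
    user=SimpleNamespace(screen_name="example_user", description="demo",
                         location="Nowhere", friends_count=10,
                         followers_count=20, statuses_count=30),
    retweet_count=2,
    full_text="Learning #python",
    entities={"hashtags": [{"text": "python"}]},
)
row = tweet_to_row(fake_tweet)
print(row)
```

Separating the extraction into a function like this makes it possible to check the column order and the retweet handling before spending any API requests.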
