Twitter is one of the most popular social media platforms. The Twitter API provides the tools you need to contribute to, engage with, and analyze the conversation happening on Twitter, which has many applications in fields such as data analytics and artificial intelligence. This article focuses on how to extract tweets containing a particular hashtag, starting from a given date.
Requirements:
- Tweepy is a Python package for accessing the Twitter API. Almost all of the functionality provided by the Twitter API can be used through Tweepy. To install it, run the following command in the terminal:
pip install tweepy
- Pandas is a powerful library for data analysis in Python. It is built on the NumPy package, and its key data structure is the DataFrame, which lets you manipulate tabular data. To install it, run the following command in the terminal:
pip install pandas
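To make the DataFrame idea concrete, here is a minimal sketch of building a small table row by row and writing it out as CSV. The column names and sample rows are made up for illustration, and the CSV is written to an in-memory buffer rather than a file so the result can be inspected directly.

```python
import io

import pandas as pd

# Start with an empty table that only defines the column names.
db = pd.DataFrame(columns=["username", "followers", "text"])

# Append rows one at a time: len(db) is the next free row label.
db.loc[len(db)] = ["alice", 10, "Hello world"]
db.loc[len(db)] = ["bob", 25, "Pandas is neat"]

# Write the table as CSV into a string buffer (illustrative only).
buffer = io.StringIO()
db.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name instead of the buffer to `to_csv` would write the same content to disk.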
Prerequisites:
- Create a Twitter Developer account and obtain your consumer key, consumer secret, access token, and access token secret.
- Install the Tweepy and pandas modules on your system by running the pip commands listed under Requirements.
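Once you have the four credentials, one possible way to keep them out of the source file is to read them from environment variables. The sketch below is only an illustration: the variable names and the `load_credentials` helper are hypothetical and are not part of Tweepy or of the Twitter developer tooling.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values could then be passed to the authentication calls in place of hard-coded strings.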
Step-by-step Approach:
- Import required modules.
- Create a function to display the data of each tweet.
- Create another function to scrape data for a given hashtag using the tweepy module.
- In the driver code, assign your Twitter Developer account credentials along with the hashtag, initial date, and number of tweets.
- Finally, call the scraping function with the hashtag, initial date, and number of tweets as arguments.
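The hashtag and the initial date from the steps above have to end up in a single search query. A minimal sketch of one way to do that is shown here, assuming the `since:` operator of Twitter's standard search; the `build_query` helper is hypothetical and only validates and formats its inputs, it does not contact Twitter.

```python
from datetime import datetime


def build_query(hashtag, date_since):
    """Return a query for `hashtag` limited to tweets from `date_since` onward."""
    # strptime raises ValueError if the date is not a valid yyyy-mm-dd date.
    day = datetime.strptime(date_since, "%Y-%m-%d").date()
    tag = hashtag.strip()
    if not tag.startswith("#"):
        tag = "#" + tag
    # isoformat() returns the date zero-padded as yyyy-mm-dd.
    return f"{tag} since:{day.isoformat()}"


query = build_query("python", "2021-01-01")
print(query)
```

Validating the date up front means a mistyped input fails immediately instead of silently producing an empty search result.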
Below is the complete program based on the above approach:
Python
# Python script to extract tweets containing a
# particular hashtag using Tweepy and pandas

# import modules
import pandas as pd
import tweepy


# function to display data of each tweet
def printtweetdata(n, ith_tweet):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{ith_tweet[0]}")
    print(f"Description:{ith_tweet[1]}")
    print(f"Location:{ith_tweet[2]}")
    print(f"Following Count:{ith_tweet[3]}")
    print(f"Follower Count:{ith_tweet[4]}")
    print(f"Total Tweets:{ith_tweet[5]}")
    print(f"Retweet Count:{ith_tweet[6]}")
    print(f"Tweet Text:{ith_tweet[7]}")
    print(f"Hashtags Used:{ith_tweet[8]}")


# function to perform data extraction
# (uses the module-level `api` object created in the driver code)
def scrape(words, date_since, numtweet):

    # Creating DataFrame using pandas
    db = pd.DataFrame(columns=['username',
                               'description',
                               'location',
                               'following',
                               'followers',
                               'totaltweets',
                               'retweetcount',
                               'text',
                               'hashtags'])

    # We are using .Cursor() to search
    # through twitter for the required tweets.
    # The number of tweets can be
    # restricted using .items(number of tweets).
    # since_id expects a tweet ID rather than a date,
    # so the start date is passed with the "since:"
    # search operator in the query instead.
    tweets = tweepy.Cursor(api.search_tweets,
                           q=f"{words} since:{date_since}",
                           lang="en",
                           tweet_mode='extended').items(numtweet)

    # .Cursor() returns an iterable object. Each item in
    # the iterator has various attributes
    # that you can access to
    # get information about each tweet
    list_tweets = [tweet for tweet in tweets]

    # Counter to maintain Tweet Count
    i = 1

    # we will iterate over each tweet in the
    # list for extracting information about each tweet
    for tweet in list_tweets:
        username = tweet.user.screen_name
        description = tweet.user.description
        location = tweet.user.location
        following = tweet.user.friends_count
        followers = tweet.user.followers_count
        totaltweets = tweet.user.statuses_count
        retweetcount = tweet.retweet_count
        hashtags = tweet.entities['hashtags']

        # Retweets can be distinguished by
        # a retweeted_status attribute.
        # If the attribute does not exist,
        # the except block will be executed
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.full_text
        hashtext = list()
        for j in range(0, len(hashtags)):
            hashtext.append(hashtags[j]['text'])

        # Here we are appending all the
        # extracted information in the DataFrame
        ith_tweet = [username, description,
                     location, following,
                     followers, totaltweets,
                     retweetcount, text, hashtext]
        db.loc[len(db)] = ith_tweet

        # Function call to print tweet data on screen
        printtweetdata(i, ith_tweet)
        i = i + 1
    filename = 'scraped_tweets.csv'

    # we will save our database as a CSV file.
    db.to_csv(filename)


if __name__ == '__main__':

    # Enter your own credentials obtained
    # from your developer account
    consumer_key = "XXXXXXXXXXXXXXXXXXXXX"
    consumer_secret = "XXXXXXXXXXXXXXXXXXXXX"
    access_key = "XXXXXXXXXXXXXXXXXXXXX"
    access_secret = "XXXXXXXXXXXXXXXXXXXXX"
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # Enter Hashtag and initial date
    print("Enter Twitter HashTag to search for")
    words = input()
    print("Enter the date since which tweets are required, in yyyy-mm-dd format")
    date_since = input()

    # number of tweets you want to extract in one run
    numtweet = 100
    scrape(words, date_since, numtweet)
    print('Scraping has completed!')
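The retweet handling and hashtag extraction in the program can be tried without any network access. The following is a minimal sketch that uses `SimpleNamespace` objects as stand-ins for the tweet objects; the helper names `full_text_of` and `hashtag_texts` and the sample data are hypothetical, and only mimic the attributes the program reads.

```python
from types import SimpleNamespace


def full_text_of(tweet):
    """Return the untruncated text, preferring the original tweet for retweets."""
    try:
        return tweet.retweeted_status.full_text
    except AttributeError:
        # Not a retweet: the object has no retweeted_status attribute.
        return tweet.full_text


def hashtag_texts(tweet):
    """Return the hashtag strings (without '#') found in the tweet entities."""
    return [tag["text"] for tag in tweet.entities["hashtags"]]


# Stand-in objects carrying only the attributes used above.
plain = SimpleNamespace(
    full_text="Learning #python",
    entities={"hashtags": [{"text": "python"}]},
)
retweet = SimpleNamespace(
    full_text="RT @someone: Learning #python with a very long te...",
    retweeted_status=SimpleNamespace(
        full_text="Learning #python with a very long text"
    ),
    entities={"hashtags": [{"text": "python"}]},
)

print(full_text_of(plain))
print(full_text_of(retweet))
print(hashtag_texts(retweet))
```

Separating these two steps into small functions makes it possible to check the parsing logic before spending any API requests.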