Python – Extract hashtags from text
A hashtag is a keyword or phrase preceded by the hash symbol (#), written within a post or comment to highlight it and facilitate a search for it. Some examples are: #like, #gfg, #selfie
We are given a string containing hashtags; the task is to extract these hashtags into a list and print them.
Examples:
Input : GeeksforGeeks is a wonderful #website for #ComputerScience
Output : website, ComputerScience
Input : This day is beautiful! #instagood #photooftheday #cute
Output : instagood, photooftheday, cute
Method 1:
- Split the text into words using the split() method.
- For every word, check whether the first character is a hash symbol (#).
- If it is, add the word to the list of hashtags without the hash symbol.
- Print the list of hashtags.
Python3
def extract_hashtags(text):
    # Collect the hashtags found in the text
    hashtag_list = []

    # Split the text into words and keep those that start with '#'
    for word in text.split():
        if word[0] == '#':
            # Drop the leading '#' before storing the word
            hashtag_list.append(word[1:])

    # Print the hashtags, one per line
    print("The hashtags in \"" + text + "\" are :")
    for hashtag in hashtag_list:
        print(hashtag)


if __name__ == "__main__":
    text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    text2 = "This day is beautiful ! #instagood #photooftheday #cute"
    extract_hashtags(text1)
    extract_hashtags(text2)
Output
The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful ! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
Time complexity: O(n), where n is the length of the text, since split() scans every character once.
Auxiliary space: O(n), since split() builds a list of all the words in the text before the hashtags are filtered out.
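The problem statement asks for the hashtags as a list, while the function above only prints them. A minimal sketch of a variant that returns the list instead, assuming the same whitespace-splitting approach as Method 1 (the name get_hashtags is hypothetical and not part of the original code):

```python
def get_hashtags(text):
    # Hypothetical helper for illustration: keep every whitespace-separated
    # word that starts with '#', dropping the leading '#'.
    return [word[1:] for word in text.split() if word.startswith("#")]


if __name__ == "__main__":
    text = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    print(get_hashtags(text))
```

Returning the list leaves it to the caller to decide whether to print, count, or store the hashtags.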
Method 2 : Using regular expressions.
Python3
import re


def extract_hashtags(text):
    # '#' followed by one or more word characters; the group captures
    # the hashtag without the '#'. A raw string keeps \w from being
    # treated as a string escape sequence.
    regex = r"#(\w+)"
    hashtag_list = re.findall(regex, text)

    # Print the hashtags, one per line
    print("The hashtags in \"" + text + "\" are :")
    for hashtag in hashtag_list:
        print(hashtag)


if __name__ == "__main__":
    text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
    text2 = "This day is beautiful ! #instagood #photooftheday #cute"
    extract_hashtags(text1)
    extract_hashtags(text2)
Output
The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful ! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
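The two methods give the same result on the sample inputs, but they are not equivalent in general. The sketch below compares them on an input with trailing punctuation and a '#' in the middle of a word; the function names and the sample sentence are made up for illustration:

```python
import re


def hashtags_by_split(text):
    # Method 1 style: whole whitespace-separated words that begin with '#'.
    return [word[1:] for word in text.split() if word.startswith("#")]


def hashtags_by_regex(text):
    # Method 2 style: '#' followed by word characters, anywhere in the text.
    return re.findall(r"#(\w+)", text)


if __name__ == "__main__":
    sample = "Loving this #sunset! Written in C#sharp"
    print(hashtags_by_split(sample))  # ['sunset!']
    print(hashtags_by_regex(sample))  # ['sunset', 'sharp']
```

The split-based version keeps the trailing "!" because it takes the whole word, while the regular expression stops at the first non-word character but also matches a '#' that appears inside a word. Which behaviour is preferable depends on the input being processed.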
Method 3 : Using startswith() and replace()
Python3
text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
textList = text1.split()

for i in textList:
    # Keep only the words that begin with '#'
    if i.startswith("#"):
        # Remove the '#' before printing
        x = i.replace("#", '')
        print(x)
Output
website
ComputerScience
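Note that replace() removes every '#' in the word, whereas the slicing used in Method 1 removes only the first character. A small sketch of the difference, using hypothetical helper names and a made-up word that contains two hash symbols:

```python
def strip_with_replace(word):
    # Removes every '#' in the word, as in Method 3.
    return word.replace("#", "")


def strip_with_slice(word):
    # Removes only the first character, as in Method 1.
    return word[1:]


if __name__ == "__main__":
    word = "#c#sharp"
    print(strip_with_replace(word))  # csharp
    print(strip_with_slice(word))    # c#sharp
```

For words with a single leading '#', such as those in the examples, both give the same result.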
Method 4 : Using indexing and replace()
Python3
text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
textList = text1.split()

for i in textList:
    # Check the first character of each word
    if i[0] == "#":
        # Remove the '#' before printing
        x = i.replace("#", '')
        print(x)
Output
website
ComputerScience
Method 5 : Using find() and replace() methods
Python3
text1 = "GeeksforGeeks is a wonderful #website for #ComputerScience"
textList = text1.split()

for i in textList:
    # find() returns 0 when '#' is the first character of the word
    if i.find("#") == 0:
        # Remove the '#' before printing
        x = i.replace("#", '')
        print(x)
Output
website
ComputerScience
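Methods 3 to 5 differ only in how they test for a leading '#'. The sketch below runs the three checks side by side; the helper name starts_with_hash is hypothetical. The checks agree on any non-empty word, which is all that split() ever produces, but indexing raises IndexError on an empty string while startswith() and find() do not:

```python
def starts_with_hash(word):
    # Each entry mirrors one of the prefix checks used in Methods 3 to 5.
    checks = {
        "startswith": word.startswith("#"),  # Method 3
        "find": word.find("#") == 0,         # Method 5
    }
    try:
        checks["index"] = word[0] == "#"     # Method 4
    except IndexError:
        # Indexing an empty string fails; record that as None.
        checks["index"] = None
    return checks


if __name__ == "__main__":
    for word in ["#gfg", "gfg#", ""]:
        print(repr(word), starts_with_hash(word))
```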
Last Updated : 14 Feb, 2023