Remove URLs from string in Python
Last Updated: 24 Jan, 2024
A regular expression (regex) is a sequence of characters that defines a search pattern in text. To remove URLs from a string in Python, you can either use regular expressions through the built-in re module, or check each word with the standard-library urllib.parse module. In this article, we will see how to remove URLs from a string in Python.
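Every regex-based function in this article relies on the same pattern, r'https?://\S+|www\.\S+'. Here, https?:// matches http:// or https://, www\. matches a literal "www.", and \S+ extends the match up to the next whitespace character. One quick way to see what the pattern catches is to run findall() on a sample string. The sample sentence below is made up for illustration.

```python
import re

# The pattern shared by the regex-based functions in this article
url_pattern = re.compile(r"https?://\S+|www\.\S+")

# Made-up sample: one full URL, one www. address, one bare domain
sample = "See https://docs.python.org/3/ or www.python.org but not python.org"

# Prints: ['https://docs.python.org/3/', 'www.python.org']
print(url_pattern.findall(sample))
```

A bare domain such as python.org has neither prefix, so this pattern does not treat it as a URL.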
Python Remove URLs from a String
Below are the ways by which we can remove URLs from a string in Python:
- Using the re.sub() function
- Using the re.findall() function
- Using the re.search() function
- Using the urllib.parse module
Python Remove URLs from String Using re.sub() function
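In this example, the code defines a function ‘remove_urls’ that compiles a regular expression matching anything that starts with http://, https:// or www. up to the next whitespace character. It then calls the pattern's sub() method (the compiled-pattern form of re.sub()) to replace every match with the placeholder [URL REMOVED].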
Python3
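import re

def remove_urls(text, replacement_text="[URL REMOVED]"):
    # http:// or https:// URLs, or addresses starting with www., up to the next whitespace
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    # Replace every match in a single pass
    text_without_urls = url_pattern.sub(replacement_text, text)
    return text_without_urls

input_text = "Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/"
output_text = remove_urls(input_text)
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text)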
Output
Original Text:
Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/
Text with URLs Removed:
Visit on GeeksforGeeks Website: [URL REMOVED]
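The placeholder keeps a visible marker where a URL used to be. If you want the URLs gone entirely, one minimal sketch is to pass an empty replacement string and then tidy the spaces left behind. The function name strip_urls and the sample sentence are hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def strip_urls(text):
    # Hypothetical helper: delete each URL instead of substituting a placeholder
    without_urls = URL_PATTERN.sub("", text)
    # Deleting a URL can leave doubled or trailing spaces behind;
    # collapse runs of spaces/tabs to one space and trim both ends
    return re.sub(r"[ \t]{2,}", " ", without_urls).strip()

print(strip_urls("Visit https://www.geeksforgeeks.org/ today"))  # Visit today
```

Note that the cleanup step also collapses any double spaces that were already in the text, which may or may not be what you want.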
Remove URLs from String Using re.findall() function
In this example, the Python code defines a function ‘remove_urls_findall’ that first collects every URL in the text with the pattern's findall() method and then replaces each collected URL with the replacement text “[URL REMOVED]” using str.replace().
Python3
import re

def remove_urls_findall(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    # Collect every URL in the text as a list of strings
    urls = url_pattern.findall(text)
    # Replace each collected URL with the placeholder
    for url in urls:
        text = text.replace(url, replacement_text)
    return text

input_text = "Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/"
output_text_findall = remove_urls_findall(input_text)
print("\nUsing re.findall():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_findall)
Output:
Using re.findall():
Original Text:
Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/
Text with URLs Removed:
Check out the latest Python tutorials on GeeksforGeeks: [URL REMOVED]
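One caveat with replacing findall() results through str.replace(): if a short URL is a prefix of a longer one in the same text, for example the made-up addresses https://a.com and https://a.com/b, replacing the short one first turns the longer one into “[URL REMOVED]/b”. A minimal fix, sketched below with the same pattern and the hypothetical name remove_urls_findall_safe, is to deduplicate the matches and replace the longest ones first.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def remove_urls_findall_safe(text, replacement_text="[URL REMOVED]"):
    # Deduplicate the matches, then sort them longest first so a shorter URL
    # that is a prefix of a longer one cannot split the longer one apart
    urls = sorted(set(URL_PATTERN.findall(text)), key=len, reverse=True)
    for url in urls:
        text = text.replace(url, replacement_text)
    return text

print(remove_urls_findall_safe("see https://a.com and https://a.com/b"))
# see [URL REMOVED] and [URL REMOVED]
```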
Remove URLs from String in Python Using re.search() function
In this example, the Python code defines a function ‘remove_urls_search’ that repeatedly calls the pattern's search() method to locate the first URL still in the text, splices the replacement text “[URL REMOVED]” into its place, and stops once no match is left.
Python3
import re

def remove_urls_search(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    while True:
        # Find the first URL still present in the text
        match = url_pattern.search(text)
        if not match:
            break
        # Splice the placeholder in place of the matched URL
        text = text[:match.start()] + replacement_text + text[match.end():]
    return text

input_text = "Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks"
output_text_search = remove_urls_search(input_text)
print("\nUsing re.search():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_search)
Output:
Using re.search():
Original Text:
Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks
Text with URLs Removed:
Visit our website at [URL REMOVED] for more information. Follow us on Twitter: @geeksforgeeks
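The loop in remove_urls_search restarts every search at the beginning of the string. That is harmless with “[URL REMOVED]”, but if the replacement text itself matched the pattern (for example a made-up placeholder such as http://x), the loop would never end. A compiled pattern's search() accepts an optional start position, so one minimal variant, sketched below with the hypothetical name remove_urls_search_pos, resumes each search just after the text it has already inserted.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def remove_urls_search_pos(text, replacement_text="[URL REMOVED]"):
    pos = 0
    while True:
        # Search only from pos onward, never inside text we already inserted
        match = URL_PATTERN.search(text, pos)
        if not match:
            break
        text = text[:match.start()] + replacement_text + text[match.end():]
        # Move past the placeholder that was just spliced in
        pos = match.start() + len(replacement_text)
    return text

print(remove_urls_search_pos("a https://b.com c", "http://x"))  # a http://x c
```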
Remove URLs from String Using urllib.parse
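In this example, the Python code defines a function ‘remove_urls_urllib’ that splits the text on whitespace, parses each word with urllib.parse.urlparse(), and replaces any word that has both a scheme (such as https) and a network location (the domain part) with the replacement text “[URL REMOVED]”.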
Python3
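from urllib.parse import urlparse

def remove_urls_urllib(text, replacement_text="[URL REMOVED]"):
    words = text.split()
    for i, word in enumerate(words):
        parsed_url = urlparse(word)
        # Treat a word as a URL only if it has both a scheme and a network location
        if parsed_url.scheme and parsed_url.netloc:
            words[i] = replacement_text
    return ' '.join(words)

# Sample input (assumed: the original input is not shown; any http(s) URL here gives the output below)
input_text = "Check out the GeeksforGeeks website at https://www.geeksforgeeks.org/ for programming tutorials."
output_text_urllib = remove_urls_urllib(input_text)
print("Using urllib.parse:")
print("Text with URLs Removed:")
print(output_text_urllib)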
Output
Using urllib.parse:
Text with URLs Removed:
Check out the GeeksforGeeks website at [URL REMOVED] for programming tutorials.
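Because remove_urls_urllib splits on whitespace and rejoins with single spaces, newlines and repeated spaces in the input are flattened, and punctuation attached to a URL (as in “https://example.com/docs.”) is replaced along with it. The sketch below is one way to keep both. It uses re.split() with a capturing group so the whitespace runs survive, and it peels an assumed set of trailing punctuation characters off each word before calling urlparse(). The function name and the punctuation set are illustrative choices, not part of urllib.parse.

```python
import re
from urllib.parse import urlparse

# Assumed set of punctuation that commonly trails a URL in prose
TRAILING_PUNCT = ".,;:!?"

def remove_urls_urllib_keep_spacing(text, replacement_text="[URL REMOVED]"):
    # A capturing group makes re.split() keep the whitespace runs in the list,
    # so joining the pieces back restores the original spacing
    pieces = re.split(r"(\s+)", text)
    for i, piece in enumerate(pieces):
        # Separate trailing punctuation so it can be kept after the placeholder
        core = piece.rstrip(TRAILING_PUNCT)
        trailing = piece[len(core):]
        parsed = urlparse(core)
        if parsed.scheme and parsed.netloc:
            pieces[i] = replacement_text + trailing
    return "".join(pieces)

print(remove_urls_urllib_keep_spacing("See https://example.com/docs.  Thanks"))
# See [URL REMOVED].  Thanks
```

Like the original, this only recognizes words that carry a scheme, so a bare www.example.com is left untouched: urlparse() reports an empty scheme and netloc for it.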