Remove URLs from string in Python

Last Updated : 24 Jan, 2024

A regular expression (regex) is a sequence of characters that defines a search pattern in text. To remove URLs from a string in Python, you can match them with regular expressions through the built-in re module, or check each word with urllib.parse, which is also part of the standard library rather than an external package. In this article, we will see how we can remove URLs from a string in Python.

Python Remove URLs from a String

Below are the ways by which we can remove URLs from a string in Python:

  • Using the re.sub() function
  • Using the re.findall() function
  • Using the re.search() function
  • Using the urllib.parse module

Remove URLs from String Using re.sub() function

In this example, the function ‘remove_urls’ compiles a pattern with two alternatives: https?://\S+ matches http:// or https:// followed by any run of non-whitespace characters, and www\.\S+ matches addresses that start with www. and have no scheme. The re.sub() method then replaces every match with the placeholder [URL REMOVED].

Python3

import re
def remove_urls(text, replacement_text="[URL REMOVED]"):
    # Define a regex pattern to match URLs
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
 
    # Use the sub() method to replace URLs with the specified replacement text
    text_without_urls = url_pattern.sub(replacement_text, text)
 
    return text_without_urls
 
# Example:
input_text = "Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/"
output_text = remove_urls(input_text)
 
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text)


Output

Original Text:
Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/

Text with URLs Removed:
Visit on GeeksforGeeks Website: [URL REMOVED]
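
Because \S+ matches every non-whitespace character, the pattern also consumes punctuation that directly follows a URL, such as a comma or a full stop at the end of a sentence. The sketch below is one possible refinement and is not part of the original example: the function name remove_urls_keep_punctuation and the TRAILING_PUNCTUATION set are illustrative assumptions. It passes a function to re.sub() so that punctuation stripped from the end of each match is put back after the placeholder.

```python
import re

# Same pattern as in the re.sub() example; \S+ is greedy and will also
# swallow punctuation that directly follows a URL.
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

# Characters treated as sentence punctuation rather than part of the URL.
# This set is an assumption; adjust it for your own data.
TRAILING_PUNCTUATION = '.,;:!?'


def remove_urls_keep_punctuation(text, replacement_text="[URL REMOVED]"):
    def replace(match):
        url = match.group(0)
        stripped = url.rstrip(TRAILING_PUNCTUATION)
        # Re-attach whatever punctuation was stripped from the end of the match
        return replacement_text + url[len(stripped):]

    return URL_PATTERN.sub(replace, text)


sample = "Docs are at https://www.geeksforgeeks.org/, see also www.python.org."
print(remove_urls_keep_punctuation(sample))
```

For the sample sentence this prints "Docs are at [URL REMOVED], see also [URL REMOVED]." with the comma and the final full stop left in place.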

Remove URLs from String Using re.findall() function

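In this example, the function ‘remove_urls_findall’ uses re.findall() to collect every URL in the text as a list of strings, then replaces each one with the replacement text “[URL REMOVED]” using str.replace(). Because str.replace() works on plain substrings, a short URL that is a prefix of a longer one (for example http://example.com and http://example.com/docs) leaves part of the longer URL behind, so this approach suits text where URLs do not overlap in that way.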

Python3

import re
def remove_urls_findall(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    urls = url_pattern.findall(text)
 
    for url in urls:
        text = text.replace(url, replacement_text)
 
    return text
 
# Example:
input_text = "Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/"
output_text_findall = remove_urls_findall(input_text)
 
print("\nUsing re.findall():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_findall)


Output:

Using re.findall():
Original Text:
Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/

Text with URLs Removed:
Check out the latest Python tutorials on GeeksforGeeks: [URL REMOVED]
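
When one URL in the text is a prefix of another, replacing the matches in the order re.findall() returns them can leave a fragment such as "[URL REMOVED]/docs". A minimal sketch of one workaround follows, assuming it is acceptable to deduplicate the matches and replace the longest first; the function name remove_urls_findall_safe is hypothetical and not part of the original example.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')


def remove_urls_findall_safe(text, replacement_text="[URL REMOVED]"):
    # Deduplicate, then replace the longest URLs first so that a short URL
    # that is a prefix of a longer one cannot clobber it.
    urls = sorted(set(URL_PATTERN.findall(text)), key=len, reverse=True)
    for url in urls:
        text = text.replace(url, replacement_text)
    return text


sample = "See http://example.com and http://example.com/docs"
print(remove_urls_findall_safe(sample))
```

For the sample text this prints "See [URL REMOVED] and [URL REMOVED]".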

Remove URLs from String Using re.search() function

In this example, the function ‘remove_urls_search’ calls re.search() in a loop: each call finds the next URL, which is spliced out of the string and replaced with “[URL REMOVED]”, until no match remains.

Python3

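import re
def remove_urls_search(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    pos = 0  # position from which the next search starts
 
    while True:
        match = url_pattern.search(text, pos)
        if not match:
            break
        text = text[:match.start()] + replacement_text + text[match.end():]
        # Resume after the inserted text so that a replacement which itself
        # looks like a URL is never matched again (avoids an endless loop)
        pos = match.start() + len(replacement_text)
 
    return text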
 
# Example:
input_text = "Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks"
output_text_search = remove_urls_search(input_text)
 
print("\nUsing re.search():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_search)


Output:

Using re.search():
Original Text:
Visit our website at https://geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks

Text with URLs Removed:
Visit our website at [URL REMOVED] for more information. Follow us on Twitter: @geeksforgeeks
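
The same find-and-splice idea can also be written as a single pass with finditer(), which walks the original string once and never rescans the replacement text. This is a sketch of that variant rather than the article's method; the function name remove_urls_finditer is an assumption made for illustration.

```python
import re

URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')


def remove_urls_finditer(text, replacement_text="[URL REMOVED]"):
    pieces = []
    last_end = 0
    for match in URL_PATTERN.finditer(text):
        # Keep the text between the previous match and this one
        pieces.append(text[last_end:match.start()])
        pieces.append(replacement_text)
        last_end = match.end()
    # Keep whatever follows the final match
    pieces.append(text[last_end:])
    return ''.join(pieces)


sample = "Visit https://geeksforgeeks.org/ or www.python.org today"
print(remove_urls_finditer(sample))
```

For the sample text this prints "Visit [URL REMOVED] or [URL REMOVED] today". Collecting the pieces in a list and joining them once also avoids rebuilding the whole string for every match.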

Remove URLs from String Using urllib.parse

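In this example, the function ‘remove_urls_urllib’ splits the text on whitespace, parses each word with urlparse() from urllib.parse, and replaces every word that has both a scheme (such as https) and a network location with the replacement text “[URL REMOVED]”. Two consequences are worth knowing: addresses without a scheme, such as www.example.com, are not detected, and rejoining the words with single spaces collapses any tabs, newlines or repeated spaces in the original text.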

Python3

# Using urllib.parse
from urllib.parse import urlparse
 
def remove_urls_urllib(text, replacement_text="[URL REMOVED]"):
    words = text.split()
    for i, word in enumerate(words):
        parsed_url = urlparse(word)
        if parsed_url.scheme and parsed_url.netloc:
            words[i] = replacement_text
    return ' '.join(words)
 
# Example:
input_text = "Check out the GeeksforGeeks website at https://www.geeksforgeeks.org/ for programming tutorials."
output_text_urllib = remove_urls_urllib(input_text)
 
print("Using urllib.parse:")
print("Text with URLs Removed:")
print(output_text_urllib)


Output

Using urllib.parse:
Text with URLs Removed:
Check out the GeeksforGeeks website at [URL REMOVED] for programming tutorials.
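
Splitting on whitespace and rejoining with single spaces changes the layout of the text, and punctuation attached to a URL is replaced along with it. The following is a minimal sketch of a variant that keeps the original whitespace and trailing punctuation. The function name remove_urls_urllib_preserving and the set of punctuation characters are assumptions, not part of the original example.

```python
import re
from urllib.parse import urlparse


def remove_urls_urllib_preserving(text, replacement_text="[URL REMOVED]"):
    # Splitting on a capturing group keeps the whitespace runs in the result,
    # so tabs, newlines and repeated spaces survive the round trip.
    tokens = re.split(r'(\s+)', text)
    for i, token in enumerate(tokens):
        # Set aside sentence punctuation stuck to the end of the token
        core = token.rstrip('.,;:!?')
        try:
            parsed = urlparse(core)
        except ValueError:
            # urlparse() rejects some malformed inputs, e.g. an unclosed
            # IPv6 bracket; leave such tokens untouched
            continue
        if parsed.scheme and parsed.netloc:
            tokens[i] = replacement_text + token[len(core):]
    return ''.join(tokens)


sample = "Docs:\thttps://www.geeksforgeeks.org/.\nNext  line"
print(repr(remove_urls_urllib_preserving(sample)))
```

Like the example above, this sketch only recognises URLs that carry a scheme; a bare www. address still passes through unchanged.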

