Extract URLs present in a given string
Last Updated: 07 Nov, 2023
Given a string S, the task is to find and extract all the URLs from the string. If no URL is present in the string, then print “-1”.
Examples:
Input: S = “Welcome to https://www.geeksforgeeks.org Computer Science Portal”
Output: https://www.geeksforgeeks.org
Explanation:
The given string contains the URL ‘https://www.geeksforgeeks.org’.
Input: S = “Welcome to https://write.geeksforgeeks.org portal of https://www.geeksforgeeks.org Computer Science Portal”
Output:
https://write.geeksforgeeks.org
https://www.geeksforgeeks.org
Explanation:
The given string contains two URLs ‘https://write.geeksforgeeks.org’ and ‘https://www.geeksforgeeks.org’.
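Approach: The idea is to use a regular expression to solve this problem. The pattern below matches a scheme (http, https, ftp or file), then “://”, then a run of characters allowed in a URL. The final character class leaves out punctuation such as ‘.’ and ‘,’ so that a sentence-ending mark is not captured as part of the URL:
regex = “\\b((?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|])”
Follow the steps below to solve the given problem: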
- Create a list to hold the URLs (an ArrayList in Java) and compile the regular expression, for example with Pattern.compile() in Java.
- Match the given string against the regular expression. In Java, this can be done by using Pattern.matcher().
- For every match, take the substring from the start index to the end index of the match and add it to the list.
- After completing the above steps, if the list is empty, print “-1” as there is no URL present in the string S. Otherwise, print all the strings stored in the list.
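To see how the parts of the pattern work together, here is a minimal sketch in Python that writes the same expression in verbose mode with one comment per part. The names URL_PATTERN and find_urls and the sample sentence are illustrative assumptions, not part of the original article.

```python
import re

# Same pattern as in the approach, split across lines with re.VERBOSE.
# URL_PATTERN and find_urls are hypothetical names used for illustration.
URL_PATTERN = re.compile(
    r"""
    \b                                  # start at a word boundary
    (?:https?|ftp|file)://              # scheme followed by ://
    [-a-zA-Z0-9+&@\#/%?=~_|!:,.;]*      # body: run of allowed URL characters
    [-a-zA-Z0-9+&@\#/%=~_|]             # last character: no trailing punctuation
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_urls(text):
    """Return every URL-like substring of text, in order of appearance."""
    return [m.group(0) for m in URL_PATTERN.finditer(text)]


# The comma after the first URL and the full stop after the second are dropped,
# because neither character is allowed in the final position.
print(find_urls("See https://www.geeksforgeeks.org, then ftp://example.com/file.txt."))
```

Returning a list instead of printing keeps the matching logic separate from the “-1” convention, which a caller can apply by checking whether the list is empty.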
Below is the implementation of the above approach:
C++
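#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;

// Prints every URL found in str, one per line, or -1 if there is none
void extractURL(const string& str)
{
    vector<string> url_list;

    // Scheme, "://", allowed URL characters, and a final character
    // that is not trailing punctuation
    string regex_str = "\\b((?:https?|ftp|file):"
                       "\\/\\/[-a-zA-Z0-9+&@#\\/%?=~_|!:,.;]*"
                       "[-a-zA-Z0-9+&@#\\/%=~_|])";
    regex r(regex_str, regex_constants::icase);

    // Iterate over all non-overlapping matches
    sregex_iterator m(str.begin(), str.end(), r);
    sregex_iterator m_end;
    while (m != m_end) {
        url_list.push_back(m->str());
        ++m;
    }

    if (url_list.empty()) {
        cout << "-1" << endl;
    } else {
        for (const string& url : url_list) {
            cout << url << endl;
        }
    }
}

int main()
{
    string str = "Welcome to https://www.geeksforgeeks.org "
                 "Computer Science Portal";
    extractURL(str);
    return 0;
}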
Java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractURL {

    // Prints every URL found in str, one per line, or -1 if there is none
    public static void extractURL(String str) {
        ArrayList<String> urlList = new ArrayList<>();

        // Scheme, "://", allowed URL characters, and a final character
        // that is not trailing punctuation
        String regexStr = "\\b((?:https?|ftp|file):"
            + "\\/\\/[-a-zA-Z0-9+&@#\\/%?=~_|!:,.;]*"
            + "[-a-zA-Z0-9+&@#\\/%=~_|])";
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(str);

        // Collect each match in order of appearance
        while (matcher.find()) {
            urlList.add(matcher.group());
        }

        if (urlList.isEmpty()) {
            System.out.println("-1");
        } else {
            for (String url : urlList) {
                System.out.println(url);
            }
        }
    }

    public static void main(String[] args) {
        String str = "Welcome to https://www.geeksforgeeks.org "
            + "Computer Science Portal";
        extractURL(str);
    }
}
Python3
import re


def extractURL(s):
    url_list = []

    # Scheme, "://", allowed URL characters, and a final character
    # that is not trailing punctuation
    regex = r'\b((?:https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#\/%=~_|])'
    p = re.compile(regex, re.IGNORECASE)

    # Collect each match in order of appearance
    for match in p.finditer(s):
        url_list.append(s[match.start():match.end()])

    if len(url_list) == 0:
        print("-1")
        return

    for url in url_list:
        print(url)


if __name__ == '__main__':
    string = "Welcome to https://www.geeksforgeeks.org Computer Science Portal"
    extractURL(string)
C#
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // Prints every URL found in str, one per line, or -1 if there is none
    static void ExtractURL(string str)
    {
        List<string> urlList = new List<string>();

        // Scheme, "://", allowed URL characters, and a final character
        // that is not trailing punctuation
        string regexStr = @"\b((?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|])";
        Regex regex = new Regex(regexStr, RegexOptions.IgnoreCase);

        // Collect each match in order of appearance
        MatchCollection matches = regex.Matches(str);
        foreach (Match match in matches)
        {
            urlList.Add(match.Value);
        }

        if (urlList.Count == 0)
        {
            Console.WriteLine("-1");
            return;
        }

        foreach (string url in urlList)
        {
            Console.WriteLine(url);
        }
    }

    static void Main()
    {
        string str = "Welcome to https://www.geeksforgeeks.org Computer Science Portal";
        ExtractURL(str);
    }
}
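JavaScript
// Prints every URL found in str, one per line, or -1 if there is none
function extractURL(str) {
    let urlList = [];

    // Scheme, "://", allowed URL characters, and a final character
    // that is not trailing punctuation
    const regexStr = "\\b((?:https?|ftp|file):"
        + "\\/\\/[-a-zA-Z0-9+&@#\\/%?=~_|!:,.;]*"
        + "[-a-zA-Z0-9+&@#\\/%=~_|])";
    const regex = new RegExp(regexStr, 'gi');

    // With the 'g' flag, exec() resumes after the previous match
    let match;
    while ((match = regex.exec(str)) !== null) {
        urlList.push(match[0]);
    }

    if (urlList.length === 0) {
        console.log("-1");
        return;
    }

    for (let url of urlList) {
        console.log(url);
    }
}

const str = "Welcome to https://www.geeksforgeeks.org Computer Science Portal";
extractURL(str);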
Output
https://www.geeksforgeeks.org
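Time Complexity: O(N), where N is the length of the string S
Auxiliary Space: O(N) in the worst case, since the extracted URLs are stored in a list; O(1) apart from that list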
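The implementations above collect every match in a list before printing. If the URLs only need to be handled one at a time, the list can be avoided. The following is a minimal sketch under that assumption, using a Python generator; iter_urls and print_urls are hypothetical names that do not appear in the original article.

```python
import re

# Same pattern as in the approach, compiled once at module level.
URL_RE = re.compile(
    r"\b(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
    re.IGNORECASE,
)


def iter_urls(text):
    """Yield URLs one at a time instead of collecting them in a list."""
    for match in URL_RE.finditer(text):
        yield match.group(0)


def print_urls(text):
    """Print each URL on its own line, or -1 if none is found."""
    found = False
    for url in iter_urls(text):
        print(url)
        found = True
    if not found:
        print("-1")


print_urls("Welcome to https://write.geeksforgeeks.org portal of "
           "https://www.geeksforgeeks.org Computer Science Portal")
```

Here only the current match is held at any moment, so the extra memory no longer depends on how many URLs the string contains.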