Remove duplicate words from Sentence using Regular Expression

Given a string str that represents a sentence, the task is to remove consecutively repeated words from the sentence using a regular expression in Java.

Examples:

Input: str = “Good bye bye world world”
Output: Good bye world
Explanation:
We remove the second occurrence of bye and world from Good bye bye world world.



Input: str = “Ram went went to to to his home”
Output: Ram went to his home
Explanation:
We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.

Input: str = “Hello hello world world”
Output: Hello world
Explanation:
We remove the second occurrence of hello and world from Hello hello world world. The comparison ignores case, so hello counts as a repeat of Hello.

Approach

  1. Get the sentence.
  2. Form a regular expression that matches a word followed by one or more repeats of itself.
    regex = "\\b(\\w+)(?:\\W+\\1\\b)+";

    The details of the above regular expression can be understood as:

    • “\\b”: A word boundary. Boundaries ensure that only whole words are compared. For example, in “My thesis is great”, the “is” at the end of “thesis” is not treated as a repeat of the following word “is”.
    • “\\w+”: One or more word characters: [a-zA-Z_0-9]
    • “\\W+”: One or more non-word characters: [^\w]
    • “\\1”: Matches whatever was matched by the 1st group of parentheses, which in this case is (\\w+)
    • “+”: Matches whatever it is placed after 1 or more times. Placed after the non-capturing group (?:\\W+\\1\\b), it lets a word repeated several times in a row match as a single run.
  3. Match the sentence against the regex. In Java, this can be done using Pattern.matcher(), which returns a Matcher for the sentence.
  4. Replace each match with the first captured word and return the modified sentence.
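
To illustrate why the word boundaries matter, here is a minimal sketch that runs the pattern above next to a variant with the “\\b” anchors removed. The class name BoundaryDemo, the helper collapse() and the boundary-free pattern are assumptions made for this illustration only; they are not part of the approach above.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // The pattern from the approach above, with word boundaries.
    static final Pattern WITH_BOUNDARIES =
        Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+");

    // Hypothetical variant without \b, shown for comparison only.
    static final Pattern WITHOUT_BOUNDARIES =
        Pattern.compile("(\\w+)(?:\\W+\\1)+");

    // Replaces every match with its first capture group ("$1").
    static String collapse(Pattern p, String sentence) {
        return p.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "My thesis is great";
        // With boundaries nothing matches, so the sentence is unchanged.
        System.out.println(collapse(WITH_BOUNDARIES, s));    // My thesis is great
        // Without boundaries, the "is" inside "thesis" pairs with the next word.
        System.out.println(collapse(WITHOUT_BOUNDARIES, s)); // My thesis great
    }
}
```

Without the boundaries, the tail of “thesis” is treated as a repeat of “is” and a real word is lost, which is the special case the “\\b” anchors guard against.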

Below is the implementation of the above approach:

// Java program to remove duplicate words
// using a regular expression (regex).
  
import java.util.regex.Matcher;
import java.util.regex.Pattern;
  
class GFG {
  
    // Function to remove consecutive
    // duplicate words from the sentence
    public static String
    removeDuplicateWords(String input)
    {
  
        // Regex matching repeated words.
        String regex
            = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p
            = Pattern.compile(
                regex,
                Pattern.CASE_INSENSITIVE);
  
        // Pattern class contains matcher() method
        // to find matching between given sentence
        // and regular expression.
        Matcher m = p.matcher(input);
  
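        // Replace every run of repeated words
        // with its first occurrence (group 1).
        // Letting the Matcher do the replacement
        // avoids reusing the matched text as a
        // new regex, which throws if the text
        // contains a metacharacter such as "(".
        return m.replaceAll("$1");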
    }
  
    // Driver code
    public static void main(String args[])
    {
  
        // Test Case: 1
        String str1
            = "Good bye bye world world";
        System.out.println(
            removeDuplicateWords(str1));
  
        // Test Case: 2
        String str2
            = "Ram went went to to to his home";
        System.out.println(
            removeDuplicateWords(str2));
  
        // Test Case: 3
        String str3
            = "Hello hello world world";
        System.out.println(
            removeDuplicateWords(str3));
    }
}


Output:

Good bye world
Ram went to his home
Hello world
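
Note that the pattern only collapses repeats that sit next to each other; a word that reappears later in the sentence is left alone, and any punctuation between adjacent repeats is dropped along with the repeat. The following is a small sketch of that behaviour, assuming a hypothetical class AdjacentOnlyDemo and helper collapse() that are not part of the program above.

```java
import java.util.regex.Pattern;

class AdjacentOnlyDemo {
    // Same regex and flag as the program above.
    private static final Pattern REPEATED = Pattern.compile(
        "\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Keeps the first word of each run of adjacent repeats.
    static String collapse(String sentence) {
        return REPEATED.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        // "to" and "be" repeat, but not consecutively: nothing is removed.
        System.out.println(collapse("to be or not to be")); // to be or not to be
        // The comma belongs to the matched run, so it is removed as well.
        System.out.println(collapse("wait, wait for me"));  // wait for me
    }
}
```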


