Remove duplicate words from Sentence using Regular Expression
Input: str = “Good bye bye world world”
Output: Good bye world
We remove the second occurrence of bye and world from Good bye bye world world.
Input: str = “Ram went went to to to his home”
Output: Ram went to his home
We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.
Input: str = “Hello hello world world”
Output: Hello world
We remove the second occurrence of hello and world from Hello hello world world. The comparison ignores case, so hello counts as a duplicate of Hello.
- Get the sentence.
- Form a regular expression to remove duplicate words from sentences.
regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
- The details of the above regular expression can be understood as:
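- “\\b”: A word boundary. Boundaries keep a match from starting or ending in the middle of a word. For example, in “My thesis is great”, the “is” at the end of “thesis” is not treated as a duplicate of the “is” that follows it.
- “\\w+”: One or more word characters: [a-zA-Z_0-9]
- “\\W+”: One or more non-word characters: [^\w]
- “\\1”: A backreference that matches the same text as the 1st group of parentheses, which in this case is the (\w+)
- “+”: Matches whatever it’s placed after 1 or more times. Applied to the non-capturing group (?:\\W+\\1\\b), it covers any number of consecutive repeats, such as “to to to”.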
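- Match the sentence against the regex. In Java, this can be done using Pattern.matcher().
- Replace each match with the word captured by the 1st group, so that only the first occurrence remains.
- Return the modified sentence.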
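The role of the word boundaries can be checked with a small sketch. It compares the regex from the steps above with a variant that drops both “\\b” tokens. The class name BoundaryDemo, the constant names, and the helper collapse are chosen here for illustration only.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {

    // Regex from the approach, with word boundaries.
    static final String WITH_BOUNDARIES = "\\b(\\w+)(?:\\W+\\1\\b)+";

    // Hypothetical variant without boundaries, for comparison only.
    static final String WITHOUT_BOUNDARIES = "(\\w+)(?:\\W+\\1)+";

    // Replaces every match of the given regex with its first captured group.
    static String collapse(String regex, String sentence) {
        return Pattern.compile(regex).matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        String sentence = "My thesis is great";

        // With boundaries nothing matches, so the sentence is unchanged.
        System.out.println(collapse(WITH_BOUNDARIES, sentence));    // My thesis is great

        // Without boundaries, the "is" inside "thesis" pairs with the next "is".
        System.out.println(collapse(WITHOUT_BOUNDARIES, sentence)); // My thesis great
    }
}
```

Without the boundaries, the trailing “is” of “thesis” is taken as the first occurrence and the real word “is” is deleted, which is the special case the boundaries guard against.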
Below is the implementation of the above approach:
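A minimal Java sketch of these steps follows. It makes two assumptions. The class and method names (RemoveDuplicateWords, removeDuplicateWords) are chosen here for illustration. The pattern is compiled with Pattern.CASE_INSENSITIVE, because the third example treats “hello” as a duplicate of “Hello”. The replacement is done with Matcher.replaceAll("$1"), which is one of several ways to keep only the first occurrence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveDuplicateWords {

    // A word, followed by one or more repeats of the same word,
    // each separated by non-word characters. CASE_INSENSITIVE is an
    // assumption made so that "Hello hello" collapses to "Hello".
    private static final Pattern DUPLICATES =
        Pattern.compile("\\b(\\w+)(?:\\W+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Returns the sentence with each run of repeated words
    // reduced to its first occurrence.
    static String removeDuplicateWords(String sentence) {
        Matcher matcher = DUPLICATES.matcher(sentence);
        // "$1" refers to the word captured by the 1st group.
        return matcher.replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateWords("Good bye bye world world"));
        System.out.println(removeDuplicateWords("Ram went went to to to his home"));
        System.out.println(removeDuplicateWords("Hello hello world world"));
    }
}
```

Only consecutive repeats are removed. A word that reappears later in the sentence, after other words, is left in place.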
Output:
Good bye world
Ram went to his home
Hello world