HTML Entity Parser

Given a string str that contains HTML entities, the task is to replace each entity with its corresponding special character.

An HTML entity parser takes HTML code as input and replaces every entity with the character it represents. For example, the entity for the quotation mark is &quot; and its character is ".

The table below lists the HTML entities and their corresponding special characters:

Name / Description       HTML Entity   Special Character
Space                    &nbsp;        (non-breaking space)
Ampersand                &amp;         &
Greater than             &gt;          >
Less than                &lt;          <
Single quotation mark    &apos;        '
Double quotation mark    &quot;        "
Registered trademark     &reg;         ®
Copyright mark           &copy;        ©
Forward slash            &frasl;       ⁄ (written as / in the programs below)

Examples:

Input: str = "17 &gt; 25 and 25 &lt; 17"
Output: 17 > 25 and 25 < 17
Explanation: &gt; is replaced by its special character > and &lt; is replaced by <.

Input: str = "&copy; is symbol of copyright"
Output: © is symbol of copyright
Explanation: &copy; is replaced by its special character ©.

Method 1 – using unordered_map: Below are the steps:

  1. Store each HTML entity together with its character in a map.
  2. Traverse the given string. Whenever the character '&' is encountered, read up to the next ';' to find which HTML entity follows the ampersand.
  3. If that entity is in the map, append its corresponding character to the output string; otherwise append the text unchanged.
  4. Print the output string as the result.

Below is the implementation of the above approach:

C++

// C++ program for the above approach
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;
  
class GfG {
public:
    unordered_map<string, string> m;
  
public:
    // Associating html entity with
    // special character
    void initializeMap()
    {
        m["&quot;"] = "\"";
        m["&apos;"] = "'";
        m["&amp;"] = "&";
        m["&gt;"] = ">";
        m["&lt;"] = "<";
        m["&frasl;"] = "/";
        m["&nbsp;"] = " ";
        m["&reg;"] = "®";
        m["&copy;"] = "©";
    }
  
public:
    // Function that convert the given
    // HTML Entity to its parsed String
    string parseInputString(string input)
    {
        // Output string
        string output = "";
  
        // Traverse the string
        size_t i = 0;
        while (i < input.size()) {
  
            // Copy ordinary characters unchanged
            if (input[i] != '&') {
                output += input[i];
                i++;
                continue;
            }
  
            // A candidate entity ends at the next ';'.
            // Another '&' before that starts a new
            // candidate.
            size_t end
                = input.find_first_of(";&", i + 1);
  
            // No terminator left: copy the rest
            // and stop
            if (end == string::npos) {
                output += input.substr(i);
                break;
            }
  
            if (input[end] == ';') {
                auto it = m.find(
                    input.substr(i, end - i + 1));
  
                // Known entity: append the parsed
                // character
                if (it != m.end()) {
                    output += it->second;
                    i = end + 1;
                    continue;
                }
            }
  
            // Not a known entity: copy the text
            // before 'end' and continue from there
            output += input.substr(i, end - i);
            i = end;
        }
  
        // Return the parsed string
        return output;
    }
};
  
// Driver Code
int main()
{
    // Given String
    string input = "17 &gt; 25 and 25 &lt; 17";
    GfG g;
  
    // Initialise the entity map
    g.initializeMap();
  
    // Function Call
    cout << g.parseInputString(input);
    return 0;
}

Output:

17 > 25 and 25 < 17

Time Complexity: O(N)
Auxiliary Space: O(N)

Method 2 – using Pattern Matching:
Below are the steps:

  1. Traverse the given string str.
  2. While traversing, whenever the character '&' is encountered, read up to the next ';' to find which HTML entity follows the ampersand.
  3. Compare that entity against each entity in the table above and append the matching special character to the output string.
  4. Print the output string as the result after the traversal.

Below is the implementation of the above approach:

C++

// C++ program to Parse the HTML Entities
#include <iostream>
#include <string>
using namespace std;
  
class GfG {
  
public:
    string parseInputString(string input)
    {
  
        // To store parsed string
        string output = "";
  
        size_t i = 0;
        while (i < input.size()) {
  
            // Copy ordinary characters unchanged
            if (input[i] != '&') {
                output += input[i];
                i++;
                continue;
            }
  
            // A candidate entity ends at the next ';'.
            // Another '&' before that starts a new
            // candidate.
            size_t end
                = input.find_first_of(";&", i + 1);
  
            // No terminator left: copy the rest
            // and stop
            if (end == string::npos) {
                output += input.substr(i);
                break;
            }
  
            // Another '&' came first: copy the text
            // before it and continue from there
            if (input[end] == '&') {
                output += input.substr(i, end - i);
                i = end;
                continue;
            }
  
            // Candidate entity, from '&' to ';'
            string buffer
                = input.substr(i, end - i + 1);
  
            // Match the candidate against each
            // known entity
            if (buffer == "&quot;")
                output += "\"";
            else if (buffer == "&apos;")
                output += "'";
            else if (buffer == "&amp;")
                output += "&";
            else if (buffer == "&gt;")
                output += ">";
            else if (buffer == "&lt;")
                output += "<";
            else if (buffer == "&frasl;")
                output += "/";
            else if (buffer == "&nbsp;")
                output += " ";
            else if (buffer == "&reg;")
                output += "®";
            else if (buffer == "&copy;")
                output += "©";
  
            // Unknown entity: keep the text as is
            else
                output += buffer;
  
            i = end + 1;
        }
  
        // Return the parsed string
        return output;
    }
};
  
// Driver Code
int main()
{
    // Given String
    string input = "17 &gt; 25 and 25 &lt; 17";
    GfG g;
  
    // Function Call
    cout << g.parseInputString(input);
    return 0;
}

Output:



17 > 25 and 25 < 17

Time Complexity: O(N)
Auxiliary Space: O(N)

Method 3 – using Regular Expression:
Below are the steps:

  1. Store every entity with its mapped value in a map M.
  2. For each key in the map, create a regular expression using:

    regex e(key);

  3. Replace every match of that regular expression with its mapped value in M:

    str = regex_replace(str, e, value);
    where,
    str is the input string,
    e is the expression formed in the above step, and
    value is the value mapped to expression e in the map

  4. Repeat the above steps until every key in the map has been processed. Because each replacement rescans the whole string, "&amp;" should be replaced last so that text such as "&amp;lt;" is not decoded twice.

Below is the implementation of the above approach:

C++

// C++ program for the above approach
#include <iostream>
#include <regex>
#include <string>
#include <unordered_map>
using namespace std;
  
// Given Expression with mapped value
const unordered_map<string, string> m
    = { { "&quot;", "\"" },
        { "&apos;", "'" },
        { "&amp;", "&" },
        { "&gt;", ">" },
        { "&lt;", "<" },
        { "&frasl;", "/" } };
  
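// Function that converts the given
// HTML Entity to its parsed String
string
parseInputString(string input)
{
    for (auto& it : m) {
  
        // "&amp;" is replaced last (see below)
        if (it.first == "&amp;")
            continue;
  
        // Create the regular expression
        regex e(it.first);
  
        // Replace every match with the
        // mapped value using regex_replace()
        input = regex_replace(input, e,
                              it.second);
    }
  
    // Replace "&amp;" last so that text such as
    // "&amp;lt;" is not decoded twice
    input = regex_replace(input, regex("&amp;"),
                          "&");
  
    // Return the parsed string
    return input;
}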
  
// Driver Code
int main()
{
    // Given String
    string input
        = "17 &gt; 25 and 25 &lt; 17";
  
    // Function Call
    cout << parseInputString(input);
    return 0;
}

Output:

17 > 25 and 25 < 17

Time Complexity: O(N)
Auxiliary Space: O(N)
