RegEx to Match Open HTML Tags Except Self-contained XHTML Tags

Last Updated : 06 Sep, 2023

In this article, we will learn to create a regular expression pattern that matches open tags in HTML while excluding self-contained XHTML tags.

A regular expression (RegEx) can match open tags while excluding self-contained XHTML tags (for example, <br/> and <img />). The pattern matches an opening angle bracket followed by a tag name and rejects tags that end in />, since those tags close themselves and need no separate closing tag. The pattern can be tailored to specific requirements and to the structure of the HTML being searched.

Here are some common approaches:

Approach 1: Using Negative Lookahead

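A negative lookahead, written (?!...), asserts that a pattern does not match immediately after the current position in the string. It consumes no characters, so it only accepts or rejects the position. In the pattern below, (?![^>]*\/>) rejects a tag whose text up to the closing bracket ends in />.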
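To see the lookahead in isolation, here is a minimal sketch on plain strings. The pattern and the variable names are illustrative only and are not part of the tag-matching pattern used in this article.

```javascript
// Matches "foo" only when it is not immediately followed by "bar".
const fooNotBar = /foo(?!bar)/;

const accepted = fooNotBar.test('foobaz'); // true
const rejected = fooNotBar.test('foobar'); // false

// The lookahead consumes no characters, so the match is just "foo".
const matchedText = 'foobaz'.match(fooNotBar)[0];

console.log(accepted, rejected, matchedText); // true false foo
```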

Syntax:

Regular Expression Pattern: <([a-zA-Z]+)(?![^>]*\/>)>

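Example: The code below applies the pattern to a short HTML string. The pattern has no g flag, so match returns only the first open tag together with its captured tag name. It also expects > immediately after the tag name, so open tags that carry attributes are not matched.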

JavaScript

const regex = /<([a-zA-Z]+)(?![^>]*\/>)>/;
const inputString = 
    '<div><br/><p>Hello</p><span>World</span></div>';
const matches = inputString.match(regex);
  
console.log(matches);


Output

[
  '<div>',
  'div',
  index: 0,
  input: '<div><br/><p>Hello</p><span>World</span></div>',
  groups: undefined
]
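The example above stops at the first match and skips tags that carry attributes. The following is a minimal sketch of one way to collect every open tag, assuming a hypothetical helper named findOpenTags and a pattern extended with [^>]* and the g flag. Neither the helper nor the extended pattern comes from the original example.

```javascript
// Hypothetical helper (not part of the original example): returns the
// names of all open tags whose text does not end in "/>".
function findOpenTags(html) {
    // \b ends the tag name, the lookahead rejects "/>" endings,
    // and [^>]* allows attributes before the closing bracket.
    const regex = /<([a-zA-Z][a-zA-Z0-9]*)\b(?![^>]*\/>)[^>]*>/g;
    return Array.from(html.matchAll(regex), (m) => m[1]);
}

const openTags = findOpenTags(
    '<div class="box"><br/><img src="a.png" /><p>Hello</p></div>');

console.log(openTags); // [ 'div', 'p' ]
```

This sketch assumes that attribute values never contain a > character. Input that breaks that assumption is better handled by a parser, as in Approach 3.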

Approach 2: Using a Whitelist of HTML Tags

Another approach is to create a whitelist of HTML tags that are considered valid open tags and match against that list.

Syntax:

Regular Expression Pattern: <(div|p|span|...)>

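Example: The whitelist below contains div, p, and span. The pattern has no g flag, so match returns only the first whitelisted open tag, <div>. The <br/> tag is never matched because br is not in the list.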

JavaScript

const regex = /<(div|p|span)>/;
const inputString = 
    '<div><br/><p>Hello</p><span>World</span></div>';
const matches = inputString.match(regex);
  
console.log(matches);


Output

[
  '<div>',
  'div',
  index: 0,
  input: '<div><br/><p>Hello</p><span>World</span></div>',
  groups: undefined
]
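A longer whitelist is easier to maintain as an array than as a hand-written alternation. The following is a sketch of that idea, assuming an illustrative allowedTags array and the added \b, [^>]* and g flag, none of which appear in the original pattern.

```javascript
// Hypothetical whitelist; adjust it to the tags your input may contain.
const allowedTags = ['div', 'p', 'span'];

// \b stops "p" from matching the start of "pre";
// [^>]* allows optional attributes before the closing bracket.
const whitelistRegex = new RegExp(`<(${allowedTags.join('|')})\\b[^>]*>`, 'g');

const html = '<div id="a"><br/><pre>x</pre><p>Hello</p><span>World</span></div>';
const allowedMatches = html.match(whitelistRegex);

console.log(allowedMatches); // [ '<div id="a">', '<p>', '<span>' ]
```

Note that a whitelisted tag written in self-closing form, such as <span/>, would still match this pattern. If that matters, the negative lookahead from Approach 1 can be added after the tag name.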

Approach 3: Using DOMParser

DOMParser is a built-in browser API that parses an HTML or XML string into a structured Document Object Model (DOM) representation, making it simple to navigate and manipulate the document’s contents.

Syntax:

parseFromString(string, mimeType);

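Example: The code below parses an HTML string with DOMParser and then filters the resulting elements with a regular expression. DOMParser is a browser API and is not a global in Node.js, so run this example in a browser console.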

JavaScript

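// Example HTML input
const data =
    '<div class="container"><p>Hello, <span>world!</span></p></div>';

// Create a DOM parser (a browser API)
const parser = new DOMParser();

// Parse the HTML string into a Document
const doc = parser.parseFromString(data, 'text/html');

// Get all elements, including the html, head, and body
// elements that the HTML parser adds automatically
const elements = doc.getElementsByTagName('*');

// Keep elements whose serialized open tag does not end in "/>".
// In a text/html document, outerHTML writes void elements such
// as <br> without the slash, so they also pass this check.
const matches = Array.from(elements).filter((element) =>
    /^<([A-Za-z][A-Za-z0-9]*)\b(?![^>]*\/>)/.test(element.outerHTML));

// Output the matched elements
console.log(matches);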


Output:

(6) 
0: html
1: head
2: body
3: div.container
4: p
5: span
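In HTML, void elements such as <br> may be written without the trailing slash, so a check for /> alone does not identify them. The following is a sketch of one way to exclude them by tag name, assuming a hypothetical VOID_ELEMENTS set and isVoidElement helper. The set lists the void elements named in the HTML standard, and the sample tag names are illustrative.

```javascript
// Void elements in the HTML standard: they never take a closing tag.
const VOID_ELEMENTS = new Set([
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: true when the tag name is a void element.
// tagName is uppercase for HTML elements, so normalize the case first.
function isVoidElement(tagName) {
    return VOID_ELEMENTS.has(tagName.toLowerCase());
}

const tagNames = ['DIV', 'BR', 'P', 'IMG', 'SPAN'];
const nonVoid = tagNames.filter((name) => !isVoidElement(name));

console.log(nonVoid); // [ 'DIV', 'P', 'SPAN' ]
```

In a browser, the same predicate could be applied to the element list returned by getElementsByTagName('*') by filtering on each element's tagName.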

