RegEx to Match Open HTML Tags Except Self-contained XHTML Tags
Last Updated: 06 Sep, 2023
In this article, we will learn to create a regular expression pattern that matches open tags in HTML except for self-contained XHTML tags.
A regular expression (RegEx) can be used to match open tags while excluding self-contained (self-closing) XHTML tags such as <br/> and <img />. The pattern matches an opening angle bracket followed by a tag name and rejects any tag that ends with />, since such tags do not need a separate closing tag. The pattern can be tailored to specific requirements and HTML structure.
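Here are some common approaches to achieve this:
Approach 1: Using Negative Lookahead
A negative lookahead, written (?!...), lets us specify a pattern that must not match immediately after the current position in the string. Here it rejects any tag whose remaining text, up to the closing angle bracket, ends with />.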
Syntax:
Regular Expression Pattern: <([a-zA-Z]+)(?![^>]*\/>)>
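As a quick check of what this pattern accepts and rejects, the sketch below tests it against a few sample tags. The sample strings and the names openTag, samples, and results are illustrative assumptions, not part of the original example.

```javascript
// Pattern from the Syntax line above.
const openTag = /<([a-zA-Z]+)(?![^>]*\/>)>/;

// Hypothetical sample tags chosen to exercise each case.
const samples = ['<div>', '<br/>', '<img />', '<div class="box">', '</p>'];

// test() returns true when the pattern matches somewhere in the string.
const results = samples.map((tag) => [tag, openTag.test(tag)]);
console.log(results);
// '<div>'             -> true  (plain open tag)
// '<br/>'             -> false (rejected by the negative lookahead)
// '<img />'           -> false (rejected by the negative lookahead)
// '<div class="box">' -> false (the pattern expects > right after the name)
// '</p>'              -> false (closing tag: / is not a letter)
```

The fourth case shows a limitation: because the pattern requires > directly after the tag name, open tags that carry attributes are not matched.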
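Example: In this example, we apply the pattern above to a sample string. Because the regex has no g flag, match() returns only the first match together with its capture group.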
Javascript
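// Match an open tag such as <div>, but not a self-closing tag such as <br/>.
const regex = /<([a-zA-Z]+)(?![^>]*\/>)>/;
const inputString =
    '<div><br/><p>Hello</p><span>World</span></div>';
// Without the g flag, match() returns only the first match and its capture group.
const matches = inputString.match(regex);
console.log(matches);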
Output
[
'<div>',
'div',
index: 0,
input: '<div><br/><p>Hello</p><span>World</span></div>',
groups: undefined
]
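To collect every open tag rather than only the first, one option is to add the g flag and use matchAll(). The sketch below also loosens the pattern so that tags with attributes are accepted. This variant pattern, the sample string, and the names openTagAll, html, openTags, and tagNames are assumptions for illustration, not part of the original example.

```javascript
// Variant pattern (an assumption, not the article's original):
//   [a-zA-Z][a-zA-Z0-9]*  tag name, allowing digits as in h1
//   \b                    stop the name from backtracking to a shorter prefix
//   (?![^>]*\/>)          reject tags that end with />
//   [^>]*>                allow attributes before the closing >
const openTagAll = /<([a-zA-Z][a-zA-Z0-9]*)\b(?![^>]*\/>)[^>]*>/g;

// Hypothetical input with attributes and two self-closing tags.
const html =
    '<div class="container"><br/><img src="a.png" /><p>Hello</p><span>World</span></div>';

// matchAll() needs the g flag and yields one match object per open tag.
const openTags = Array.from(html.matchAll(openTagAll), (m) => m[0]);
const tagNames = Array.from(html.matchAll(openTagAll), (m) => m[1]);

console.log(openTags); // [ '<div class="container">', '<p>', '<span>' ]
console.log(tagNames); // [ 'div', 'p', 'span' ]
```

This is still a sketch: an attribute value that itself contains > would confuse the pattern, which is one reason a real parser is often preferred for arbitrary HTML.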
Approach 2: Using a Whitelist of HTML Tags
Another approach is to create a whitelist of HTML tags that are considered valid open tags and match against that list. Self-contained tags such as br are simply left out of the list, so they are never matched.
Syntax:
Regular Expression Pattern: <(div|p|span|...)>
Example: In this example, we match against a whitelist containing div, p, and span. Since the regex has no g flag, match() returns only the first match.
Javascript
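// Only the tag names listed in the alternation are accepted.
const regex = /<(div|p|span)>/;
const inputString =
    '<div><br/><p>Hello</p><span>World</span></div>';
// <br/> is skipped because br is not in the whitelist.
const matches = inputString.match(regex);
console.log(matches);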
Output
[
'<div>',
'div',
index: 0,
input: '<div><br/><p>Hello</p><span>World</span></div>',
groups: undefined
]
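When the whitelist grows, it can be easier to keep the tag names in an array and build the pattern from it. The following is a minimal sketch under that assumption. The names allowedTags, whitelistRegex, markup, and found are hypothetical.

```javascript
// Hypothetical whitelist; extend it with whichever tags count as valid open tags.
const allowedTags = ['div', 'p', 'span'];

// Build /<(div|p|span)>/g from the list. Plain HTML tag names contain no
// regex metacharacters, so no escaping is done here.
const whitelistRegex = new RegExp(`<(${allowedTags.join('|')})>`, 'g');

const markup = '<div><br/><p>Hello</p><span>World</span></div>';

// Collect the captured tag name of every whitelisted open tag.
const found = Array.from(markup.matchAll(whitelistRegex), (m) => m[1]);
console.log(found); // [ 'div', 'p', 'span' ]
```

As with the literal pattern, this only matches tags written without attributes.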
Approach 3: Using DOMParser
DOMParser is a built-in browser API that parses an HTML or XML string into a structured Document Object Model (DOM) representation, making it simple to navigate and manipulate the document's contents.
Syntax:
parseFromString(string, mimeType);
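Example: In this example, we parse an HTML string with DOMParser, collect every element, and keep those whose outerHTML begins with an open tag that is not self-closing. Parsing as text/html wraps the content in html, head, and body elements, which is why they appear in the output.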
Javascript
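const data =
    '<div class="container"><p>Hello, <span>world!</span></p></div>';
// Parse the string into a Document (browser-only API).
const parser = new DOMParser();
const inputElement = parser.parseFromString(data, 'text/html');
// '*' selects every element in the parsed document.
const elements = inputElement.getElementsByTagName('*');
// Keep elements whose serialized markup starts with a non-self-closing open tag.
// Note: outerHTML is re-serialized by the browser, so void elements such as
// <br> are written without "/>" and would also pass this filter.
const matches = Array.from(elements).filter((element) =>
    element.outerHTML.match(/<([A-Za-z][A-Za-z0-9]*)\b(?![^>]*\/>)/));
console.log(matches);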
Output:
(6)
0: html
1: head
2: body
3: div.container
4: p
5: span
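Because the HTML parser does not preserve the /> syntax, one possible way to exclude self-contained elements after parsing is to filter by tag name against the HTML void elements. This is a sketch of a different technique from the regex filter above. The names VOID_ELEMENTS and nonVoidElements and the sample string are assumptions for illustration.

```javascript
// Void elements as listed in the HTML Standard; they never take a closing tag.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: return every element in doc that is not a void element.
function nonVoidElements(doc) {
  return Array.from(doc.getElementsByTagName('*'))
    .filter((el) => !VOID_ELEMENTS.has(el.localName));
}

// DOMParser exists only in browsers, so guard the demo.
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser()
    .parseFromString('<div><br/><p>Hello</p></div>', 'text/html');
  console.log(nonVoidElements(doc).map((el) => el.localName));
  // [ 'html', 'head', 'body', 'div', 'p' ]
}
```

Filtering on localName avoids depending on how the browser serializes each element.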