#591 Tag Validator - Solution

Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid.

A code snippet is valid if all the following rules hold:

1. The code must be wrapped in a valid closed tag. Otherwise, the code is invalid.
2. A closed tag (not necessarily valid) has exactly the following format: <TAG_NAME>TAG_CONTENT</TAG_NAME>. Here, <TAG_NAME> is the start tag and </TAG_NAME> is the end tag. The TAG_NAME in the start and end tags must be the same. A closed tag is valid if and only if its TAG_NAME and TAG_CONTENT are valid.
3. A valid TAG_NAME contains only upper-case letters and has a length in the range [1, 9]. Otherwise, the TAG_NAME is invalid.
4. A valid TAG_CONTENT may contain other valid closed tags, cdata, and any characters (see note1) EXCEPT unmatched <, unmatched start and end tags, and unmatched or closed tags with an invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.
5. A start tag is unmatched if no end tag exists with the same TAG_NAME, and vice versa. However, you also need to consider the issue of unbalanced tags when tags are nested.
6. A < is unmatched if you cannot find a subsequent >. When you find a < or </, all the subsequent characters until the next > should be parsed as the TAG_NAME (not necessarily valid).
7. The cdata has the following format: <![CDATA[CDATA_CONTENT]]>. The range of CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.
8. CDATA_CONTENT may contain any characters. The function of cdata is to forbid the validator from parsing CDATA_CONTENT, so even if it has some characters that could be parsed as a tag (valid or invalid), you should treat them as regular characters.
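
To make the TAG_NAME rules concrete, here is a minimal Python sketch. The helper names read_tag_name and is_valid_tag_name are hypothetical and not part of the problem's interface. The key point is the order of operations: the name is first read up to the next > regardless of its contents, and only then checked against the length and upper-case requirements.

```python
import re
from typing import Optional

# A valid TAG_NAME is 1 to 9 upper-case English letters, nothing else.
VALID_TAG_NAME = re.compile(r"[A-Z]{1,9}")


def read_tag_name(code: str, i: int) -> Optional[str]:
    """For a '<' or '</' at index i, return all characters up to the next '>'
    as the (possibly invalid) TAG_NAME, or None if no '>' follows (unmatched '<')."""
    start = i + 2 if code.startswith("</", i) else i + 1
    close = code.find(">", start)
    return None if close == -1 else code[start:close]


def is_valid_tag_name(name: Optional[str]) -> bool:
    """Check the length and upper-case requirements on an already-read name."""
    return name is not None and VALID_TAG_NAME.fullmatch(name) is not None
```

For instance, reading from the < in "<  </DIV>" yields the name "  </DIV", which the upper-case check then rejects, so a stray < inside tag content makes the whole snippet invalid.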

Example 1:

Input: code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>"
Output: true
Explanation:
The code is wrapped in a closed tag: <DIV> and </DIV>.
The TAG_NAME is valid, and the TAG_CONTENT consists of some characters and cdata.
Although CDATA_CONTENT contains an unmatched start tag with an invalid TAG_NAME, it should be treated as plain text, not parsed as a tag.
So the TAG_CONTENT is valid, and therefore the code is valid. Thus, return true.

Example 2:

Input: code = "<DIV>>>  ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>"
Output: true
Explanation:
We first separate the code into: start_tag|tag_content|end_tag.
start_tag -> "<DIV>"
end_tag -> "</DIV>"
tag_content can be further separated into: text1|cdata|text2.
text1 -> ">>  ![cdata[]] "
cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>"
text2 -> "]]>>]"
The start_tag is NOT "<DIV>>>" because of rule 6: the TAG_NAME ends at the first > after the <.
The cdata is NOT "<![CDATA[<div>]>]]>]]>" because of rule 7: CDATA_CONTENT ends at the first ]]> after <![CDATA[.
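
The split above can be reproduced mechanically. The following Python sketch uses a hypothetical tokenize function that applies only the two cutting rules (a tag runs to the next >, a cdata runs to the first ]]>). It does not check whether tag names are valid or tags are balanced; it just shows where the boundaries fall.

```python
def tokenize(code: str) -> list[tuple[str, str]]:
    """Split code into ("start" | "end" | "cdata" | "text", raw_text) pieces."""
    tokens: list[tuple[str, str]] = []
    i, n = 0, len(code)
    while i < n:
        if code.startswith("<![CDATA[", i):
            # cdata extends to the FIRST "]]>" after the opener.
            close = code.find("]]>", i + 9)
            if close == -1:
                raise ValueError(f"unterminated cdata at index {i}")
            tokens.append(("cdata", code[i : close + 3]))
            i = close + 3
        elif code[i] == "<":
            # A tag extends to the next ">", whatever lies in between.
            close = code.find(">", i + 1)
            if close == -1:
                raise ValueError(f"unmatched '<' at index {i}")
            kind = "end" if code.startswith("</", i) else "start"
            tokens.append((kind, code[i : close + 1]))
            i = close + 1
        else:
            # Plain text runs until the next "<" (or the end of the string).
            nxt = code.find("<", i)
            if nxt == -1:
                nxt = n
            tokens.append(("text", code[i:nxt]))
            i = nxt
    return tokens
```

On Example 2 this yields the five pieces listed above: "<DIV>", text1, the cdata, text2, and "</DIV>".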

Example 3:

Input: code = "<A>  <B> </A>   </B>"
Output: false
Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.

Constraints:

1 <= code.length <= 500
code consists of English letters, digits, '<', '>', '/', '!', '[', ']', '.', and ' '.

The Tag Validator problem focuses on validating whether a code snippet follows strict XML-like tag rules. A practical strategy is to simulate the parsing process using a stack while iterating through the string. Whenever an opening tag appears, push it onto the stack, and when a closing tag appears, verify it matches the most recent opening tag.

Carefully handle special constructs such as CDATA sections, which should be treated as raw text and skipped up to the first subsequent ]]>. Additionally, ensure that tag names consist only of upper-case letters and are 1 to 9 characters long. Finally, the whole string must be wrapped in a single root tag: it must start with a valid start tag and end with the matching end tag, and the stack must never become empty in between.

By scanning the string once and maintaining the stack for nested tags, you can ensure correctness while efficiently validating structure and constraints. This approach avoids unnecessary re-parsing and handles nested tags naturally.

Time Complexity: O(n) since the string is processed once. Space Complexity: O(n) in the worst case due to stack usage for nested tags.

Approach	Time Complexity	Space Complexity
Stack-based parsing with string traversal	O(n)	O(n)
Regex-based iterative reduction	O(n²)	O(n)

Stack-Based Parsing

This approach uses a stack to manage nested tags and ensure that for every opening tag there is a corresponding closing tag with the same name. The strategy is to iterate through the string while checking for the start or end of tags and CDATA sections. We push the start tags onto the stack and pop from the stack when a valid matching end tag is found. Special care is given to correctly handling CDATA sections.

Time Complexity: O(n), where n is the length of the code string, since the string is scanned once.

Space Complexity: O(n), because in the worst case the stack holds the names of all currently open tags, which together cannot be longer than the input.

The function isValid iterates through the code with a while loop, checking at each step whether a CDATA section, an end tag, or a start tag begins at the current position. When a CDATA section is found, it skips past the first subsequent ]]>, since its contents are not relevant for tag validation. For an end tag, it checks that the top of the stack holds a matching start tag before popping it. For a start tag, it checks that the TAG_NAME consists of 1 to 9 upper-case letters before pushing it. An empty stack at the end is necessary but not sufficient: to honor rule 1, the function must also reject any character or CDATA section that appears while the stack is empty after the root tag has started, so that input such as "<A></A><B></B>" is rejected.
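
The original isValid code is not reproduced on this page, so the following is a minimal Python sketch of the same stack-based idea rather than the author's implementation. The snake_case name is_valid is an assumption. It follows the steps just described and enforces the single-root requirement by failing as soon as the stack is empty anywhere after position 0.

```python
def is_valid(code: str) -> bool:
    """Stack-based validator sketch for the tag rules above."""
    stack: list[str] = []  # names of currently open tags
    i, n = 0, len(code)
    while i < n:
        # Rule 1: after position 0, every character must lie inside the root tag.
        if i > 0 and not stack:
            return False
        if code.startswith("<![CDATA["):
            pass  # placeholder, replaced below
        if code.startswith("<![CDATA[", i):
            # cdata is only allowed inside a tag; skip to the first "]]>".
            close = code.find("]]>", i + 9)
            if not stack or close == -1:
                return False
            i = close + 3
        elif code.startswith("</", i):
            # End tag: the name runs to the next ">" and must match the innermost open tag.
            close = code.find(">", i + 2)
            if close == -1 or not stack or stack[-1] != code[i + 2 : close]:
                return False
            stack.pop()
            i = close + 1
        elif code[i] == "<":
            # Start tag: everything up to the next ">" is the TAG_NAME.
            close = code.find(">", i + 1)
            if close == -1:
                return False
            name = code[i + 1 : close]
            if not (1 <= len(name) <= 9 and all("A" <= c <= "Z" for c in name)):
                return False
            stack.append(name)
            i = close + 1
        else:
            i += 1  # ordinary text character
    # Every open tag must have been closed.
    return not stack
```

Checking end tags only against the top of the stack is what rejects Example 3: when "</A>" arrives, the innermost open tag is B, so the nesting is unbalanced.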

Regex-Based Validation

This approach uses regular expressions to match valid patterns directly within the code string. The idea is to reduce valid components iteratively, shrinking the code string until no more valid patterns can be matched. The input is valid only if the whole string collapses into a single outermost closed tag; merely ending with an empty string is not enough, since sibling roots such as "<A></A><B></B>" would also disappear.

Time Complexity: O(n²): each pass of the reduction loop scans the current string in O(n) time, and every pass that changes the string removes at least one closed tag, so there are at most O(n) passes.

Space Complexity: O(n), predominantly due to the storage requirements of the regex engine and the intermediate strings produced during processing.

Here, the isValid function uses Python's re module to define one regex pattern that matches either a CDATA block or a valid closed tag whose content holds no further tags. A loop repeatedly replaces all non-overlapping matches of this pattern, peeling nested tags from the inside out, and stops once a pass makes no further replacement. An empty final string is not sufficient evidence of validity on its own: a bare top-level "<![CDATA[x]]>" would also be deleted even though rule 1 requires a single wrapping root tag, so the check must confirm that everything reduced from exactly one outermost tag.
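
A minimal Python sketch of the reduction idea follows. It is a variant, not the exact code described above: it uses two patterns instead of one. CDATA blocks (and any literal t in the input) are first replaced by "-", then every innermost valid closed tag is collapsed to the placeholder t until nothing changes. Validity means exactly one t remains, which enforces the single-root rule. The name is_valid_regex and the choice of t as the placeholder are illustrative assumptions.

```python
import re

# A cdata block ends at the FIRST "]]>" (non-greedy .*?); also match any literal "t".
CDATA_OR_T = re.compile(r"<!\[CDATA\[.*?\]\]>|t")
# An innermost valid closed tag: 1-9 upper-case letters, content with no "<",
# and an end tag with the same name (backreference \1).
INNERMOST_TAG = re.compile(r"<([A-Z]{1,9})>[^<]*</\1>")


def is_valid_regex(code: str) -> bool:
    # Neutralize cdata and literal "t" so "t" can mean "an already-validated tag".
    code = CDATA_OR_T.sub("-", code)
    prev = None
    while code != prev:
        prev = code
        # Collapse every currently innermost valid tag into the placeholder.
        code = INNERMOST_TAG.sub("t", code)
    # Valid only if everything collapsed into exactly one root tag.
    return code == "t"
```

Any stray < (an unmatched <, an invalid tag name, or an unbalanced tag) can never be consumed by the tag pattern, so it blocks the reduction and the final string is not "t".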
