Given a string representing a code snippet, implement a tag validator to parse the code and return whether it is valid.
A code snippet is valid if all the following rules hold:
<TAG_NAME>TAG_CONTENT</TAG_NAME>. Among them, <TAG_NAME> is the start tag, and </TAG_NAME> is the end tag. The TAG_NAME in start and end tags should be the same. A closed tag is valid if and only if the TAG_NAME and TAG_CONTENT are valid.TAG_NAME only contain upper-case letters, and has length in range [1,9]. Otherwise, the TAG_NAME is invalid.TAG_CONTENT may contain other valid closed tags, cdata and any characters (see note1) EXCEPT unmatched <, unmatched start and end tag, and unmatched or closed tags with invalid TAG_NAME. Otherwise, the TAG_CONTENT is invalid.< is unmatched if you cannot find a subsequent >. And when you find a < or </, all the subsequent characters until the next > should be parsed as TAG_NAME (not necessarily valid).<![CDATA[CDATA_CONTENT]]>. The range of CDATA_CONTENT is defined as the characters between <![CDATA[ and the first subsequent ]]>.CDATA_CONTENT may contain any characters. The function of cdata is to forbid the validator to parse CDATA_CONTENT, so even it has some characters that can be parsed as tag (no matter valid or invalid), you should treat it as regular characters.Example 1:
Input: code = "<DIV>This is the first line <![CDATA[<div>]]></DIV>" Output: true Explanation: The code is wrapped in a closed tag : <DIV> and </DIV>. The TAG_NAME is valid, the TAG_CONTENT consists of some characters and cdata. Although CDATA_CONTENT has an unmatched start tag with invalid TAG_NAME, it should be considered as plain text, not parsed as a tag. So TAG_CONTENT is valid, and then the code is valid. Thus return true.
Example 2:
Input: code = "<DIV>>> ![cdata[]] <![CDATA[<div>]>]]>]]>>]</DIV>" Output: true Explanation: We first separate the code into : start_tag|tag_content|end_tag. start_tag -> "<DIV>" end_tag -> "</DIV>" tag_content could also be separated into : text1|cdata|text2. text1 -> ">> ![cdata[]] " cdata -> "<![CDATA[<div>]>]]>", where the CDATA_CONTENT is "<div>]>" text2 -> "]]>>]" The reason why start_tag is NOT "<DIV>>>" is because of the rule 6. The reason why cdata is NOT "<![CDATA[<div>]>]]>]]>" is because of the rule 7.
Example 3:
Input: code = "<A> <B> </A> </B>" Output: false Explanation: Unbalanced. If "<A>" is closed, then "<B>" must be unmatched, and vice versa.
Constraints:
1 <= code.length <= 500code consists of English letters, digits, '<', '>', '/', '!', '[', ']', '.', and ' '.The Tag Validator problem focuses on validating whether a code snippet follows strict XML-like tag rules. A practical strategy is to simulate the parsing process using a stack while iterating through the string. Whenever an opening tag appears, push it onto the stack, and when a closing tag appears, verify it matches the most recent opening tag.
Carefully handle special constructs such as CDATA sections, which should be treated as raw text and skipped until their closing delimiter. Additionally, ensure that tag names consist only of uppercase letters and fall within the required length constraints. The string must also start with a valid root tag and end with its corresponding closing tag.
By scanning the string once and maintaining the stack for nested tags, you can ensure correctness while efficiently validating structure and constraints. This approach avoids unnecessary re-parsing and handles nested tags naturally.
Time Complexity: O(n) since the string is processed once. Space Complexity: O(n) in the worst case due to stack usage for nested tags.
| Approach | Time Complexity | Space Complexity |
|---|---|---|
| Stack-based parsing with string traversal | O(n) | O(n) |
NeetCode
This approach uses a stack to manage nested tags and ensure that for every opening tag there is a corresponding closing tag with the same name. The strategy is to iterate through the string while checking for the start or end of tags and CDATA sections. We push the start tags onto the stack and pop from the stack when a valid matching end tag is found. Special care is given to correctly handling CDATA sections.
Time Complexity: O(n), where n is the length of the code string since we are scanning through the string once.
Space Complexity: O(n), due to the use of a stack that can, in the worst case, contain the nested tags.
1def isValid(code):
2 stack = []
3 i = 0
4 n = len(code)
5 while i < n:
6 if i > 0 and not stack:
7 return False
8 if code[i:i+9] == '<![CDATA[':
9 j = i + 9
10 i = code.find(']]>', j)
11 if i == -1:
12 return False
13 i += 3
14 elif code[i:i+2] == '</':
15 j = i + 2
16 i = code.find('>', j)
17 if i == -1 or not stack:
18 return False
19 tag = code[j:i]
20 if not stack or stack.pop() != tag:
21 return False
22 i += 1
23 elif code[i] == '<':
24 j = i + 1
25 i = code.find('>', j)
26 if i == -1:
27 return False
28 tag = code[j:i]
29 if not (1 <= len(tag) <= 9 and tag.isupper()):
30 return False
31 stack.append(tag)
32 i += 1
33 else:
34 i += 1
35 return not stackThe function isValid iterates through the code using a while loop, checking the conditions for CDATA, start tag, and end tag at each step. When a CDATA section is found, it skips past it since its contents are not relevant for tag validation. For end tags, it ensures the top of the stack has a matching start tag before popping it off. For start tags, it checks the format and valid characters before pushing it to the stack. The function returns true only if all tags are properly closed and matched, leaving the stack empty at the end.
This approach uses regular expressions to match valid patterns directly within the code string. The idea is to replace valid components iteratively, reducing the code string size incrementally until no more valid patterns can be matched. If at the end the code string is empty, it indicates that all tags and content were valid.
Time Complexity: O(n2), due to the repeated reduction of the code string in the while loop, where each replacement operation can be considered linear within the current string length.
Space Complexity: O(n), predominantly due to the storage requirements of the regex engine and the intermediate strings produced during processing.
1import re
2
3defWatch expert explanations and walkthroughs
Practice problems asked by these companies to ace your technical interviews.
Explore More ProblemsJot down your thoughts, approach, and key learnings
Yes, problems similar to Tag Validator appear in technical interviews at large tech companies. They test skills in string parsing, stack usage, and careful handling of edge cases in structured text validation.
The optimal approach uses a stack combined with a single pass string parser. Opening tags are pushed onto the stack, and closing tags must match the most recent entry. Special sections like CDATA are skipped during validation to maintain correctness.
CDATA sections allow arbitrary text that should not be parsed as tags. During validation, once a CDATA start sequence is detected, the algorithm skips characters until the CDATA closing marker to avoid misinterpreting internal symbols.
A stack is the most suitable data structure because it naturally models nested tag structures. It allows efficient tracking of the most recent opening tag and ensures proper matching when a closing tag is encountered.
Here, the isValid function uses Python's re library to define one regex pattern that matches either a CDATA block or valid nested tags. It uses this pattern within a loop that iteratively removes all occurrences of these valid patterns from the string. The process continues until no further replacements can be made. If by the end the entire string is reduced to empty, this means the input was a valid code. This uses regex to simplify the process of identifying and reducing valid sections by treating them as non-overlapping transformations.