Parsing XML Comments with Python

The Extensible Markup Language (XML) is a markup format originally for XML. Some companies use XML as a data serialization format or for configuration. Recently, I needed to learn how to uncomment some lines in an XML file to enable some settings that were defined in the XML.

Fortunately, Python’s xml module provides a way to do this. Let’s find out how!

AI Answers Might Not Work

When I used Google to search for an answer to this question: “How to edit comments in XML with Python”, Google Gemini piped up with this answer:

import xml.etree.ElementTree as ET

xml_file = "PATH_TO_XML"

tree = ET.parse(xml_file)
root = tree.getroot()

for element in root.iter():
    if isinstance(element.tag, str) and element.tag.startswith('{'):
        continue
    if element.tag == ET.Comment:
        text = element.text.strip()
        print(text)

Unfortunately, this code does not work. But it is close.

If you look through StackOverflow and similar forums, you will discover that you need a custom parser. Here’s how to create one:

import xml.etree.ElementTree as ET

xml_file = r"PATH_TO_XML"

parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
tree = ET.parse(xml_file, parser)
root = tree.getroot()

for element in root.iter():
    if isinstance(element.tag, str) and element.tag.startswith('{'):
        continue

    if element.tag == ET.Comment:
        text = element.text.strip()        
        print(text)

The key point here is to create an instance of ET.XMLParser and set insert_comments to True. Then the code will work.

Note that this example just prints out the commented text. You would need to do something like this to grab the commented text and reinsert it as a valid XML element:

for element in root.iter():
   if isinstance(element.tag, str) and element.tag.startswith('{'):
      continue
   if element.tag == ET.Comment:
      text = element.text.strip()
      if "COMMENTED CODE SUBSTRING" in text:
         new_element = ET.fromstring(f"<{text}>")
         # Insert the uncommented text as a new XML element
         root.insert(list(root).index(element), new_element)
         # Remove the element that was commented out originally
         root.remove(element)

# Make indentation work for the output
ET.indent(tree, space="\t", level=0)

with open(XML_PATH, "wb") as f:
   tree.write(f)

Here, you loop over each element or tag in the XML. You check if the element is a comment type. If it is, you check for the substring you are looking for in the comment’s text. When you find the substring, you extract the entire string from the comment, create a new element, insert it as a regular element, and remove the comment.

Wrapping Up

XML is a handy format, and Python includes several different methods of working with XML in its xml module. Several different third-party XML modules, such as lxml, are also great alternatives. If you work with XML, hopefully you will find this article helpful.

Have fun and happy coding!