
How To Get All The Tags In An XML File Using Python?

I have been researching in the Python Docs for a way to get the tag names from an XML file, but I haven't been very successful. Using the XML file below, one can get the country na

Solution 1:

Consider using ElementTree's iterparse() and building nested lists of tag/text pairs. An if check on the tag groups each country's items together. Children with no text, such as the self-closing neighbor elements, are skipped, and attributes are added as "tag-attribute" pairs. strip() trims any surrounding line breaks and indentation from the values without touching the spaces inside them, so "Costa Rica" keeps its space:

import xml.etree.ElementTree as et

path = 'country_data.xml'  # assumed name; use the path to your XML file

data = []
# iterparse() reports 'end' events by default, so each <country> element
# is fully parsed (children included) when it reaches this loop
for ev, el in et.iterparse(path):
    if el.tag != 'country':
        continue
    inner = []
    # attributes of <country> itself, e.g. name="Liechtenstein"
    for name, value in el.items():
        inner.append([el.tag + '-' + name, value.strip()])
    for child in el:
        # keep children that carry text; skip empty ones like <neighbor/>
        if child.text is not None and child.text.strip():
            inner.append([child.tag, child.text.strip()])
        # attributes of the child, e.g. name="Austria" direction="E"
        for name, value in child.items():
            inner.append([child.tag + '-' + name, value.strip()])
    data.append(inner)

print(data)
# [[['country-name', 'Liechtenstein'], ['rank', '1'], ['year', '2008'], ['gdppc', '141100'],
#   ['neighbor-name', 'Austria'], ['neighbor-direction', 'E'],
#   ['neighbor-name', 'Switzerland'], ['neighbor-direction', 'W']],
#  [['country-name', 'Singapore'], ['rank', '4'], ['year', '2011'], ['gdppc', '59900'],
#   ['neighbor-name', 'Malaysia'], ['neighbor-direction', 'N']],
#  [['country-name', 'Panama'], ['rank', '68'], ['year', '2011'], ['gdppc', '13600'],
#   ['neighbor-name', 'Costa Rica'], ['neighbor-direction', 'W'],
#   ['neighbor-name', 'Colombia'], ['neighbor-direction', 'E']]]

Solution 2:

Use Python's built-in xml.etree.ElementTree module: parse the document, traverse it recursively, and collect every tag name in a set, which keeps each name only once no matter how often the tag repeats.
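A minimal sketch of that recursive traversal with xml.etree.ElementTree. The helper collect_tags is a hypothetical name, and the inline document is a trimmed, made-up sample rather than the question's file.

```python
import xml.etree.ElementTree as ET


def collect_tags(element, tags=None):
    """Recursively add the tag of element and all its descendants to a set."""
    if tags is None:
        tags = set()
    tags.add(element.tag)
    for child in element:  # iterating an Element yields its direct children
        collect_tags(child, tags)
    return tags


# Hypothetical sample; for a real file use ET.parse(path).getroot() instead.
root = ET.fromstring(
    "<data><country name='Panama'><rank>68</rank>"
    "<neighbor name='Colombia' direction='E'/></country></data>"
)
tags = collect_tags(root)
print(sorted(tags))  # ['country', 'data', 'neighbor', 'rank']

# Element.iter() walks the whole subtree for you, so this is equivalent:
assert tags == {el.tag for el in root.iter()}
```

The explicit recursion mirrors the answer's description; in practice `{el.tag for el in root.iter()}` gives the same set in one line. If the document uses namespaces, ElementTree reports tags as `{namespace-uri}name`.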
