
Parsing XML File Using Python 3 and BeautifulSoup

I know there are several answers to questions regarding xml parsing with Python 3, but I can't find the answer to two that I have. I am trying to parse and extract information fro

Solution 1:

For the first part, search for the "name" element that carries a "primary" attribute, like this:

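from bs4 import BeautifulSoup
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request

url = 'https://www.boardgamegeek.com/xmlapi/boardgame/10'
response = urllib.request.urlopen(url)
data = response.read()       # a `bytes` object
text = data.decode('utf-8')  # a `str`
soup = BeautifulSoup(text, 'xml')  # the 'xml' parser requires lxml
# primary=True matches any <name> tag that has a "primary" attribute
name = soup.find('name', primary=True)

print(name.get_text())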

Outputs:

Elfenland
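
To see what the `primary=True` filter does without calling the live API, here is a minimal offline sketch. The XML fragment and the alternate name in it are made up for illustration and are not taken from the real response; the sketch assumes lxml is installed, as the 'xml' parser needs it.

```python
from bs4 import BeautifulSoup

# Made-up fragment modelled loosely on the API response; not real data.
sample = """
<boardgame objectid="10">
    <name sortindex="1">Elfenland (alternate)</name>
    <name primary="true" sortindex="1">Elfenland</name>
</boardgame>
"""

soup = BeautifulSoup(sample, 'xml')  # the 'xml' feature requires lxml

# Without a filter, find() returns the first <name> in document order.
first_name = soup.find('name')
# With primary=True, find() skips tags that lack a "primary" attribute.
primary_name = soup.find('name', primary=True)

print(first_name.get_text())
print(primary_name.get_text())
```

Passing `True` as the attribute value matches on the attribute's presence, whatever its value, which is why the first tag is skipped.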

For the second part, loop over the "results" elements and extract the data you want:

text = """
<poll title="User Suggested Number of Players" totalvotes="96"  name="suggested_numplayers">
    <results numplayers="1">
        <result numvotes="0" value="Best"/>
...
        <result numvotes="46" value="Not Recommended"/>
    </results>
</poll>
"""
soup = BeautifulSoup(text,'xml')

# each <results> element holds the votes for one player count
for results in soup.find_all('results'):
    numplayers = results['numplayers']
    best = results.find('result', {'value': 'Best'})['numvotes']
    recommended = results.find('result', {'value': 'Recommended'})['numvotes']
    not_recommended = results.find('result', {'value': 'Not Recommended'})['numvotes']
    print(numplayers, best, recommended, not_recommended)

Outputs:

1 0 0 58
2 2 21 53
3 10 46 17
4 47 36 1
5 35 44 2
6 23 48 11
6+ 0 1 46
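
One caveat with the loop above: `find` returns `None` when no matching child exists, so subscripting the result would raise a `TypeError` if a "results" block were missing one of the three values. A defensive sketch follows; the helper `votes_for` and the sample fragment (whose second block deliberately lacks a "Best" entry) are hypothetical and only for illustration.

```python
from bs4 import BeautifulSoup

# Made-up fragment: the second <results> block deliberately has no "Best" entry.
sample = """
<poll name="suggested_numplayers">
    <results numplayers="1">
        <result numvotes="0" value="Best"/>
        <result numvotes="0" value="Recommended"/>
        <result numvotes="58" value="Not Recommended"/>
    </results>
    <results numplayers="2">
        <result numvotes="21" value="Recommended"/>
        <result numvotes="53" value="Not Recommended"/>
    </results>
</poll>
"""

def votes_for(results_tag, value, default=0):
    """Return the vote count for `value` as an int, or `default` if the tag is absent."""
    tag = results_tag.find('result', {'value': value})
    return int(tag['numvotes']) if tag is not None else default

soup = BeautifulSoup(sample, 'xml')  # the 'xml' feature requires lxml
rows = []
for results in soup.find_all('results'):
    rows.append((
        results['numplayers'],
        votes_for(results, 'Best'),
        votes_for(results, 'Recommended'),
        votes_for(results, 'Not Recommended'),
    ))

print(rows)
```

Whether a missing entry should count as zero votes or be reported as an error depends on the use case; the default of 0 here is just an assumption.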

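Or, for a more compact version, collect each attribute into its own list and zip the lists together. This relies on every "results" element containing all three values; otherwise the lists fall out of step: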

soup = BeautifulSoup(text,'xml')
numplayers = [tag['numplayers'] for tag in soup.find_all('results')]
best = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Best'})]
recommended = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Recommended'})]
not_recommended = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Not Recommended'})]
print(list(zip(numplayers, best, recommended, not_recommended)))

Outputs:

[('1', '0', '0', '58'), ('2', '2', '21', '53'), ('3', '10', '46', '17'), ('4', '47', '36', '1'), ('5', '35', '44', '2'), ('6', '23', '48', '11'), ('6+', '0', '1', '46')]
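
Note that every value in those tuples is a string, because attribute values come back from the parser as text. If you want to compare the counts, convert them first. A small sketch, using the rows printed above as input; the dictionary keys and the variable names `polls` and `most_best` are just illustrative choices:

```python
# Rows copied from the output above:
# (numplayers, best, recommended, not_recommended)
rows = [('1', '0', '0', '58'), ('2', '2', '21', '53'), ('3', '10', '46', '17'),
        ('4', '47', '36', '1'), ('5', '35', '44', '2'), ('6', '23', '48', '11'),
        ('6+', '0', '1', '46')]

# Convert the vote counts to int; numplayers stays a string because of "6+".
polls = [
    {'numplayers': n, 'best': int(b), 'recommended': int(r), 'not_recommended': int(nr)}
    for n, b, r, nr in rows
]

# Player count with the most "Best" votes.
most_best = max(polls, key=lambda p: p['best'])
print(most_best['numplayers'], most_best['best'])
```

Comparing the raw strings instead would sort lexically, so '2' would rank above '10'.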
