Parsing an XML File Using Python 3 and BeautifulSoup
I know there are several answers to questions regarding xml parsing with Python 3, but I can't find the answer to two that I have. I am trying to parse and extract information fro
Solution 1:
For the first part, search for the "name" element that has a "primary" attribute, like this:
from bs4 import BeautifulSoup
import urllib.request  # `import urllib` alone does not reliably expose urllib.request
url = 'https://www.boardgamegeek.com/xmlapi/boardgame/10'
response = urllib.request.urlopen(url)
data = response.read()  # a `bytes` object
text = data.decode('utf-8')  # a `str`
soup = BeautifulSoup(text, 'xml')
name = soup.find('name', primary=True)  # True matches any value of the attribute
print(name.get_text())
Outputs:
Elfenland
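Note that `find` returns `None` when nothing matches, in which case `name.get_text()` raises an `AttributeError`. Here is a minimal offline sketch of a guard. The XML snippet is hand-written and only assumed to resemble the API response, and the `primary_name` helper is a hypothetical name used for illustration:

```python
from bs4 import BeautifulSoup

# Hand-written sample, assumed to mirror the shape of the real response.
SAMPLE = """<boardgames>
  <boardgame objectid="10">
    <name sortindex="1">Alternate Title</name>
    <name primary="true" sortindex="1">Elfenland</name>
  </boardgame>
</boardgames>"""

def primary_name(xml_text):
    """Return the text of the <name> that has a `primary` attribute, or None."""
    soup = BeautifulSoup(xml_text, 'xml')  # the 'xml' parser needs lxml installed
    tag = soup.find('name', primary=True)  # True matches any attribute value
    return tag.get_text() if tag is not None else None

print(primary_name(SAMPLE))
print(primary_name("<boardgames><boardgame/></boardgames>"))
```

The second call prints `None` instead of failing, so the caller can decide what to do when a game has no primary name.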
For the second part, loop over the "results" elements and extract the data you want:
text = """
<poll title="User Suggested Number of Players" totalvotes="96" name="suggested_numplayers">
<results numplayers="1">
<result numvotes="0" value="Best"/>
...
<result numvotes="46" value="Not Recommended"/>
</results>
</poll>
"""
soup = BeautifulSoup(text,'xml')
for result in soup.find_all('results'):
    numplayers = result['numplayers']
    best = result.find('result', {'value': 'Best'})['numvotes']
    recommended = result.find('result', {'value': 'Recommended'})['numvotes']
    not_recommended = result.find('result', {'value': 'Not Recommended'})['numvotes']
    print(numplayers, best, recommended, not_recommended)
Outputs:
1 0 0 58
2 2 21 53
3 10 46 17
4 47 36 1
5 35 44 2
6 23 48 11
6+ 0 1 46
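Attribute values come back as strings, so comparing or summing the votes needs an `int` conversion. A rough sketch of one way to do that, collecting the loop above into a dictionary keyed by player count. The sample XML is rebuilt from three of the output rows above, and `poll_votes` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Sample rebuilt from the output rows for 3, 4 and 5 players.
POLL = """<poll name="suggested_numplayers">
  <results numplayers="3">
    <result numvotes="10" value="Best"/>
    <result numvotes="46" value="Recommended"/>
    <result numvotes="17" value="Not Recommended"/>
  </results>
  <results numplayers="4">
    <result numvotes="47" value="Best"/>
    <result numvotes="36" value="Recommended"/>
    <result numvotes="1" value="Not Recommended"/>
  </results>
  <results numplayers="5">
    <result numvotes="35" value="Best"/>
    <result numvotes="44" value="Recommended"/>
    <result numvotes="2" value="Not Recommended"/>
  </results>
</poll>"""

def poll_votes(xml_text):
    """Map each numplayers value to {vote label: vote count as int}."""
    soup = BeautifulSoup(xml_text, 'xml')  # the 'xml' parser needs lxml installed
    votes = {}
    for results in soup.find_all('results'):
        votes[results['numplayers']] = {
            r['value']: int(r['numvotes']) for r in results.find_all('result')
        }
    return votes

votes = poll_votes(POLL)
# Player count with the most "Best" votes; a missing label counts as 0.
best = max(votes, key=lambda n: votes[n].get('Best', 0))
print(votes['4'])
print(best)
```

The player count stays a string because values such as "6+" are not numbers.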
Or, if you want to do it more elegantly, find all the values of each attribute and zip them:
soup = BeautifulSoup(text,'xml')
numplayers = [tag['numplayers'] for tag in soup.find_all('results')]
best = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Best'})]
recommended = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Recommended'})]
not_recommended = [tag['numvotes'] for tag in soup.find_all('result', {'value': 'Not Recommended'})]
print(list(zip(numplayers, best, recommended, not_recommended)))
Outputs:
[('1', '0', '0', '58'), ('2', '2', '21', '53'), ('3', '10', '46', '17'), ('4', '47', '36', '1'), ('5', '35', '44', '2'), ('6', '23', '48', '11'), ('6+', '0', '1', '46')]