
Extract Information From Website Using Xpath, Python

I'm trying to extract some useful information from a website. I got part of the way, but now I'm stuck and need your help! I need the information from this table: http://gbgfotboll.se/serier/?sc

Solution 1:

I think this is what you want:

# coding: utf-8
from lxml import etree
import lxml.html

collected = []  # list of rows, e.g. [[col1, col2, ...], [col1, col2, ...]]
dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=scorers&ftid=57700")
# all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval('//div[@id="content-primary"]/div/table[1]/tbody/tr')
if len(rows) <= 12:
    # 12 rows or fewer: take all the rows except the last.
    rows = rows[:-1]
else:
    # More than 12 rows: simply take the first 12 rows.
    rows = rows[:12]

for row in rows:
    # all columns of the current table row
    # (Spelare, Lag, Mal, straffmal = player, team, goals, penalty goals)
    columns = row.findall("td")
    # text_content() also picks up text nested in child tags such as <a>
    collected.append([column.text_content().strip() for column in columns])

for i in collected:
    print(i)
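Because the original page may no longer be reachable, here is a minimal offline sketch of the same extraction, run against a small hypothetical snippet (SAMPLE, extract_rows, and the player and team values are made up for illustration, not real data). It swaps in the standard library's xml.etree.ElementTree, which understands only a limited XPath subset and only well-formed markup, so treat it as a way to test the path and the row rule, not as a replacement for lxml.html on real pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the scorers table (not real data).
SAMPLE = """<html><body>
  <div id="content-primary">
    <div>
      <table>
        <tbody>
          <tr><td><a href="#">Player A</a></td><td>Team X</td><td>5</td><td>1</td></tr>
          <tr><td><a href="#">Player B</a></td><td>Team Y</td><td>4</td><td>0</td></tr>
          <tr><td>Summary</td><td></td><td></td><td></td></tr>
        </tbody>
      </table>
    </div>
  </div>
</body></html>"""


def extract_rows(markup):
    """Return the text of every <td>, row by row, from the first table."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset needs ".//" instead of a leading "//".
    rows = root.findall(".//div[@id='content-primary']/div/table[1]/tbody/tr")
    # itertext() also collects text nested in child tags such as <a>.
    return [["".join(td.itertext()).strip() for td in row.findall("td")]
            for row in rows]


collected = extract_rows(SAMPLE)
# Same rule as above: 12 rows or fewer -> drop the last one, otherwise keep 12.
collected = collected[:-1] if len(collected) <= 12 else collected[:12]
for row in collected:
    print(row)
```

With only three sample rows, the rule drops the last one, so just the two player rows are printed.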


Solution 2:

Here is how you can select the rows you need, based on what you described in your post. This is only the logic, assuming rows is a list; you'll have to incorporate it into your code as needed.

if len(rows) <= 12:
    # 12 rows or fewer: everything except the last row
    print(rows[0:-1])
else:
    # more than 12 rows: only the first 12
    print(rows[0:12])
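The same logic can be wrapped in a small helper so it works on any list, whether it holds lxml elements or already-extracted text. This is a sketch; the function name select_rows and its limit parameter are hypothetical. Slicing returns a new list instead of mutating the input, and rows[:-1] on an empty list simply gives an empty list, whereas rows.pop() would raise an IndexError.

```python
def select_rows(rows, limit=12):
    """Apply the rule from the question to any list.

    - at most `limit` items: keep everything except the last one
    - more than `limit` items: keep only the first `limit` items
    """
    if len(rows) <= limit:
        return rows[:-1]
    return rows[:limit]


print(select_rows(list(range(5))))        # [0, 1, 2, 3]
print(len(select_rows(list(range(20)))))  # 12
```

In Solution 1, rows = select_rows(rows) could stand in for the if/else block.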
