Trouble Running My Class Crawler
While writing a class-based crawler in Python, I got stuck halfway. I can't figure out how to pass the newly produced links [generated by the app_crawler class] to the 'App' class so th
Solution 1:
There are several issues with your code:
App inherits from app_crawler, yet you provide an app_crawler instance to App.__init__.
App.__init__ calls app_crawler.__init__ instead of super().__init__(). Not only
app_crawler.get_app doesn't actually return anything; it creates a brand new App object.
This results in your code passing an app_crawler object to requests.get instead of a url string.
You have too much encapsulation in your code.
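To illustrate the first points, here is a minimal sketch of how a subclass would normally initialize its parent and pass plain URL strings between methods. The names Crawler, AppScraper, and fetch are hypothetical and not from your code, and fetch is only a stand-in for requests.get so that the sketch runs offline.

```python
class Crawler:
    """Hypothetical base class that only tracks the URLs to visit."""

    def __init__(self, start_url):
        self.links = [start_url]


class AppScraper(Crawler):
    """Hypothetical subclass: it *is* a crawler, so it takes no crawler instance."""

    def __init__(self, start_url):
        # Let the parent set up the shared state instead of calling
        # Crawler.__init__ by name or receiving a Crawler object.
        super().__init__(start_url)
        self.visited = []

    def fetch(self, url):
        # Stand-in for requests.get(url); like the real call, it needs a URL string.
        if not isinstance(url, str):
            raise TypeError(f"expected a URL string, got {type(url).__name__}")
        self.visited.append(url)
        return f"<html>{url}</html>"

    def process_links(self):
        # Pass each URL string along, never the crawler object itself.
        return [self.fetch(link) for link in self.links]


scraper = AppScraper("https://example.com/app/1")
pages = scraper.process_links()
```

Passing the scraper object itself to fetch raises a TypeError in this sketch, which mirrors the kind of failure you get when an app_crawler object reaches requests.get.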
Consider the following code, which is shorter and cleaner than your non-working code and does not need to pass objects around needlessly:
from lxml import html
import requests


class App:
    def __init__(self, starturl):
        self.starturl = starturl
        self.links = []

    def get_links(self):
        # Collect the related-app links from the start page.
        page = requests.get(self.starturl)
        tree = html.fromstring(page.text)
        self.links = tree.xpath('//div[@class="lockup-info"]//*/a[@class="name"]/@href')

    def process_links(self):
        for link in self.links:
            self.get_docs(link)

    def get_docs(self, url):
        # Fetch one app page and print its name, developer and price.
        page = requests.get(url)
        tree = html.fromstring(page.text)
        name = tree.xpath('//h1[@itemprop="name"]/text()')[0]
        developer = tree.xpath('//div[@class="left"]/h2/text()')[0]
        price = tree.xpath('//div[@itemprop="price"]/text()')[0]
        print(name, developer, price)


if __name__ == '__main__':
    parse = App("https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8")
    parse.get_links()
    parse.process_links()
outputs
Cookie Jam By Jam City, Inc. Free
Zombie Tsunami By Mobigame Free
Flow Free By Big Duck Games LLC Free
Bejeweled Blitz By PopCap Free
Juice Jam By Jam City, Inc. Free
Candy Crush Soda Saga By King Free
Bubble Witch 3 Saga By King Free
Candy Crush Jelly Saga By King Free
Farm Heroes Saga By King Free
Pet Rescue Saga By King Free
Solution 2:
This is how I expected my code to work:
from lxml import html
import requests


class app_crawler:
    starturl = "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8"

    def __init__(self):
        self.links = [self.starturl]

    def crawler(self):
        # self.links grows inside get_app, so this loop also visits the new links.
        for link in self.links:
            self.get_app(link)

    def get_app(self, link):
        page = requests.get(link)
        tree = html.fromstring(page.text)
        links = tree.xpath('//div[@class="lockup-info"]//*/a[@class="name"]/@href')
        for link in links:
            # Stop collecting once five links are stored.
            if len(self.links) < 5:
                self.links.append(link)


class App(app_crawler):
    def __init__(self):
        super().__init__()

    def process_links(self):
        for link in self.links:
            self.get_item(link)

    def get_item(self, url):
        page = requests.get(url)
        tree = html.fromstring(page.text)
        name = tree.xpath('//h1[@itemprop="name"]/text()')[0]
        developer = tree.xpath('//div[@class="left"]/h2/text()')[0]
        price = tree.xpath('//div[@itemprop="price"]/text()')[0]
        print(name, developer, price)


if __name__ == '__main__':
    scrape = App()
    scrape.crawler()
    scrape.process_links()
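The crawler method above works because a Python for loop over a list also visits items appended during the loop, so newly found links are crawled until the cap is reached. Here is a minimal offline sketch of that pattern, assuming a hypothetical in-memory LINK_GRAPH in place of real pages and a hypothetical crawl function. The duplicate check is an addition that the code above does not have.

```python
# Hypothetical link graph standing in for real pages: page -> links found on it.
LINK_GRAPH = {
    "a": ["b", "c"],
    "b": ["d", "e", "f"],
    "c": ["g"],
}


def crawl(start, limit=5):
    """Collect up to `limit` links, following newly found ones as we go."""
    links = [start]
    # Iterating over a list while appending to it also visits the appended items.
    for page in links:
        for found in LINK_GRAPH.get(page, []):
            # Added for illustration: skip links that were already collected.
            if len(links) < limit and found not in links:
                links.append(found)
    return links


result = crawl("a")
print(result)
```

With the default limit of five, the walk stops adding links after "e", so "f" and "g" are never collected.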