
Scrapy "Missing scheme in request url"

Here's my code:

import scrapy
from scrapy.http import Request

class lyricsFetch(scrapy.Spider):
    name = 'lyricsFetch'
    allowed_domains = ['metrolyrics.com']
    print '\

Solution 1:

As @tintin said, your URLs are missing the scheme (http:// or https://). Scrapy needs fully qualified (absolute) URLs in order to process the requests.

As far as I can see, the scheme is missing in:

start_urls = ["www.lyricsmode.com/lyrics/ ...

and

yield Request("www.lyricsmode.com/feed.xml")
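For hard-coded strings like these, writing http:// or https:// in front of each one is enough. If the strings come from a list or config you don't control, a small normalizing helper can add the scheme before the Request is built. This is a minimal sketch using only the standard library; `with_scheme` is a hypothetical helper (not a Scrapy API), and it assumes https is an acceptable default for the target site.

```python
from urllib.parse import urlsplit


def with_scheme(url: str, default: str = "https") -> str:
    """Hypothetical helper: add `default` as the scheme if `url` has none."""
    if urlsplit(url).scheme:
        return url  # already absolute, e.g. "http://..." or "https://..."
    if url.startswith("//"):
        return f"{default}:{url}"  # protocol-relative: host present, scheme missing
    return f"{default}://{url}"  # bare host/path such as "www.host.com/page"


urls = ["www.lyricsmode.com/feed.xml", "http://www.lyricsmode.com/feed.xml"]
print([with_scheme(u) for u in urls])
# ['https://www.lyricsmode.com/feed.xml', 'http://www.lyricsmode.com/feed.xml']
```

Inside a spider you would apply it when building start_urls or just before yielding a Request. It is not meant for "host:port" strings written without a scheme, which urlsplit can misread as having one.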

If you are extracting URLs from the HTML content, use response.urljoin() to turn relative links into fully qualified URLs, for example:

next_url = response.urljoin(href)
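response.urljoin() resolves a link against the URL of the response it came from (for HTML responses, Scrapy uses the page's <base> tag if there is one). The sketch below mimics that resolution with the standard library's urllib.parse.urljoin so the behavior can be checked outside a spider; the page URL and hrefs are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical page URL; inside a Scrapy callback this role is played by
# the response itself (response.url, or a <base> tag if present).
page_url = "https://www.example.com/lyrics/artist/"

# Hypothetical hrefs as they might appear in the page's HTML.
hrefs = ["song.html", "/feed.xml", "https://www.example.com/about"]

# Relative and root-relative links are resolved; absolute links pass through.
absolute = [urljoin(page_url, href) for href in hrefs]
for url in absolute:
    print(url)
# https://www.example.com/lyrics/artist/song.html
# https://www.example.com/feed.xml
# https://www.example.com/about
```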

Solution 2:

I ran into this problem today as well. A URL normally starts with a scheme such as http or https.

Most likely, the URLs you extract from the start_urls response have no scheme, for example //list.jd.com/list.html.

Add the scheme so that the URL becomes https://list.jd.com/list.html.
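A link like //list.jd.com/list.html is protocol-relative: it already names the host and only lacks the scheme. A minimal sketch, assuming the link was found on a page served over https (the page URL below is hypothetical): either borrow the page's scheme explicitly, or let urljoin resolve the link, which gives the same result.

```python
from urllib.parse import urljoin, urlsplit

page_url = "https://list.jd.com/"  # hypothetical: page the link was found on
href = "//list.jd.com/list.html"   # protocol-relative link from that page

# Option 1: reuse the scheme of the page the link came from.
fixed = f"{urlsplit(page_url).scheme}:{href}"

# Option 2: let urljoin resolve it; a "//host/path" reference keeps the base scheme.
joined = urljoin(page_url, href)

print(fixed)   # https://list.jd.com/list.html
print(joined)  # https://list.jd.com/list.html
```

Inside a Scrapy callback, response.urljoin(href) performs the same kind of resolution, so the link inherits whatever scheme the current page used instead of a hard-coded one.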
