
How to Get All Image URLs with urllib.request.urlopen from Multiple URLs

from bs4 import BeautifulSoup
import urllib.request

urls = [
    'https://archillect.com/1',
    'https://archillect.com/2',
    'https://archillect.com/3',
]

soup = BeautifulSoup(urllib.request.urlopen(urls))  # fails: urlopen() takes a single URL, not a list

Solution 1:

@krishna has given you the answer. I'll give you another solution for reference only.

from simplified_scrapy import Spider, SimplifiedDoc, SimplifiedMain, utils

class ImageSpider(Spider):
  name = 'archillect'
  start_urls = ["https://archillect.com/1", "https://archillect.com/2", "https://archillect.com/3"]

  def afterResponse(self, response, url, error=None, extra=None):
    try:
      # Build a file name from the last path segment of the URL
      end = url.find('?') if url.find('?') > 0 else len(url)
      name = 'data' + url[url.rindex('/', 0, end):end]
      # Save the response as an image file; if it is not an image,
      # fall back to the default handling
      if utils.saveResponseAsFile(response, name, 'image'):
        return None
      else:
        return Spider.afterResponse(self, response, url, error)
    except Exception as err:
      print(err)

  def extract(self, url, html, models, modelNames):
    # Parse the page and collect every image URL it contains
    doc = SimplifiedDoc(html)
    urls = doc.listImg(url=url.url)
    return {'Urls': urls}

SimplifiedMain.startThread(ImageSpider())  # Start the crawler

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/spider_examples

Solution 2:

You cannot pass a list of URLs to urllib.request.urlopen(); open each URL individually in a loop:

for url in urls:
    # Open and parse each page one at a time
    soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')
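For completeness, here is a minimal sketch of the whole task: fetching each page in the list and collecting the src of every img tag. The 'html.parser' backend and the urljoin call (to resolve relative src values against the page URL) are additions for illustration, not part of the original answer:

from urllib.parse import urljoin
import urllib.request

from bs4 import BeautifulSoup

urls = [
    'https://archillect.com/1',
    'https://archillect.com/2',
    'https://archillect.com/3',
]

image_urls = []
for url in urls:
    # Fetch and parse one page at a time
    with urllib.request.urlopen(url) as response:
        soup = BeautifulSoup(response, 'html.parser')
    # Collect the src of every <img> tag, resolved to an absolute URL
    for img in soup.find_all('img'):
        src = img.get('src')
        if src:
            image_urls.append(urljoin(url, src))

print(image_urls)

Note that some sites reject requests that lack a browser-like User-Agent header; if that happens, build a urllib.request.Request with an explicit headers dict and pass that to urlopen instead of the bare URL.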
