How Is This Webpage Blocking Me When I Scrape Through A Loop But Not When I Access It Directly?
I am trying to scrape a set of webpages. When I scrape a single webpage directly, I am able to access the HTML. However, when I iterate through a pandas DataFrame to scrape a set of
Solution 1:
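DataFrame.iterrows() yields (index, row) tuples, where row is a Series holding that row's column values. The names therefore have to be unpacked from the second element of each tuple rather than read from the loop variable directly: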
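import re
import requests
from bs4 import BeautifulSoup

freq = []
# names is the DataFrame from the question; this unpacking only works
# if it has exactly two columns (first name, last name).
for _, (first_name, last_name) in names.iterrows():
    url = "https://zbmath.org/authors/?q={}+{}".format(first_name, last_name)
    r = requests.get(url)
    # Name the parser explicitly so BeautifulSoup does not have to guess.
    html = str(BeautifulSoup(r.text, "html.parser"))
    # Raw string so that \s reaches the regex engine unchanged.
    frequency = re.findall(r'Joint\sPublications">(.*?)</a>', html)
    freq.append(frequency)
print(freq)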
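To see what the unpacking does without sending any requests, here is a minimal sketch that only builds the URLs. The sample names, the column labels, and the helper build_urls are hypothetical (the question's real DataFrame is not shown), and quote_plus is an addition that escapes spaces or accents in a name; it is not part of the original answer.

```python
import pandas as pd
from urllib.parse import quote_plus

# Hypothetical sample data standing in for the question's DataFrame.
names = pd.DataFrame(
    {"first_name": ["Ada", "Emmy"], "last_name": ["Lovelace", "Noether"]}
)

# Each item produced by iterrows() is a 2-tuple: (index, row as a Series).
first_item = next(names.iterrows())


def build_urls(frame):
    """Return one zbMATH author-search URL per row of a two-column frame."""
    urls = []
    # Discard the index, then unpack the two column values from the row.
    for _, (first_name, last_name) in frame.iterrows():
        urls.append(
            "https://zbmath.org/authors/?q={}+{}".format(
                quote_plus(first_name), quote_plus(last_name)
            )
        )
    return urls


urls = build_urls(names)
print(urls)
```

Checking the generated URLs this way before calling requests.get makes it easier to tell a malformed query apart from a genuine block by the site.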