How Is This Webpage Blocking Me When I Scrape Through A Loop But Not When I Access It Directly?

March 20, 2024

I am trying to scrape a set of webpages. When I scrape a single webpage directly, I can access the HTML. However, when I iterate through a pandas DataFrame to scrape a set of

Solution 1:

iterrows() yields (index, row) tuples, where row is a Series holding that row's column values, so the loop has to unpack both levels to get at the names:

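import re
import requests
from bs4 import BeautifulSoup

freq = []
# names is assumed to be a DataFrame with exactly two columns: first name, last name.
# iterrows() yields (index, row); discard the index and unpack the row.
for _, (first_name, last_name) in names.iterrows():
    url = "https://zbmath.org/authors/?q={}+{}".format(first_name, last_name)
    r = requests.get(url)
    # Name the parser explicitly so bs4 does not have to guess one.
    html = str(BeautifulSoup(r.text, "html.parser"))
    # Raw string so \s reaches the regex engine unchanged.
    frequency = re.findall(r'Joint\sPublications">(.*?)</a>', html)
    freq.append(frequency)

print(freq)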
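To see the unpacking without making any network requests, here is a minimal sketch. The sample names and the build_urls helper are hypothetical, made up for illustration; the original names DataFrame is not shown, and the sketch assumes it has exactly two columns.

```python
import pandas as pd

# Hypothetical stand-in for the `names` DataFrame (its real contents are not shown).
names = pd.DataFrame(
    {"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]}
)

# Each item from iterrows() is a 2-tuple: (index label, row as a Series).
first_item = next(names.iterrows())
print(type(first_item).__name__, len(first_item))


def build_urls(df):
    """Build one search URL per row; assumes exactly two columns (first, last)."""
    urls = []
    # Discard the index, then unpack the row Series into its two values.
    for _, (first_name, last_name) in df.iterrows():
        urls.append(
            "https://zbmath.org/authors/?q={}+{}".format(first_name, last_name)
        )
    return urls


for url in build_urls(names):
    print(url)
```

Printing the URLs before requesting them is a quick way to confirm that each one contains the names rather than a tuple or a Series.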
