
How To Wait For The Site To Return The Data Using Beautifulsoup4

I wrote a script using beautifulsoup4 that fetches the list of ciphers from a table on a web page. The problem is that my Python script doesn't wait for the

Solution 1:

I thought this page used JavaScript to fetch the data, but it actually uses the old HTML method of refreshing the page.

It adds the HTML tag <meta http-equiv="refresh" content="time; url=...">, and the browser reloads the page after time seconds.
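The content attribute holds the delay in seconds, optionally followed by the URL to load next. A minimal sketch of splitting it into those two parts, assuming the common "N; url=..." form (parse_meta_refresh is a hypothetical helper, not part of requests or BeautifulSoup):

```python
import re


def parse_meta_refresh(content):
    """Split a meta-refresh content value such as '5; url=/next' into (delay, url).

    Handles only the common 'N; url=...' form; url is None when absent.
    """
    delay_part, _, rest = content.partition(";")
    try:
        # The delay may be written as '5' or '5.0'; keep whole seconds.
        delay = int(float(delay_part.strip() or "0"))
    except (ValueError, OverflowError):
        delay = 0  # malformed delay: treat as "refresh immediately"
    rest = rest.strip()
    # Drop an optional, case-insensitive 'url=' prefix (spaces around '=' allowed).
    match = re.match(r"url\s*=\s*", rest, re.IGNORECASE)
    if match:
        rest = rest[match.end():]
    url = rest.strip().strip("'\"") or None
    return delay, url
```

For instance, parse_meta_refresh("5; url=/next") gives (5, "/next"). The commented-out wait line in the script further down reads only the delay part in the same way.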

You have to check for this tag: if you find it, you can wait and then load the page again. Usually you can reload the page without waiting, and you either get the data or find the tag again.

import time

import requests
from bs4 import BeautifulSoup

site = 'some_site_name.com'
url = 'https://www.ssllabs.com/ssltest/analyze.html?d=' + site

# --- reload until the page no longer contains a meta-refresh tag ---
while True:
    r = requests.get(url, timeout=30)

    soup = BeautifulSoup(r.text, 'html.parser')

    refresh = soup.find_all('meta', attrs={'http-equiv': 'refresh'})
    # print('refresh:', refresh)

    if not refresh:
        break

    # Uncomment to wait as long as the page asks before reloading:
    # wait = int(refresh[0].get('content', '0').split(';')[0])
    # print('wait:', wait)
    # time.sleep(wait)

# --- collect up to five report tables and read the cipher names from the last one ---
table = soup.find_all('table', class_='reportTable', limit=5)

if table:
    table = table[-1]
    data = [td.text.split()[0] for td in table.select('td.tableLeft')]
    print(data)
else:
    print('[!] no data')
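The loop in the code above retries forever if the page keeps sending a refresh tag. A minimal sketch of a bounded variant, assuming you pass in a function that returns the page HTML. To keep the sketch free of third-party dependencies, it detects the tag with the standard library's html.parser instead of BeautifulSoup. The names fetch_until_ready, refresh_delay, max_attempts and max_wait are hypothetical:

```python
import re
import time
from html.parser import HTMLParser


class _MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases attribute names, but not their values.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() == "refresh":
            self.contents.append(attr_map.get("content", ""))


def refresh_delay(html):
    """Return the refresh delay in seconds, or None if the page has no meta refresh."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if not finder.contents:
        return None
    match = re.match(r"\s*(\d+)", finder.contents[0])
    return int(match.group(1)) if match else 0


def fetch_until_ready(fetch_html, max_attempts=10, max_wait=30, sleep=time.sleep):
    """Call fetch_html() until the returned page has no meta-refresh tag.

    Waits the requested delay (capped at max_wait seconds) between attempts and
    raises TimeoutError if the page is still refreshing after max_attempts.
    """
    for attempt in range(max_attempts):
        html = fetch_html()
        delay = refresh_delay(html)
        if delay is None:
            return html
        if attempt < max_attempts - 1:
            sleep(min(delay, max_wait))
    raise TimeoutError(f"page still refreshing after {max_attempts} attempts")
```

For example, pass lambda: requests.get(url, timeout=30).text as fetch_html and hand the returned HTML to BeautifulSoup as before. The sleep parameter exists only so the loop can be exercised without real waiting.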

Solution 2:

If the data isn't present in the original HTML page but is loaded by JavaScript in the background, consider driving a headless browser with Selenium. The answer originally suggested PhantomJS; that project is no longer maintained, and headless Chrome or Firefox fills the same role.
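A minimal sketch of that approach, assuming Selenium 4 and a local Chrome installation. The function name fetch_rendered_html and its parameters are hypothetical. It waits until an element matching a CSS selector appears and then returns the rendered HTML for BeautifulSoup to parse:

```python
def fetch_rendered_html(url, css_selector, timeout=30):
    """Load url in headless Chrome and return the page HTML once css_selector matches.

    Assumes Selenium 4 and Chrome are installed. Selenium is imported inside the
    function so this module can still be imported where Selenium is absent.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JavaScript-inserted element exists (raises TimeoutException otherwise).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even after a timeout
```

For example, fetch_rendered_html(url, 'table.reportTable') reuses the table class from Solution 1. Pass the result to BeautifulSoup(html, 'html.parser') and extract the rows as before. Waiting on a specific element is generally more reliable than sleeping for a fixed time.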
