Beautifulsoup Parse Table Data That Doesn't Load Immediately

October 27, 2023

I'm trying to download earnings announcement data from https://www.zacks.com/stock/research/MMM/earnings-announcements using BeautifulSoup. When I look at the tables, the table I

Solution 1:

One way would be to fire up Selenium and let the browser's JavaScript engine render the table. That's not really simple, so I offer here an alternative solution which is a total hack, but it should work for the pages you are interested in.

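Assuming the pages are generated automatically (so every ticker's page has the same layout), the data you want is already embedded in the page source and can be read out like this (continuing from your program, where data holds the page's HTML as a string):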

import json

# data: the page's HTML source as a string, from your existing program
earnings = json.loads(data.split('var obj =')[1].splitlines()[2])

This relies on the fact that the JavaScript object literal embedded in the page happens to be valid JSON, so we can parse it directly from the source. The result is a list of lists like this one:

[['10/25/2016', '9/2016', '.14', '--', '--', 'Before Open'],
 ['7/26/2016',
  '6/2016',
  '.08',
  '.08',
  '<div class=right pos_na showinline>0.00 (0.00%)</div>',
  'Before Open'],
 ['4/26/2016',
  '3/2016',
  '.92',
  '.05',
  '<div class=right pos positive pos_icon showinline up>0.13 (6.77%)</div>',
  'Before Open'],
 ['1/26/2016',
  '12/2015',
  '.62',
  '.80',
  '<div class=right pos positive pos_icon showinline up>0.18 (11.11%)</div>',
  'Before Open'],
 ['10/22/2015',
  '9/2015',
  '.01',
  '.05',
  '<div class=right pos positive pos_icon showinline up>0.04 (1.99%)</div>',
  'Before Open'],
...
]

The first element corresponds to the first data row of the table, i.e. the row just below the header. All that remains is to clean up the data, for example removing the HTML wrapped around the surprise values.
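A minimal sketch of that cleanup step, assuming the cell layout seen in the listing above: it strips the HTML tag around the surprise cell and splits "0.13 (6.77%)" into a value and a percentage. The column names in COLUMNS and the helper names are hypothetical, and a regular expression is only adequate here because each cell holds at most one simple tag.

```python
import re

# A few rows copied from the listing above, exactly as extracted.
ROWS = [
    ['10/25/2016', '9/2016', '.14', '--', '--', 'Before Open'],
    ['7/26/2016', '6/2016', '.08', '.08',
     '<div class=right pos_na showinline>0.00 (0.00%)</div>', 'Before Open'],
    ['4/26/2016', '3/2016', '.92', '.05',
     '<div class=right pos positive pos_icon showinline up>0.13 (6.77%)</div>',
     'Before Open'],
]

# Hypothetical column names; check them against the table header on the page.
COLUMNS = ['date', 'period_ending', 'estimate', 'reported', 'surprise', 'time']

TAG_RE = re.compile(r'<[^>]+>')
# Matches e.g. "0.13 (6.77%)" or "-0.05 (-2.50%)".
SURPRISE_RE = re.compile(r'([-+]?\d+(?:\.\d+)?)\s*\(([-+]?\d+(?:\.\d+)?)%\)')


def strip_tags(cell):
    """Remove simple HTML tags such as the <div> around the surprise cell."""
    return TAG_RE.sub('', cell).strip()


def parse_surprise(text):
    """Turn '0.13 (6.77%)' into (0.13, 6.77); return None for '--' or unparsable text."""
    match = SURPRISE_RE.fullmatch(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def clean_row(row):
    """Map one raw row to a dict keyed by COLUMNS; estimate/reported stay raw strings."""
    record = dict(zip(COLUMNS, (strip_tags(cell) for cell in row)))
    record['surprise'] = parse_surprise(record['surprise'])
    return record


if __name__ == '__main__':
    for row in ROWS:
        print(clean_row(row))
```

Keeping the estimate and reported columns as strings avoids guessing at their format; convert them only once you have confirmed what the page actually serves.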

Solution 2:

Without using Selenium, but still relying on the embedded JSON as in the first answer, you can use BeautifulSoup to dig out the script that contains the content you need.

>>> from bs4 import BeautifulSoup
>>> from urllib import request
>>> URL = 'https://www.zacks.com/stock/research/MMM/earnings-announcements'
>>> HTML = request.urlopen(URL).read()
>>> soup = BeautifulSoup(HTML, 'html.parser')
>>> import json
>>> scripts = soup.findAll('script')
>>> len(scripts)
36

>>> for script in scripts:
...     if (script.has_attr('type') and script.attrs['type'] == 'text/javascript'
...             and script.text.strip().startswith('$(document).ready(function()')):
...         break
...

After the loop breaks, script is the matching tag and its JavaScript source is available as script.text. You would still need some string processing to extract the lines shown in Rubik's answer, but nothing difficult.
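Once the script text is available, one way to pull out the data without depending on which line it sits on is to let the standard json module stop at the end of the object literal: json.JSONDecoder.raw_decode parses the first complete JSON value and reports where it ended, ignoring whatever follows. This is a sketch under the same assumption as the first answer (the text after var obj = is valid JSON); SAMPLE_SCRIPT, its key name and renderEarnings are hypothetical placeholders, not the page's real markup.

```python
import json

# Hypothetical stand-in for script.text; the real page's markup may differ.
SAMPLE_SCRIPT = """
$(document).ready(function() {
    var obj = {
        "earnings_announcements_earnings_table" :
        [ ["10/25/2016", "9/2016", ".14", "--", "--", "Before Open"],
          ["7/26/2016", "6/2016", ".08", ".08", "<div class=right pos_na showinline>0.00 (0.00%)</div>", "Before Open"] ]
    };
    renderEarnings(obj);
});
"""


def extract_js_object(script_text, marker='var obj ='):
    """Parse the JSON literal that follows `marker` in a block of JavaScript.

    Raises ValueError if the marker is missing (str.index) and
    json.JSONDecodeError if the literal is not valid JSON.
    """
    start = script_text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so strip it first.
    text = script_text[start:].lstrip()
    # raw_decode stops after the first complete JSON value, so the trailing
    # ';' and the rest of the script are ignored.
    value, _end = json.JSONDecoder().raw_decode(text)
    return value


if __name__ == '__main__':
    obj = extract_js_object(SAMPLE_SCRIPT)
    rows = obj['earnings_announcements_earnings_table']
    print(rows[0])
```

With BeautifulSoup you would call extract_js_object(script.text) on the tag found by the loop above; if the page ever embeds a JavaScript literal that is not strict JSON (single quotes, trailing commas), this approach fails just like the original one.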
