
Python Scraping Date From Html Page (june 10, 2017)

How can I extract the date 'June 03, 2017' from an HTML page containing the table data below? The date changes with the order number. I am not sure I am using it correctly. Please advise.

Solution 1:

This may not be ideal if there are other tables on the page you are trying to parse, because the loop visits every <tr> and expects exactly two <td> cells in each one. If there is only one table, this should work.

EDIT: added example of how to parse the actual date from the string

In[19]: from datetime import datetime
   ...: 
   ...: from bs4 import BeautifulSoup
   ...: 
   ...: html = '''\
   ...: <tr>
   ...:    <td style="font:bold 24px Arial;">Order #12345</td>
   ...:     <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
   ...: </tr>
   ...: '''
   ...: soup = BeautifulSoup(html, 'lxml')
   ...: 
   ...: for row in soup.find_all('tr'):
   ...:     order_number, order_date = row.find_all('td')
   ...:     print(order_number.text)
   ...:     print(order_date.text)
   ...:     d = datetime.strptime(order_date.text, 'Order Date: %B %d, %Y')
   ...:     print(d.year, d.month, d.day)
   ...: 
Order #12345
Order Date: June 03, 2017
2017 6 3
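
For pages with more than one table, one option is to look for the "Order Date:" label itself instead of relying on row and cell positions. The following is a minimal sketch, not part of the original answer: the function name `extract_order_date` is hypothetical, and it assumes the date is always the text node that directly follows the `<strong>` label. It uses the built-in `html.parser` so that lxml is not required.

```python
from datetime import datetime

from bs4 import BeautifulSoup


def extract_order_date(html):
    """Return the order date as a datetime.date, or None if no label is found.

    Hypothetical helper: assumes the date text directly follows
    <strong>Order Date:</strong> inside the same cell.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Locate the label by its text rather than by table position.
    label = soup.find("strong", string=lambda s: s and s.strip() == "Order Date:")
    if label is None or label.next_sibling is None:
        return None
    # The sibling is the text node " June 03, 2017"; strip the whitespace.
    raw = str(label.next_sibling).strip()
    return datetime.strptime(raw, "%B %d, %Y").date()


html = """
<table><tr><td>Some other table</td></tr></table>
<table>
<tr>
    <td style="font:bold 24px Arial;">Order #12345</td>
    <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
</tr>
</table>
"""
print(extract_order_date(html))
```

If the text after the label is not in the 'June 03, 2017' format, `strptime` raises `ValueError`, so a caller may want to catch that.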

Solution 2:

Alternatively,

>>> import requests
>>> import bs4
>>> soup = bs4.BeautifulSoup('''\
... <tr>
...     <td style="font:bold 24px Arial;">Order #12345</td>
...     <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
... </tr>''', 'lxml')
>>> soup.find_all(text=bs4.re.compile("Order #"))[0][7:]
'12345'
>>> soup.find_all(text=bs4.re.compile("Order Date:"))[0].parent.next.next.strip()
'June 03, 2017'

No need to import re separately, as it is included in bs4 (available as bs4.re). I followed what you did; that is, I looked for the text, then navigated from there.
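
The same text-then-navigate idea can also be written with an explicit `import re`, which avoids depending on `bs4.re`. This is only a sketch under a few assumptions: the function name `parse_order` is hypothetical, the order number is taken to be the digits after "Order #", and `html.parser` is used in place of lxml.

```python
import re

from bs4 import BeautifulSoup


def parse_order(html):
    """Return (order_number, order_date) as strings (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # Find the text node containing "Order #" and capture the digits after it.
    order_text = soup.find(string=re.compile(r"Order #"))
    order_number = re.search(r"Order #(\d+)", order_text).group(1)

    # Find the "Order Date:" text, step up to its <strong> parent,
    # then take the text node that follows the tag.
    label_text = soup.find(string=re.compile(r"Order Date:"))
    order_date = label_text.parent.next_sibling.strip()

    return order_number, order_date


html = '''\
<tr>
    <td style="font:bold 24px Arial;">Order #12345</td>
    <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
</tr>'''
print(parse_order(html))
```

Capturing the digits with a regular expression group, rather than slicing with `[7:]`, keeps working if the text has leading whitespace or a different prefix length.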
