Python Scraping Date From HTML Page (June 10, 2017)
How can I extract the date 'June 03,2017' from an HTML page containing the table data below? The date changes with the order number. I am not sure whether I am using it correctly. Please advise.
Solution 1:
This loops over every <tr> on the page and expects exactly two <td> cells in each row, so it may not be ideal if there are other tables on the page you are trying to parse. If there is only one table, this should work.
EDIT: added example of how to parse the actual date from the string
In [19]: from datetime import datetime
    ...:
    ...: from bs4 import BeautifulSoup
    ...:
    ...: html = '''\
    ...: <tr>
    ...:   <td style="font:bold 24px Arial;">Order #12345</td>
    ...:   <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
    ...: </tr>
    ...: '''
    ...: soup = BeautifulSoup(html, 'lxml')
    ...:
    ...: for row in soup.find_all('tr'):
    ...:     # Each row is expected to hold exactly two cells: number and date.
    ...:     order_number, order_date = row.find_all('td')
    ...:     print(order_number.text)
    ...:     print(order_date.text)
    ...:     # Parse the date, treating the 'Order Date: ' label as literal text.
    ...:     d = datetime.strptime(order_date.text, 'Order Date: %B %d, %Y')
    ...:     print(d.year, d.month, d.day)
    ...:
Order #12345
Order Date: June 03, 2017
2017 6 3
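The strptime format above only matches when the cell text is spaced exactly as 'Order Date: June 03, 2017'. The question writes the date as 'June 03,2017' (no space after the comma), so a slightly more tolerant parser may be useful. The following is only a sketch using the standard library; the function name parse_order_date is made up for illustration and is not part of the original answer.

```python
import re
from datetime import datetime


def parse_order_date(cell_text):
    """Return a datetime.date parsed from text such as 'Order Date: June 03, 2017'.

    Hypothetical helper: it assumes an English month name followed by a
    day, a comma, and a four-digit year, with optional space after the comma.
    """
    match = re.search(r'([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})', cell_text)
    if match is None:
        raise ValueError(f'no date found in {cell_text!r}')
    month, day, year = match.groups()
    # Rebuild a normalised string so one strptime format covers both spacings.
    return datetime.strptime(f'{month} {day} {year}', '%B %d %Y').date()


print(parse_order_date('Order Date: June 03, 2017'))
print(parse_order_date('June 03,2017'))
```

Both calls should yield the same date object. Note that %B matches month names for the current locale, so this assumes an English (or default C) locale.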
Solution 2:
Alternatively,
>>> import requests
>>> import bs4
>>> soup = bs4.BeautifulSoup('''\
... <tr>
...   <td style="font:bold 24px Arial;">Order #12345</td>
...   <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
... </tr>''', 'lxml')
>>> soup.find_all(text=bs4.re.compile("Order #"))[0][7:]
'12345'
>>> soup.find_all(text=bs4.re.compile("Order Date:"))[0].parent.next.next.strip()
'June 03, 2017'
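There is no need to import re separately, as bs4 imports it itself and it is reachable as bs4.re (this relies on an internal import of bs4, so a plain import re is the more conventional choice). I followed what you did; that is, I looked for the text and then navigated from there.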
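The "find the text, then navigate" idea can also be wrapped in a small function that does not depend on how many tables or rows the page has. This is a minimal sketch, not the original poster's code: the name find_order_date is hypothetical, it uses the built-in 'html.parser' instead of 'lxml' to avoid an extra dependency, and it assumes the date is the text node immediately following the <strong> label.

```python
from bs4 import BeautifulSoup


def find_order_date(html):
    """Return the text that follows an 'Order Date:' label, or None.

    Hypothetical helper; assumes markup shaped like
    <strong>Order Date:</strong> June 03, 2017
    """
    soup = BeautifulSoup(html, 'html.parser')
    for strong in soup.find_all('strong'):
        if strong.get_text(strip=True) == 'Order Date:':
            sibling = strong.next_sibling
            # NavigableString subclasses str; skip anything that is a tag.
            if isinstance(sibling, str):
                return sibling.strip()
    return None


html_doc = '''\
<table>
<tr>
  <td style="font:bold 24px Arial;">Order #12345</td>
  <td style="font:13px Arial;"><strong>Order Date:</strong> June 03, 2017</td>
</tr>
</table>
'''
print(find_order_date(html_doc))
```

Matching on the label rather than on the row position means unrelated tables are simply ignored, at the cost of depending on the exact label text.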