Google Patents Scraping With Beautiful Soup
I am trying to scrape data from Google Patents with Beautiful Soup and add some columns to an existing CSV. Here is an example of a patent result. Here is my code: with open ('patent
Solution 1:
To get the classification codes from a Google Patents page, you can use this example:
import requests
from bs4 import BeautifulSoup

url = 'https://patents.google.com/patent/EP3017304B1/en'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Select each code element that is followed by a sibling "Leaf" meta tag.
for code in soup.select('[itemprop="Code"]:has(~ meta[itemprop="Leaf"])'):
    print(code.text)                    # the code itself
    print(code.find_next('span').text)  # its description
    print('-' * 80)
Prints:
G01N33/5438
Electrodes
--------------------------------------------------------------------------------
G01N27/3275
Sensing specific biomolecules, e.g. nucleic acid strands, based on an electrode surface reaction
--------------------------------------------------------------------------------
G01N33/5308
Immunoassay; Biospecific binding assay; Materials therefor for analytes not provided for elsewhere, e.g. nucleic acids, uric acid, worms, mites
--------------------------------------------------------------------------------
G01N33/5436
Immunoassay; Biospecific binding assay; Materials therefor with an insoluble carrier for immobilising immunochemicals with ligand physically entrapped within the solid phase
--------------------------------------------------------------------------------
G01N33/544
Immunoassay; Biospecific binding assay; Materials therefor with an insoluble carrier for immobilising immunochemicals the carrier being organic
--------------------------------------------------------------------------------
G01N33/9413
Dopamine
--------------------------------------------------------------------------------
G01N33/9446
Antibacterials
--------------------------------------------------------------------------------
G01N33/946
CNS-stimulants, e.g. cocaine, amphetamines
--------------------------------------------------------------------------------
G01N2333/78
Connective tissue peptides, e.g. collagen, elastin, laminin, fibronectin, vitronectin, cold insoluble globulin [CIG]
--------------------------------------------------------------------------------
EDIT: To get the status of the applications:
import requests
from bs4 import BeautifulSoup

url = 'https://patents.google.com/patent/EP3017304B1/en'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Each application is a list item carrying country, number and legal status.
for application in soup.select('li[itemprop="application"]'):
    print(application.select_one('[itemprop="countryCode"]').text)
    print(application.select_one('[itemprop="applicationNumber"]').text)
    print(application.select_one('[itemprop="legalStatus"]').text)
    print('-' * 80)
Prints:
WO
PCT/EP2014/064249
Application Filing
--------------------------------------------------------------------------------
US
US14/901,760
Active
--------------------------------------------------------------------------------
EP
EP14737196.7A
Active
--------------------------------------------------------------------------------
EP
EP17184772.6A
Withdrawn
--------------------------------------------------------------------------------
ES
ES14737196.7T
Active
--------------------------------------------------------------------------------
US
US15/702,938
Active
--------------------------------------------------------------------------------
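The question also asks about adding the scraped values as columns to an existing CSV, but the original code is cut off, so the following is only a minimal sketch of one way that step might look. The function name `add_patent_columns`, the column names `patent_number` and `classification_codes`, the `;` separator, and the `fake_fetch` stand-in with its sample row are all assumptions made for illustration. In real use, `fetch_codes` would be a wrapper around the scraping loop shown earlier.

```python
import csv
import io


def add_patent_columns(src, dst, fetch_codes, id_column="patent_number"):
    """Copy CSV rows from src to dst, appending a 'classification_codes' column.

    fetch_codes is any callable that takes a patent number and returns a
    list of code strings (hypothetical interface, not part of the original).
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + ["classification_codes"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        codes = fetch_codes(row[id_column])
        # Several codes share one cell; ';' is an arbitrary separator choice.
        row["classification_codes"] = ";".join(codes)
        writer.writerow(row)


def fake_fetch(number):
    """Offline stand-in for the scraper, so the sketch runs without network."""
    known = {"EP3017304B1": ["G01N33/5438", "G01N27/3275"]}
    return known.get(number, [])


# Placeholder input; a real run would pass open file objects instead.
src = io.StringIO("patent_number,title\nEP3017304B1,Example\n")
dst = io.StringIO()
add_patent_columns(src, dst, fake_fetch)
print(dst.getvalue())
```

Passing the scraper in as a callable keeps the CSV handling separate from the network request, which makes it easier to test the file logic on its own and to swap in caching or rate limiting later.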