Google Patents Scraping With Beautiful Soup

I am trying to scrape data from Google Patents with Beautiful Soup and add some columns to an existing CSV file. Here is an example of a patent result. Here is my code: with open ('patent

Solution 1:

To get the classification codes from a Google Patents page, you can use this example:

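import requests
from bs4 import BeautifulSoup

# Download the patent page and parse it.
url = 'https://patents.google.com/patent/EP3017304B1/en'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Keep only elements with itemprop="Code" that are followed by a
# sibling <meta itemprop="Leaf">, i.e. the leaf codes.
for code in soup.select('[itemprop="Code"]:has(~ meta[itemprop="Leaf"])'):
    print(code.text)
    # The next <span> after the code holds its description.
    print(code.find_next('span').text)
    print('-' * 80)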

Prints:

G01N33/5438
Electrodes
--------------------------------------------------------------------------------
G01N27/3275
Sensing specific biomolecules, e.g. nucleic acid strands, based on an electrode surface reaction
--------------------------------------------------------------------------------
G01N33/5308
Immunoassay; Biospecific binding assay; Materials therefor for analytes not provided for elsewhere, e.g. nucleic acids, uric acid, worms, mites
--------------------------------------------------------------------------------
G01N33/5436
Immunoassay; Biospecific binding assay; Materials therefor with an insoluble carrier for immobilising immunochemicals with ligand physically entrapped within the solid phase
--------------------------------------------------------------------------------
G01N33/544
Immunoassay; Biospecific binding assay; Materials therefor with an insoluble carrier for immobilising immunochemicals the carrier being organic
--------------------------------------------------------------------------------
G01N33/9413
Dopamine
--------------------------------------------------------------------------------
G01N33/9446
Antibacterials
--------------------------------------------------------------------------------
G01N33/946
CNS-stimulants, e.g. cocaine, amphetamines
--------------------------------------------------------------------------------
G01N2333/78
Connective tissue peptides, e.g. collagen, elastin, laminin, fibronectin, vitronectin, cold insoluble globulin [CIG]
--------------------------------------------------------------------------------
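
The same selector can be tried offline before running it against the live site. The sketch below wraps the loop in a function that returns the pairs instead of printing them. The name `leaf_codes` and the `SAMPLE_HTML` snippet are illustrative assumptions: the snippet is hand-written and only reuses the `itemprop` attributes from the example above, so the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the classification list on a patent page.
# Only the itemprop names come from the example above; the rest is assumed.
SAMPLE_HTML = """
<ul>
  <li itemprop="cpcs">
    <span itemprop="Code">G01N33/543</span>
    <span itemprop="Description">non-leaf parent entry</span>
  </li>
  <li itemprop="cpcs">
    <span itemprop="Code">G01N33/5438</span>
    <span itemprop="Description">Electrodes</span>
    <meta itemprop="Leaf" content="true">
  </li>
  <li itemprop="cpcs">
    <span itemprop="Code">G01N33/9413</span>
    <span itemprop="Description">Dopamine</span>
    <meta itemprop="Leaf" content="true">
  </li>
</ul>
"""


def leaf_codes(html):
    """Return (code, description) pairs for the leaf codes in html."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # Match code elements that have a later sibling <meta itemprop="Leaf">.
    for code in soup.select('[itemprop="Code"]:has(~ meta[itemprop="Leaf"])'):
        description = code.find_next("span")
        pairs.append((code.get_text(strip=True), description.get_text(strip=True)))
    return pairs


codes = leaf_codes(SAMPLE_HTML)
for code, description in codes:
    print(code, "|", description)
```

Returning a list rather than printing makes it easier to pass the values on, for example to code that writes them to a CSV file.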

EDIT: To get the status of each application:

import requests
from bs4 import BeautifulSoup

# Download the patent page and parse it.
url = 'https://patents.google.com/patent/EP3017304B1/en'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Each related application is an <li itemprop="application"> element.
for application in soup.select('li[itemprop="application"]'):
    print(application.select_one('[itemprop="countryCode"]').text)
    print(application.select_one('[itemprop="applicationNumber"]').text)
    print(application.select_one('[itemprop="legalStatus"]').text)
    print('-' * 80)

Prints:

WO
PCT/EP2014/064249
Application Filing
--------------------------------------------------------------------------------
US
US14/901,760
Active
--------------------------------------------------------------------------------
EP
EP14737196.7A
Active
--------------------------------------------------------------------------------
EP
EP17184772.6A
Withdrawn
--------------------------------------------------------------------------------
ES
ES14737196.7T
Active
--------------------------------------------------------------------------------
US
US15/702,938
Active
--------------------------------------------------------------------------------
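
The question also asks about adding the scraped values as columns to an existing CSV. Since the original code is cut off, the following is only a rough sketch of one way to do that with the standard `csv` module. The helper `add_columns`, the column names `patent`, `title`, `codes` and `status`, and the placeholder row `XX0000000A1` are all assumptions made for illustration, and the file is simulated in memory with `io.StringIO`.

```python
import csv
import io


def add_columns(rows, key, extra):
    """Return copies of rows with extra columns merged in, matched on row[key].

    rows  -- list of dicts, e.g. from csv.DictReader
    extra -- dict mapping a key value to a dict of new column values
    """
    # Collect the new column names in first-seen order.
    new_fields = []
    for values in extra.values():
        for name in values:
            if name not in new_fields:
                new_fields.append(name)
    merged = []
    for row in rows:
        additions = extra.get(row[key], {})
        # Rows without scraped data get empty strings in the new columns.
        filled = {name: additions.get(name, "") for name in new_fields}
        merged.append({**row, **filled})
    return merged


# In-memory stand-in for the existing CSV (column names are assumed).
existing = io.StringIO("patent,title\nEP3017304B1,Example title\nXX0000000A1,Other\n")
rows = list(csv.DictReader(existing))

# Values as they might be collected by the scraping loops above.
scraped = {
    "EP3017304B1": {"codes": "G01N33/5438;G01N33/9413", "status": "Active"},
}

merged = add_columns(rows, "patent", scraped)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(merged[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(merged)
print(out.getvalue())
```

With a real file, `existing` and `out` would be replaced by files opened with `open(..., newline='')`, which is what the `csv` documentation recommends.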
