Skip to content

Latest commit

 

History

History
102 lines (72 loc) · 4.87 KB

README.md

File metadata and controls

102 lines (72 loc) · 4.87 KB

PatFam: Patent Families

README last edited: 16 Dec 2020

The purpose of this tool is to allow users to easily determine whether patent applications are in the same family. Instead of having to check each individual patent office's website, users can enter patent, application, or publication numbers for various patent offices and see the relationship between the documents.

Currently supports: TBD

screenshot

Setup

Summary of conda installs

See below for important notes including system dependencies, version compatibility, access keys, etc.

conda create -n patfam python=3.6
conda activate patfam

conda install pip
pip install uspto-opendata-python

conda install -c anaconda lxml networkx pygraphviz graphviz flask gunicorn
conda install -c conda-forge selenium matplotlib

For data from the United States Patent and Trademark Office (USPTO), we'll use the API client for the USPTO Patent Examination Data System (PEDS). Do not mix up PEDS with PBD, which is the USPTO PAIR Bulk Data (PBD) system. The PBD has been decommissioned.

# install libxml2 dependencies (make sure you have -dev versions)
sudo apt-get update
sudo apt-get install libxml2-dev libxslt1-dev python-lxml

# create conda environment and install some packages (see note below)
conda create -n patfam python=3.6
conda activate patfam
conda install -c anaconda lxml

# install pip (if you don't already have it) and the api client
conda install pip
pip install uspto-opendata-python

# run a test query
uspto-peds get "15431686" --type=application --format=xml

NOTE: As of Dec. 2020, uspto-opendata-python will not install if using Python 3.9.0 (returning gcc compilation error of this type).

See my tutorial walkthrough of the USPTO Open Data API Client here.

For data from the European Patent Office (EPO), we'll use the Open Patent Services (OPS) client developed by George Song et al. In order to get API access, you will need to request access credentials from OPS. After I submitted my request, I was granted access from EPO the next day.

conda activate patfam
pip install python-epo-ops-client

See my tutorial walkthrough of the Python EPO OPS Client here.

Web Scraping

As far as I can tell, there is no freely available API access to the World Intellectual Property Organization (WIPO; for PCT applications) or to the Japan Patent Office (JPO). For those applications, we'll request data from the web directly.

Before this I tried the Python requests package to obtain site data. I made a GET search query to WIPO PatentScope for docId=WO2001029057. However, no patent-related data could be found in the HTML content. It looks like the site is rendered dynamically using JavaScript so we'll use Selenium instead:

conda activate patfam
conda install -c conda-forge selenium

Since I'm working in the Windows Subsystem for Linux (WSL2), I needed to download an Internet browser and related driver (following the tutorial from Greg Brisebois). If you already have an Internet browser in the same OS that you're working in, you would just get to obtain the relevant driver for your browser, browser version, and OS.

# install dependencies
sudo apt-get update
sudo apt-get install -y curl unzip xvfb libxi6 libgconf-2-4

# get chrome browser
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb

# optional - prevent updates of google chrome (if it updates, you also have to update the chromedriver)
sudo apt-mark hold google-chrome-stable

# check browser install
google-chrome --version

# get chrome driver; USE THE DRIVER VERSION RELEVANT TO YOUR SETUP
wget https://chromedriver.storage.googleapis.com/87.0.4280.88/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver

# check driver install
chromedriver --version

Optionally, if you want to work with the Chrome window (as opposed to using it in "headless form"), try the command google-chrome to check that the window will come up. I had to resolve a few issues with WSL2 on my end before it worked. More details here.

See an example of using Selenium to obtain WIPO PatentScope data here.

See an example of using Selenium to obtain JPO patent data here.