diff --git a/.gitignore b/.gitignore
index 05e3462..9ee7ffb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -72,4 +72,6 @@ ___*
dask-worker-space
*.parquet
*.zip
-*.pkl
\ No newline at end of file
+*.pkl
+*.bib.bak
+*.bib.sav
\ No newline at end of file
diff --git a/TODO.rst b/TODO.rst
index 8d04c6c..14ee958 100644
--- a/TODO.rst
+++ b/TODO.rst
@@ -6,13 +6,10 @@ Proposed analyses
There are many types of analyses that are already implemented or planned.
-- [x] Replication of the algorithm of [BH2007]_.
+- [x] Replication of the algorithm of Bessen and Hunt (2007).
- [ ] Replacing old results of Random Forest implementation with a current
implementation.
- [ ] Improving the algorithm of Bessen and Hunt (2007) on the same indicator
data with machine learning methods.
- [ ] Machine and deep learning techniques using textual data.
-- [ ] Network analysis of patents with the citation data at [PATENTSVIEW]_.
-
-.. [BH2007] https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1530-9134.2007.00136.x
-.. [PATENTSVIEW] http://www.patentsview.org/download/
+- [ ] Network analysis of patents with the citation data from PatentsView.
diff --git a/src/documentation/introduction.rst b/src/documentation/introduction.rst
index a6210d3..3248278 100644
--- a/src/documentation/introduction.rst
+++ b/src/documentation/introduction.rst
@@ -1 +1,134 @@
-.. include:: ../../README.rst
+Introduction
+------------
+
+This project deals with the identification of software patents and combines
+multiple approaches from simple algorithms to novel machine learning models to
+achieve this goal.
+
+
+Background
+----------
+
+The origin of this project was a Bachelor's thesis built on the algorithmic
+approach of :cite:`bessen2007empirical`. The authors wanted to estimate the
+number of software patents, find out where software patents are used, and
+identify which economic indicators are correlated with the number of software
+patents in certain industries.
+
+To classify patents into categories of software and non-software, the authors
+developed a simple algorithm based on the evaluation of a random sample of
+patents. The algorithm is as follows:
+
+..
+
+ (("software" in specification) OR ("computer" AND "program" in
+ specification))
+
+ AND (utility patent excluding reissues)
+
+ ANDNOT ("chip" OR "semiconductor" OR "bus" OR "circuit" OR "circuitry" in
+ title)
+
+ ANDNOT ("antigen" OR "antigenic" OR "chromatography" in specification)
+
+The title is used as is, whereas the specification is defined as the abstract
+and the description of the patent (`PatentsView`_ splits what
+:cite:`bessen2007empirical` call the description into a description and a
+summary).
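
The rule above can be sketched in Python as a plain predicate. This is a minimal illustration and not the project's implementation. The function name, its parameters, and the choice of case-insensitive whole-word matching are assumptions made for the example; the original algorithm may match terms differently (for instance, treating "programs" as a hit for "program").

```python
import re

# Exclusion terms taken from the rule quoted above.
TITLE_EXCLUSIONS = ("chip", "semiconductor", "bus", "circuit", "circuitry")
SPEC_EXCLUSIONS = ("antigen", "antigenic", "chromatography")


def _has_word(text, word):
    """Return True if `word` occurs in `text` as a whole word, ignoring case.

    Whole-word matching is an assumption of this sketch, chosen so that
    "bus" does not fire on "business".
    """
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def is_software_patent(title, abstract, description, summary="",
                       is_utility=True, is_reissue=False):
    """Hypothetical helper applying the keyword rule to a single patent."""
    # Specification = abstract + description (+ summary, which PatentsView
    # stores separately from the description).
    specification = " ".join((abstract, description, summary))

    mentions_software = _has_word(specification, "software") or (
        _has_word(specification, "computer")
        and _has_word(specification, "program")
    )
    if not mentions_software:
        return False
    # Only utility patents count, and reissues are excluded.
    if not is_utility or is_reissue:
        return False
    # ANDNOT hardware terms in the title.
    if any(_has_word(title, word) for word in TITLE_EXCLUSIONS):
        return False
    # ANDNOT biochemistry terms in the specification.
    if any(_has_word(specification, word) for word in SPEC_EXCLUSIONS):
        return False
    return True
```

Keeping the exclusion lists as module-level tuples makes it easy to compare variants of the rule against the same data.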
+
+To replicate the algorithm, the project relies on two strategies. The first
+data source is `Google Patents
+
+[Figures: Absolute Number of Utility Patents; Absolute Number of Software vs. Non-Software Patents; Relative Number of Software vs. Non-Software Patents]
+