Enakshi-1998/semanticClimate

 
 

semanticClimate

Conversion of IPCC documents into semantic form

Goals

  • to convert the IPCC documents from PDF into (a) HTML (b) XML
  • extract terms and exploire their use and meaning
  • link terms to Wikidata and create AMI-dictionaries
  • create new structures for navigation, search, display

Content

We will start with AR6 WGIII, then move on to the other WGs and perhaps look back at earlier reports as well.

Strategy

Download the components, store them under a hierarchical naming scheme, and convert each one to text with pdf2txt.py:

semanticClimate pm286$ cd ipcc/ar6/wg3/
$ ls
Chapter01.pdf
$ mkdir Chapter01
$ cp Chapter01.pdf Chapter01/fulltext.pdf
$ cd Chapter01
$ pdf2txt.py -o fulltext.txt fulltext.pdf 
$ ls
fulltext.pdf	fulltext.txt
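
The same steps can be scripted for every chapter in a working group directory. The following is a minimal sketch rather than part of this repository: the function names `stage_chapter` and `convert_chapter` are hypothetical, and it assumes that `pdf2txt.py` is on your PATH and that the chapter PDFs are named `ChapterNN.pdf` as in the session above.

```python
import shutil
import subprocess
from pathlib import Path


def stage_chapter(pdf_path: Path) -> Path:
    """Copy e.g. wg3/Chapter01.pdf to wg3/Chapter01/fulltext.pdf; return the directory."""
    chapter_dir = pdf_path.with_suffix("")  # wg3/Chapter01.pdf -> wg3/Chapter01
    chapter_dir.mkdir(exist_ok=True)
    shutil.copy(pdf_path, chapter_dir / "fulltext.pdf")
    return chapter_dir


def convert_chapter(chapter_dir: Path) -> Path:
    """Run the pdf2txt.py command from the session above inside chapter_dir."""
    # Assumption: pdf2txt.py is installed and executable from the PATH.
    subprocess.run(
        ["pdf2txt.py", "-o", "fulltext.txt", "fulltext.pdf"],
        cwd=chapter_dir,
        check=True,  # raise if the conversion fails
    )
    return chapter_dir / "fulltext.txt"


if __name__ == "__main__":
    wg_dir = Path("ipcc/ar6/wg3")
    for pdf in sorted(wg_dir.glob("Chapter*.pdf")):
        print(convert_chapter(stage_chapter(pdf)))
```

Keeping every chapter in its own directory with fixed file names (`fulltext.pdf`, `fulltext.txt`) means later steps can locate their input without knowing which chapter they are processing.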

If fulltext.txt does not appear when you run ls, install pdfminer.six and execute the following commands in the Command Prompt:

cd pdfminer.six
python tools/pdf2txt.py "Copy path.pdf" -o "Copy path.txt"
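
To find out which chapters still lack a text file, and to convert them without calling the script, something like the following might help. It is a sketch under assumptions: `chapters_missing_text` and `convert_with_library` are hypothetical names, the directory layout is the one shown above, and the conversion uses `extract_text` from pdfminer.six's `pdfminer.high_level` module, so pdfminer.six must be installed before that function is called.

```python
from pathlib import Path


def chapters_missing_text(wg_dir: Path) -> list[Path]:
    """Return chapter directories that hold fulltext.pdf but no fulltext.txt."""
    return sorted(
        pdf.parent
        for pdf in wg_dir.glob("*/fulltext.pdf")
        if not (pdf.parent / "fulltext.txt").exists()
    )


def convert_with_library(chapter_dir: Path) -> Path:
    """Convert fulltext.pdf through pdfminer.six's Python API instead of the script."""
    # Imported here so the check above also works before pdfminer.six is installed.
    from pdfminer.high_level import extract_text

    txt_path = chapter_dir / "fulltext.txt"
    text = extract_text(str(chapter_dir / "fulltext.pdf"))
    txt_path.write_text(text, encoding="utf-8")
    return txt_path


if __name__ == "__main__":
    for chapter_dir in chapters_missing_text(Path("ipcc/ar6/wg3")):
        print(convert_with_library(chapter_dir))
```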
