semanticClimate

Conversion of IPCC documents into semantic form

Goals

convert the IPCC documents from PDF into (a) HTML and (b) XML
extract terms and explore their use and meaning
link terms to Wikidata and create AMI dictionaries
create new structures for navigation, search, and display

Content

We will start with AR6 WGIII, then move on to the other Working Groups, and perhaps look back at earlier assessment reports as well.

Strategy

Create a directory ("CProject") containing all current PDFs (chapters, etc.). Location: https://github.com/petermr/semanticClimate/ipcc

Download the components, using a hierarchical naming scheme, and convert them to text (pdf2txt):

$ cd ipcc/ar6/wg3/
$ ls
Chapter01.pdf
$ mkdir Chapter01
$ cp Chapter01.pdf Chapter01/fulltext.pdf
$ cd Chapter01
$ pdf2txt.py -o fulltext.txt fulltext.pdf 
$ ls
fulltext.pdf	fulltext.txt
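
The mkdir and cp steps above can be repeated for every chapter with a short script. This is a minimal sketch, not part of the repository: the function name layout_chapters is hypothetical, and it assumes every chapter PDF sits directly in one working-group directory such as ipcc/ar6/wg3.

```python
import shutil
from pathlib import Path


def layout_chapters(wg_dir: Path) -> list[Path]:
    """Copy each <Name>.pdf in wg_dir to <Name>/fulltext.pdf and return the copies."""
    copies = []
    # glob("*.pdf") is not recursive, so existing <Name>/fulltext.pdf files are skipped.
    for pdf in sorted(wg_dir.glob("*.pdf")):
        chapter_dir = wg_dir / pdf.stem      # e.g. Chapter01/
        chapter_dir.mkdir(exist_ok=True)
        target = chapter_dir / "fulltext.pdf"
        if not target.exists():              # keep an earlier copy untouched
            shutil.copy2(pdf, target)
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Assumed location of the downloaded chapter PDFs.
    for path in layout_chapters(Path("ipcc/ar6/wg3")):
        print(path)
```

Running the script a second time leaves existing fulltext.pdf copies in place, so it is safe to rerun after more chapters have been downloaded.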

If fulltext.txt is not listed after running ls, install pdfminer.six and run the following commands in the Command Prompt, replacing the quoted placeholder paths with the paths to your own PDF and output text file:

cd pdfminer.six
python tools/pdf2txt.py "Copy path.pdf" -o "Copy path.txt"
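
The same conversion can also be run from Python rather than from the command line. The sketch below assumes pdfminer.six is installed as a package and uses its high-level extract_text function. The helper names text_path_for and convert_pdf are hypothetical, as is the assumption that each chapter directory holds a file named fulltext.pdf.

```python
from pathlib import Path


def text_path_for(pdf_path: Path) -> Path:
    """Return the sibling .txt path for a PDF, e.g. fulltext.pdf -> fulltext.txt."""
    return pdf_path.with_suffix(".txt")


def convert_pdf(pdf_path: Path) -> Path:
    """Extract the text of pdf_path with pdfminer.six and write it next to the PDF."""
    # Imported here so the module still loads when pdfminer.six is not installed.
    from pdfminer.high_level import extract_text

    out_path = text_path_for(pdf_path)
    out_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Assumed layout: ipcc/ar6/wg3/<Chapter>/fulltext.pdf
    for pdf in sorted(Path("ipcc/ar6/wg3").glob("*/fulltext.pdf")):
        print(convert_pdf(pdf))
```

Writing the output next to the input keeps the fulltext.pdf and fulltext.txt pairing used in the chapter directories above.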
