A repository to convert raw data from the US-IRAN project to Database-ready JSON

vogelino/hidden-perspectives-data-sandbox

Hidden Perspectives – Data Sandbox

This repository contains scripts that help with data preparation and deployment for the Hidden Perspectives app.

Clone the project

First, clone the project locally and move into the folder. Open your terminal and run:

$ git clone git@github.com:vogelino/hidden-perspectives-data-sandbox.git
$ cd hidden-perspectives-data-sandbox

Install the dependencies

You have to install the project's dependencies using Yarn.

$ yarn install

Configure your environment variables

Dandelion API

For data preparation, we use the Entity Extraction API from Dandelion. Follow their instructions to get a free API key.

Graphcool

Graphcool is an open-source, self-hosted backend-as-a-service for developing serverless GraphQL backends. Create a Graphcool project using the Graphcool Console. You'll need the project ID and authorization token to deploy the data.

LocationIQ

Sign up and get a developer token. Then copy .env.sample, rename the copy to .env, and edit it so that it matches your credentials.
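
Before running the scripts, it can help to check that all credentials are actually set. The following is a minimal sketch, assuming the values from .env have already been loaded into process.env (for example by a package such as dotenv). The variable names are hypothetical; use the ones listed in .env.sample.

```javascript
// Hypothetical variable names; check .env.sample for the real ones.
const REQUIRED_ENV_VARS = [
  'DANDELION_API_KEY',
  'GRAPHCOOL_PROJECT_ID',
  'GRAPHCOOL_AUTH_TOKEN',
  'LOCATIONIQ_TOKEN',
];

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = findMissingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```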

Available scripts

yarn convertDocumentsAndEvents

Converts the provided Excel sheets and does all the data processing needed for our database schema.

yarn prepareDataForGraphcool

Creates nodes and relations and finally imports them to Graphcool.

Prepare data

Directory structure

Create the data and graphcoolData directories and their subdirectories so that your folder structure matches the following directory tree:

.
├── README.md
├── package-lock.json
├── package.json
├── .env
├── .env.sample
├── .gitignore
├── scripts
│   ├── constants.js
│   ├── convertDocumentsAndEvents
│   ├── prepareDataForGraphcool
│   └── utils
│
├── data
│   ├── sheets
│   │   ├── documents
│   │   └── events
│   │
│   ├── original_documents
│   ├── text_files
│   └── json
│       ├── documents
│       ├── entities
│       ├── events
│       ├── kind
│       ├── locations
│       └── stakeholder
│
└── graphcoolData
    ├── nodes
    └── relations
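
The data and graphcoolData directories from the tree above can also be created with a small script. This is a sketch rather than part of the repository: createDataDirectories is a hypothetical helper, and the directory list is copied from the tree.

```javascript
const fs = require('fs');
const path = require('path');

// Directories taken from the tree above, relative to the project root.
const DIRECTORIES = [
  'data/sheets/documents',
  'data/sheets/events',
  'data/original_documents',
  'data/text_files',
  'data/json/documents',
  'data/json/entities',
  'data/json/events',
  'data/json/kind',
  'data/json/locations',
  'data/json/stakeholder',
  'graphcoolData/nodes',
  'graphcoolData/relations',
];

// Creates every directory below rootDir (existing ones are left untouched)
// and returns the list of full paths.
function createDataDirectories(rootDir) {
  return DIRECTORIES.map((relativeDir) => {
    const fullPath = path.join(rootDir, relativeDir);
    fs.mkdirSync(fullPath, { recursive: true });
    return fullPath;
  });
}

// Usage, from the project root:
// createDataDirectories(process.cwd());
```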

data

  • sheets – Contains the Excel sheets that were provided as the dataset
  • original_documents – The original document PDFs
  • text_files – Document transcripts
  • json – The convertDocumentsAndEvents scripts save data to this folder

graphcoolData

The prepareDataForGraphcool scripts read the data from ./data, parse it, and write it to JSON files that match the database schema and can be imported into Graphcool.

  • nodes – Nodes that are imported to Graphcool
  • relations – Relations that are imported to Graphcool
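
To give a rough idea of what such files can look like, here is a sketch that assumes they follow Graphcool's Normalized Data Format (NDF), where node files and relation files each have a valueType and a values array. The helper names, type names and field names below are hypothetical and are not taken from this repository's schema.

```javascript
// Wraps records in the "nodes" envelope of the Normalized Data Format.
function toNodesFile(typeName, records) {
  return {
    valueType: 'nodes',
    values: records.map((record) => ({ _typeName: typeName, ...record })),
  };
}

// Builds a "relations" envelope; each relation is a pair of node references.
function toRelationsFile(pairs) {
  return {
    valueType: 'relations',
    values: pairs.map(([left, right]) => [
      { _typeName: left.typeName, id: left.id, fieldName: left.fieldName },
      { _typeName: right.typeName, id: right.id, fieldName: right.fieldName },
    ]),
  };
}

// Hypothetical example: one document node and its link to a stakeholder.
const nodes = toNodesFile('Document', [{ id: 'uir001001' }]);
const relations = toRelationsFile([
  [
    { typeName: 'Document', id: 'uir001001', fieldName: 'authors' },
    { typeName: 'Stakeholder', id: 'stakeholder-1', fieldName: 'documents' },
  ],
]);
console.log(JSON.stringify(nodes), JSON.stringify(relations));
```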

Original document and event files

Documents and events were provided as Excel sheets with the following structures:

Documents

Document Excel sheets are placed in ./data/sheets/documents/. Their file names have to follow the pattern briefing-book-documents--{briefingBookNumber}. For example:

.
├── briefing-book-documents--01
├── briefing-book-documents--02
├── briefing-book-documents--03
└── ...
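
A naming convention like this can be checked with a regular expression. The sketch below uses a hypothetical helper, parseSheetFileName, and assumes that the briefing book number consists of digits only and that a file extension, if present, may be ignored. It also accepts the events prefix used for event sheets.

```javascript
// Matches e.g. "briefing-book-documents--01", optionally followed by a file extension.
const SHEET_NAME_PATTERN = /^briefing-book-(documents|events)--(\d+)(?:\.[A-Za-z0-9]+)?$/;

// Returns the sheet type and briefing book number, or null if the name does not match.
function parseSheetFileName(fileName) {
  const match = SHEET_NAME_PATTERN.exec(fileName);
  if (!match) return null;
  return { type: match[1], briefingBookNumber: Number(match[2]) };
}

console.log(parseSheetFileName('briefing-book-documents--01'));
```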

The values in the file name column should match the file names in the ./data/original_documents/ directory.

file name | BB | Page | Session Number | Duplicate of | Author | Contributor | Subject | Kind ID | Kind | Classification ID | Classification | Title | Date | Summary | Source/From | To/For | Publisher | Publication Date | Bibliographic Info | ...
uir001001
uir001002
uir001003
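
The file name values can be cross-checked against the files that are actually present. This is a minimal sketch with a hypothetical helper, findUnmatchedFileNames, assuming the original documents carry a file extension (such as .pdf) while the sheet values do not.

```javascript
const path = require('path');

// Returns the sheet "file name" values that have no matching file
// (compared without extension) in the list of document files.
function findUnmatchedFileNames(sheetFileNames, documentFiles) {
  const available = new Set(documentFiles.map((file) => path.parse(file).name));
  return sheetFileNames.filter((name) => !available.has(name));
}

// Usage with in-memory lists; in practice the second argument could come
// from fs.readdirSync('./data/original_documents').
const unmatched = findUnmatchedFileNames(
  ['uir001001', 'uir001002', 'uir001003'],
  ['uir001001.pdf', 'uir001003.pdf'],
);
console.log(unmatched);
```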

Events

Event Excel sheets are placed in ./data/sheets/events/. Their file names have to follow the pattern briefing-book-events--{briefingBookNumber}. For example:

.
├── briefing-book-events--01
├── briefing-book-events--02
├── briefing-book-events--03
└── ...

BB | ID | Original Date | Start Date | End Date | Title | Description | Location | Reference | Flag | Notes
1 | 1
1 | 2
1 | 3
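
As an illustration of how a sheet row relates to these columns, the sketch below maps one row (an array of cell values) to an object keyed by column title. It is not the repository's converter: rowToEvent is a hypothetical helper, and using the column titles as keys is an assumption.

```javascript
// Column titles as listed in the events sheet header above.
const EVENT_COLUMNS = [
  'BB', 'ID', 'Original Date', 'Start Date', 'End Date', 'Title',
  'Description', 'Location', 'Reference', 'Flag', 'Notes',
];

// Maps an array of cell values to an object; missing or empty cells become null.
function rowToEvent(row, columns = EVENT_COLUMNS) {
  const event = {};
  columns.forEach((column, index) => {
    const value = row[index];
    event[column] = value === undefined || value === '' ? null : value;
  });
  return event;
}

const event = rowToEvent([1, 2, '1979']);
console.log(event);
```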
