SOM-Research/dblp-extractor

dblp-extractor

This project aims to be an extractor for searching DBLP data.

DBLP provides open bibliographic information on major computer science journals and proceedings.

Repository structure

  • config: contains config_example.yaml, an example configuration for the tool that shows which parameters are mandatory.

  • data: where all raw data is expected. It contains data_example.xml, which has the same structure as the DBLP XML file but only a few items. To work with the complete data set, download the original DBLP XML file.

  • database: includes the DDL file, the repositories used to interact with the database, and the class that connects to the database.

  • extractor: contains the classes and modules that mediate between the raw XML data and the database.

  • model: where all model elements are placed.
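For orientation, a DBLP-style file is a single root element whose direct children are publication records (for example article or inproceedings), each carrying fields such as author, title, and year. Below is a minimal sketch of reading such records with Python's standard library. The sample XML and the `read_records` helper are illustrative assumptions; they are not taken from data_example.xml or from this repository's code.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the general shape of a DBLP file
# (not the contents of the real data_example.xml).
SAMPLE = """<dblp>
  <article key="journals/example/Doe20">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>An Example Article.</title>
    <year>2020</year>
  </article>
  <inproceedings key="conf/example/Roe21">
    <author>John Roe</author>
    <title>An Example Paper.</title>
    <year>2021</year>
  </inproceedings>
</dblp>"""


def read_records(source):
    """Yield one dict per first-level child of the root element."""
    root = ET.parse(source).getroot()
    for record in root:
        yield {
            "type": record.tag,            # e.g. "article"
            "key": record.get("key"),      # the record's key attribute
            "authors": [a.text for a in record.findall("author")],
            "title": record.findtext("title"),
            "year": record.findtext("year"),
        }


records = list(read_records(io.StringIO(SAMPLE)))
for r in records:
    print(r["type"], r["key"], r["year"])
```

This loads the whole document into memory, which is fine for a small sample. The full DBLP dump is much larger and ships with a DTD that defines its character entities, so it would likely need a streaming parser configured to handle that DTD.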

Set Up

  1. Install the project dependencies listed in requirements.txt, which is in the project root:

    pip install -r requirements.txt
  2. Copy or rename config_example.yaml to config.yaml. This file is listed in .gitignore to avoid pushing sensitive data. Edit it with your database configuration. If you change the database name (currently metascience), change it in the DDL file as well.

  3. Run setup.py. The first time, pass at least the --ddl and --xml arguments:

    • --ddl: path to the DDL file.
    • --xml: path to the XML file.
    • --splitXml: requires an XML file and splits it by its first-level XML children.
    • --insert: requires an XML file and inserts all of its data into the database.

    To create only the database:

    python extractor/setup.py --ddl "./database/mariadbDDL.sql" --xml "./data/data_example.xml"

    To create the database and insert the data:

    python extractor/setup.py --ddl "./database/mariadbDDL.sql" --xml "./data/data_example.xml" --insert True

    To split the XML file:

    python extractor/setup.py --xml "./data/data_example.xml" --splitXml True

    If all four parameters are given, the data is inserted from the original XML file, not from the files generated by the split.
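To illustrate what splitting "by first-level XML children" could look like, here is a minimal sketch using only the Python standard library. It is not the code behind --splitXml: the function name `split_by_first_level`, the output layout, and the choice to write one file per element type (rather than one file per record) are all assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_by_first_level(xml_path, out_dir):
    """Group the root's direct children by tag and write one XML file per tag.

    Returns a dict mapping each tag to the path of the file written for it.
    (Hypothetical helper; the real tool may split differently.)
    """
    root = ET.parse(xml_path).getroot()

    # Collect first-level children under their tag name.
    groups = defaultdict(list)
    for child in root:
        groups[child.tag].append(child)

    os.makedirs(out_dir, exist_ok=True)
    written = {}
    for tag, children in groups.items():
        # Reuse the original root tag and attributes so each output
        # file keeps the same overall shape as the input.
        new_root = ET.Element(root.tag, root.attrib)
        new_root.extend(children)
        path = os.path.join(out_dir, f"{tag}.xml")
        ET.ElementTree(new_root).write(path, encoding="utf-8", xml_declaration=True)
        written[tag] = path
    return written


# Usage on a tiny, made-up input file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<dblp><article key='a1'/><article key='a2'/><book key='b1'/></dblp>")
    files = split_by_first_level(src, os.path.join(tmp, "split"))
    counts = {tag: len(ET.parse(p).getroot()) for tag, p in files.items()}
    print(counts)
```

Grouping by tag keeps the number of output files small. A per-record split would instead write one file per child, and a file as large as the full DBLP dump would call for a streaming approach rather than parsing everything at once.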
