This project aims to be an extractor for searching DBLP data. DBLP provides open bibliographic information on major computer science journals and proceedings.
- `config`: contains `config_example.yaml`, an example configuration for the tool that shows which parameters are mandatory (a configuration-loading sketch follows this list).
- `data`: where all raw data is expected. It contains `data_example.xml`, which looks like the dblp XML file but has only a few items. To replicate the full project you should download the original XML file.
- `database`: includes the DDL file and the repositories used to interact with the database, as well as the class that connects to it.
- `extractor`: contains the classes and modules that move the raw XML data into the database.
- `model`: where all model elements are placed.
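To give an idea of what the configuration might look like, here is a minimal sketch of reading `config.yaml` with PyYAML. The section and key names (`database`, `host`, `port`, `user`, `password`) are assumptions for illustration only; `config_example.yaml` is the authoritative reference for the mandatory parameters.

```python
# Minimal sketch of reading the database configuration from config.yaml.
# NOTE: the section/key names below are assumptions for illustration;
# see config_example.yaml for the real, mandatory parameters.
import yaml

def load_db_config(path="config.yaml"):
    with open(path, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)
    # Hypothetical structure: a top-level "database" section.
    db = config.get("database", {})
    return {
        "host": db.get("host", "localhost"),
        "port": db.get("port", 3306),
        "user": db.get("user"),
        "password": db.get("password"),
        "database": db.get("database", "metascience"),  # default db name used by the DDL
    }

if __name__ == "__main__":
    print(load_db_config())
```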
- Install the project dependencies; in the project root you will find `requirements.txt`: `pip install -r requirements.txt`
- Copy or rename `config_example.yaml` to `config.yaml`. This file is listed in `.gitignore` to avoid pushing sensitive data. Modify it with the database configuration. If the database name (currently `metascience`) is changed, it should also be changed in the DDL file.
- Execute `setup.py`; the first time you should pass at least the `--ddl` and `--xml` arguments:
  - `--ddl`: path to the DDL file.
  - `--xml`: path to the XML file.
  - `--splitXml`: needs an XML file and splits it by its first-level XML children (see the sketch after this list).
  - `--insert`: needs an XML file and inserts all of its data into the database (a conceptual sketch appears at the end of this section).
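For intuition, here is a rough sketch of what splitting an XML file by its first-level children could look like, streaming the file with `xml.etree.ElementTree.iterparse`. This is not the project's implementation (the real logic lives in the `extractor` package), the output naming scheme is invented for the example, and the full dblp dump declares a DTD with character entities that this plain parse may not resolve.

```python
# Rough sketch: stream a large XML file and write each first-level child
# element to its own file. Illustrative only -- the extractor package
# contains the real splitting logic, and the output file naming here is
# invented for the example.
import xml.etree.ElementTree as ET

def split_by_first_level_children(xml_path, out_prefix="part"):
    depth = 0
    count = 0
    for event, elem in ET.iterparse(xml_path, events=("start", "end")):
        if event == "start":
            depth += 1
        else:  # "end"
            depth -= 1
            if depth == 1:  # the element just closed is a first-level child of the root
                count += 1
                with open(f"{out_prefix}_{count:06d}.xml", "wb") as f:
                    f.write(ET.tostring(elem))
                elem.clear()  # free memory for already-written elements
    return count

if __name__ == "__main__":
    n = split_by_first_level_children("./data/data_example.xml")
    print(f"wrote {n} fragments")
```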
To only create the database:

`python extractor/setup.py --ddl "./database/mariadbDDL.sql" --xml "./data/data_example.xml"`

To create the database and insert the data:

`python extractor/setup.py --ddl "./database/mariadbDDL.sql" --xml "./data/data_example.xml" --insert True`

To split the XML:

`python extractor/setup.py --xml "./data/data_example.xml" --splitXml True`

If all four parameters are applied, the inserts are taken from the original XML file, not from the files generated by the split.
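Finally, as a purely illustrative companion to `--insert`, the sketch below streams first-level records and writes them to MariaDB with `pymysql`. The driver choice and the table and column names (`publication`, `dblp_key`, `title`) are assumptions; the real schema is defined by the DDL file, and the actual mapping lives in the repository classes under `database/`.

```python
# Illustrative sketch only: stream dblp-style records and insert them into
# MariaDB with parameterized SQL. The table/column names ("publication",
# "dblp_key", "title") are hypothetical; the real schema is in the DDL file
# and the repositories under database/ define the actual mapping.
import xml.etree.ElementTree as ET
import pymysql

def insert_publications(xml_path, db_config):
    conn = pymysql.connect(
        host=db_config["host"],
        user=db_config["user"],
        password=db_config["password"],
        database=db_config["database"],  # e.g. "metascience"
    )
    try:
        with conn.cursor() as cur:
            depth = 0
            for event, elem in ET.iterparse(xml_path, events=("start", "end")):
                if event == "start":
                    depth += 1
                    continue
                depth -= 1
                if depth == 1:  # a first-level record (article, inproceedings, ...)
                    cur.execute(
                        "INSERT INTO publication (dblp_key, title) VALUES (%s, %s)",
                        (elem.get("key"), elem.findtext("title")),
                    )
                    elem.clear()
        conn.commit()
    finally:
        conn.close()
```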