
This is a sub-project of MNBVC that aims to convert the DocLayNet dataset into the MNBVC format.


luigide2020/DocLayNetPlus_mnbvc


DocLayNetPlus_mnbvc


Steps:

  1. Download and unzip:
    • wget -c https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_core.zip
    • unzip DocLayNet_core.zip
    • wget -c https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_extra.zip
    • unzip DocLayNet_extra.zip
  2. Update data_process.py: set the local_path parameter of the load_dataset method to the parent directory into which the two zip files were extracted.
  3. Run: python data_process.py
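Step 1, and finding the directory that step 2 asks for, can also be scripted. The following is a minimal sketch using only the Python standard library, assuming both archives are downloaded and extracted into the same directory. The functions prepare, download and extract, and any directory name you pass in, are hypothetical helpers for illustration and are not part of this repository. The sketch does not edit data_process.py; you still paste the returned path into local_path yourself. Unlike `wget -c`, it does not resume interrupted downloads.

```python
import os
import urllib.request
import zipfile

# Base URL and archive names copied from the wget commands in step 1.
BASE_URL = "https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/"
ARCHIVES = ["DocLayNet_core.zip", "DocLayNet_extra.zip"]


def download(url: str, dest: str) -> None:
    """Download url to dest, skipping the transfer if dest already exists.

    Hypothetical helper. It does not resume partial transfers the way
    `wget -c` does; an interrupted download simply restarts on rerun.
    """
    if os.path.exists(dest):
        return
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)
    os.replace(tmp, dest)  # expose the archive only once it is complete


def extract(archive: str, target_dir: str) -> None:
    """Unzip archive into target_dir (the equivalent of `unzip`)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)


def prepare(target_dir: str, base_url: str = BASE_URL) -> str:
    """Download and extract both archives into target_dir.

    Returns the absolute path of target_dir, i.e. the parent directory
    of the extracted contents (the value to use for local_path).
    """
    os.makedirs(target_dir, exist_ok=True)
    for name in ARCHIVES:
        archive_path = os.path.join(target_dir, name)
        download(base_url + name, archive_path)
        extract(archive_path, target_dir)
    return os.path.abspath(target_dir)
```

For example, calling prepare("doclaynet_data") fetches both archives into ./doclaynet_data, unpacks them there, and returns that directory's absolute path. Downloading to a temporary ".part" name and renaming only after the transfer finishes keeps a rerun from treating a truncated archive as complete.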
