
kljh/xsv_to_sqlite


Normalise CSV file

Kaggle-like datasets are usually a few hundred megabytes or more and are distributed as CSV files. Another example is the Dataset of Travis CI and Google Testing Results.

CSV is a plain, human-friendly storage format. However, it is inefficient in terms of both space and processing. This repo contains a script that normalises such datasets into SQLite.
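The repo's own script is not reproduced here. The following is a minimal sketch, using only Python's standard `csv` and `sqlite3` modules, of one common way to normalise a CSV into SQLite: each distinct string in a chosen column is stored once in a lookup table, and the main table keeps only an integer id. The function name `normalise_csv`, the `_lookup` table suffix, the `data` table name and the sample columns are hypothetical. The sketch assumes a header row, trusted column names and rows with as many fields as the header.

```python
import csv
import io
import sqlite3


def normalise_csv(csv_file, conn, table="data", factor_columns=()):
    """Stream rows from csv_file into `table`, storing each column named in
    factor_columns once per distinct value in a "<column>_lookup" table."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Per factored column: maps each distinct string to its integer id.
    lookups = {col: {} for col in factor_columns}

    for col in lookups:
        conn.execute(
            f'CREATE TABLE "{col}_lookup" (id INTEGER PRIMARY KEY, value TEXT UNIQUE)'
        )
    # Factored columns become integer foreign keys; the others stay as TEXT.
    col_defs = ", ".join(
        f'"{c}_id" INTEGER REFERENCES "{c}_lookup"(id)' if c in lookups else f'"{c}" TEXT'
        for c in header
    )
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')

    placeholders = ", ".join("?" for _ in header)
    for row in reader:  # one row at a time, so the CSV never sits fully in memory
        values = []
        for col, value in zip(header, row):
            if col in lookups:
                ids = lookups[col]
                if value not in ids:  # first occurrence: add it to the lookup table
                    ids[value] = len(ids) + 1
                    conn.execute(
                        f'INSERT INTO "{col}_lookup" (id, value) VALUES (?, ?)',
                        (ids[value], value),
                    )
                values.append(ids[value])
            else:
                values.append(value)
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)
    conn.commit()


if __name__ == "__main__":
    sample = io.StringIO("job,status,duration\nbuild,passed,12\ntest,failed,30\nbuild,passed,11\n")
    conn = sqlite3.connect(":memory:")
    normalise_csv(sample, conn, factor_columns=("job", "status"))
    print(conn.execute("SELECT id, value FROM status_lookup ORDER BY id").fetchall())
    # → [(1, 'passed'), (2, 'failed')]
```

The space saving comes from columns with few distinct values (job names, statuses, branch names) repeated across millions of rows: each repeat shrinks to a small integer. The in-memory `lookups` dictionaries grow with the number of distinct values, so this approach suits low-cardinality columns rather than free-text ones.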

Concretely, the file sizes and usability of each format compare as follows:

  • GooglePresCleanData.out.zip: 66 MB, small, but not directly usable

  • GooglePresCleanData.out.txt: 481 MB, big, usable

  • GooglePresCleanData.out.sqlite: 50 MB, smallest, most usable

  • RailsCleanData.out.zip: 150 MB, small, but not directly usable

  • RailsCleanData.out.txt: 2.3 GB, big, usable

  • RailsCleanData.out.sqlite: 319 MB, small, most usable (and can be normalised further by hand)
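Normalising the SQLite file further by hand typically means moving a repeated text column out into its own lookup table. The following is a minimal sketch of that step with Python's standard `sqlite3` module, assuming an existing table with a repeated text column; the function name `factor_out_column` and the `runs`/`status` names in the demo are hypothetical, not taken from the repo. Dropping the old column uses `ALTER TABLE ... DROP COLUMN`, which needs SQLite 3.35.0 or later, so the sketch only does it when the linked SQLite library is new enough.

```python
import sqlite3


def factor_out_column(conn, table, column):
    """Move the distinct non-NULL values of `column` into "<column>_lookup"
    and reference them from `table` through a new integer "<column>_id"."""
    lookup = f"{column}_lookup"
    conn.executescript(f"""
        CREATE TABLE "{lookup}" (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        INSERT INTO "{lookup}" (value)
            SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL;
        ALTER TABLE "{table}" ADD COLUMN "{column}_id" INTEGER REFERENCES "{lookup}"(id);
        UPDATE "{table}" SET "{column}_id" =
            (SELECT id FROM "{lookup}" WHERE "{lookup}".value = "{table}"."{column}");
    """)
    # Dropping the now-redundant text column needs SQLite 3.35.0 or later.
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        conn.execute(f'ALTER TABLE "{table}" DROP COLUMN "{column}"')
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (status TEXT, duration INTEGER)")
    conn.executemany("INSERT INTO runs VALUES (?, ?)",
                     [("passed", 12), ("failed", 30), ("passed", 11)])
    conn.commit()
    factor_out_column(conn, "runs", "status")
    print(conn.execute(
        "SELECT l.value, r.duration FROM runs r "
        "JOIN status_lookup l ON r.status_id = l.id ORDER BY r.duration").fetchall())
    # → [('passed', 11), ('passed', 12), ('failed', 30)]
```

Rows whose original value was NULL keep a NULL id, so an inner join like the one in the demo skips them; use a `LEFT JOIN` to keep them. After large changes like this, running `VACUUM` reclaims the freed space in the file.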

About

CSV to normalised SQLite
