Indonesian Population per District, based on 2010 Population Census

... because I can't get my hands on the detailed results of more recent censuses.

Important Note

If you use this data in a publication, Statistics Indonesia (BPS) requires you to cite BPS as the source of the data, or to acknowledge it in some other way.

If you also cite me, or mention that I converted the data from PDF to CSV, I'll be glad, though I'm not sure exactly how you would do that.

The PDF file is the original source material (a book) from which I extracted the data. It contains tabulated population counts, which I extracted using a tool called Camelot.
The Python files are the scripts I used to extract and tidy the data.
The CSV files are the outputs of the Python scripts. They contain the population data in CSV format and can be opened in Excel.
- ID-population-kec-by-book.csv contains the district rows mixed together with the region and province aggregate rows, exactly as they appear in the book. Use this if you need to look something up exactly as it appears in the book.
- ID-population-kec-tidy.csv is the tidier format, with one row per district. I recommend using this one.
- warnings-row-with-newline.csv is there just for debugging purposes and does not contain any meaningful population data.
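To illustrate what the extraction with Camelot involves, here is a minimal sketch using camelot-py's `read_pdf` with its default settings. It is not the code in `reading_data.py`, which may use different page ranges, options, or post-processing. The helper `rows_with_newline` is hypothetical: it only shows the kind of check that could produce a debugging file like warnings-row-with-newline.csv, presumably rows where several lines of text ended up in a single cell.

```python
import pandas as pd


def extract_tables(pdf_path: str, pages: str = "all") -> pd.DataFrame:
    """Read every table Camelot detects in the PDF and stack them into one DataFrame."""
    import camelot  # imported lazily so rows_with_newline works without camelot installed

    tables = camelot.read_pdf(pdf_path, pages=pages)
    # Each detected Table exposes its cells as a DataFrame of strings via `.df`.
    return pd.concat([table.df for table in tables], ignore_index=True)


def rows_with_newline(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical check: return rows where any cell contains a newline character."""
    has_newline = df.apply(lambda col: col.astype(str).str.contains("\n", regex=False))
    return df[has_newline.any(axis=1)]
```

For example, `rows_with_newline(extract_tables("<path to the BPS PDF>")).to_csv("warnings.csv")` would write the flagged rows for manual review; both file names here are placeholders.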

Workflow

Environment used to perform this work:

Windows 7
Python 3.8.5
pandas 1.3.4
camelot-py 0.10.1
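Other versions of these libraries may extract or parse the tables slightly differently, so it can help to compare your environment against the list above. Here is a minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8). `version_mismatches` and `EXPECTED` are hypothetical names, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict

# Package versions listed in the environment section above.
EXPECTED = {"pandas": "1.3.4", "camelot-py": "0.10.1"}


def version_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, str]:
    """Map each package whose installed version differs from `expected` to what was found."""
    found = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            found[package] = installed
    return found
```

Calling `version_mismatches(EXPECTED)` returns an empty dict when the installed versions match the ones listed here. `platform.python_version()` can be checked separately against 3.8.5.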

The following are the steps I took to obtain the data:

Make sure the dependencies are installed.
Have the PDF file and the Python scripts in one folder.
Read the data from the PDF by running (from within that folder):
```
python reading_data.py
```
This step will create ID-population-kec-by-book.csv and warnings-row-with-newline.csv.
Tidy the data into a more convenient format by running:
```
python transforming_data_tidy.py
```
This step will create ID-population-kec-tidy.csv.
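The tidy step turns the book layout, where district rows sit between their region and province aggregate rows, into one row per district. Here is a minimal sketch of that idea. It assumes, hypothetically, that each row carries a BPS-style area code whose length marks its level: 2 digits for a province, 4 for a region, and 7 for a district. It also assumes the column names `code`, `name`, and `population`. The actual `transforming_data_tidy.py` may work differently. The second helper sums district populations per province, which can be compared with the province aggregates in the book as a sanity check.

```python
import pandas as pd

# Hypothetical code lengths for each administrative level.
PROVINCE_LEN, REGION_LEN, DISTRICT_LEN = 2, 4, 7


def tidy_by_book(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only district rows, tagging each with its province and region names."""
    code_len = df["code"].astype(str).str.strip().str.len()
    labelled = df.assign(
        # Forward-fill the latest aggregate row's name onto the rows beneath it.
        province=df["name"].where(code_len == PROVINCE_LEN).ffill(),
        region=df["name"].where(code_len == REGION_LEN).ffill(),
    )
    districts = labelled[code_len == DISTRICT_LEN]
    return districts[["province", "region", "code", "name", "population"]].reset_index(drop=True)


def province_totals(tidy: pd.DataFrame) -> pd.Series:
    """Sum district populations per province, keeping the book's province order."""
    return tidy.groupby("province", sort=False)["population"].sum()
```

Forward-filling works here because, in the book's order, every district row comes after the aggregate rows of the region and province it belongs to.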

Credits and Acknowledgements

This work is available thanks to:

Statistics Indonesia (Badan Pusat Statistik), the Indonesian official statistics bureau that carried out the census and published the data.
Camelot, the Python library used to pull the data from PDF format.

Also:

Original link from which I downloaded the PDF file. I cannot absolutely guarantee that this file is the original, but I believe it is fine.

lahdjirayhan/populasi-indonesia-kecamatan

Folders and files

Latest commit

History

Repository files navigation

Indonesian Population per District, based on 2010 Population Census

Table of Contents

Workflow

Credits and Acknowledgements

About

Resources

Stars

Watchers

Forks

Releases

Languages