India public goods research

This repository currently contains materials that enable replication of the paper:

Martínez Arranz, A., R. Thomson et al., 2021, "The Uneven Expansion of Electricity Supply in India", Energy Research and Social Science, Volume 78, August 2021, 102126. DOI: https://doi.org/10.1016/j.erss.2021.102126

You may use any code in this repository subject to the MIT license. Please acknowledge it in any materials with a citation to the above article.

Obtaining relevant datasets

Household data

The Consumer Pyramids survey provides all household-level information, which we then aggregate to the district-region level (as available during 2020). These data are available for purchase or by subscription. For our "Uneven expansion of electricity supply" paper, we used Waves 1 and 18, corresponding to early 2014 and late 2019.

For the replication materials to work, you must store the relevant sections as follows under the Input_data folder.

Input_data/DS_CP_full/
├── Household Amenities, Assets _ Liabilities
│   ├── Wave 2014.zip
│   └── Wave 2019.zip
├── Household Expense - Details
│   ├── Wave 2014.zip
│   └── Wave 2019.zip
├── Income
│   ├── Household - Income
│   │   ├── Wave 2014.zip
│   │   └── Wave 2019.zip
│   └── Member - Income
│       ├── Wave 2014.zip
│       └── Wave 2019.zip
└── People of India
    ├── Wave 2014.zip
    └── Wave 2019.zip
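
Before running the pipeline, it may help to confirm that the files sit where the tree above says they should. The following is a minimal sketch in base R, not part of the repository's scripts: the helper name missing_cp_files is hypothetical, and it assumes your working directory is the repository root.

```r
# Expected Consumer Pyramids layout, copied from the directory tree above.
cp_root <- file.path("Input_data", "DS_CP_full")

cp_sections <- c(
  "Household Amenities, Assets _ Liabilities",
  "Household Expense - Details",
  file.path("Income", "Household - Income"),
  file.path("Income", "Member - Income"),
  "People of India"
)
cp_waves <- c("Wave 2014.zip", "Wave 2019.zip")

# Hypothetical helper (not in the repository): returns the expected
# relative paths that do not exist under `root`.
missing_cp_files <- function(root = cp_root) {
  expected <- as.vector(outer(cp_sections, cp_waves, file.path))
  expected[!file.exists(file.path(root, expected))]
}

missing_files <- missing_cp_files()
if (length(missing_files) > 0) {
  message("Missing Consumer Pyramids files:\n",
          paste(" -", missing_files, collapse = "\n"))
}
```

An empty result means all ten archives were found; anything listed needs to be downloaded or moved before the scripts will run.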

Election data

We take the state election data from Bhavnani RR 2014 and combine it with data from www.indiavotes.com.

To facilitate replication, we provide a zipped version of the cleaned-up file, covering data after 1990, that is used in the relevant script.

To combine this information with the district-based Consumer Pyramids data, we use assembly constituency maps from "Community Created Maps of India", which is based on data from the Electoral Commission.

Note that the "Third Front" does not really exist as a formal alliance in Indian politics. We use that label to capture many 'left-of-INC' and regional parties, many of which have traditionally been part of the "Left Front".

Urban - rural distinction

To allocate assembly constituencies to urban or rural categories, we use datasets from NASA's Socioeconomic Data and Applications Center (SEDAC; requires free registration):

Gridded Population of the World (GPW), specifically the population density estimates for 2010 at 2.5 arc-minute resolution
Urban extent polygons

We also compare these data against open-source information available from GeoNames.

The 2011 Indian census also contains urban / rural population information that we use to verify and adjust our distinction.

The result of these combinations is something like this: [figure not included]

Installing relevant software

This research was mostly carried out in R > 4.0.0, with some of the final operations carried out in Stata > 14. I provide an approximation using R functions, but the weighted linear regressions with cluster-robust standard errors are best done in Stata.
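
To make the idea of a weighted regression with cluster-robust standard errors concrete, here is a minimal base-R sketch on synthetic data. It is not the code used for the paper: the function name cluster_robust_vcov, the variable names, and the data are all assumptions made for illustration. It uses the standard sandwich estimator with the small-sample factor G/(G-1) * (N-1)/(N-K); whether it reproduces the paper's Stata output exactly has not been checked here.

```r
# Synthetic data: every name and value below is illustrative only.
set.seed(1)
n_clusters <- 30
dat <- data.frame(cluster = rep(seq_len(n_clusters), each = 10))
dat$x <- rnorm(nrow(dat))
dat$w <- runif(nrow(dat), 0.5, 2)                # survey-style weights
cluster_shock <- rnorm(n_clusters)[dat$cluster]  # error shared within a cluster
dat$y <- 1 + 0.5 * dat$x + cluster_shock + rnorm(nrow(dat))

# Hypothetical helper: cluster-robust covariance matrix for a (weighted) lm fit.
# Assumes no rows were dropped for missing values, so `cluster` lines up
# with the rows of the model matrix.
cluster_robust_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)
  w <- weights(fit)
  if (is.null(w)) w <- rep(1, nrow(X))
  e <- residuals(fit)                     # raw residuals, y - fitted
  scores <- rowsum(X * (w * e), cluster)  # per-cluster sums of x_i * w_i * e_i
  bread <- solve(crossprod(X, X * w))     # (X' W X)^-1
  G <- nrow(scores)
  N <- nrow(X)
  K <- ncol(X)
  adj <- (G / (G - 1)) * ((N - 1) / (N - K))
  adj * bread %*% crossprod(scores) %*% bread
}

fit <- lm(y ~ x, data = dat, weights = w)
se_cluster <- sqrt(diag(cluster_robust_vcov(fit, dat$cluster)))
print(cbind(estimate = coef(fit), cluster_se = se_cluster))
```

The point estimates are identical to those of an ordinary weighted lm fit; only the standard errors change, because the residuals are allowed to be correlated within each cluster.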

R packages

My package cptools provides some convenience functions using the tidyverse. Install it with devtools::install_github("https://github.com/AltfunsMA/cptools").

Apart from this, you should install the sf and raster packages for map manipulation, and the standardize and laeken packages for some convenience functions (Gini coefficient and normalisation). The foreign package will also allow you to export the data to Stata.
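
As a quick check, the sketch below lists which of the packages named above are not yet installed. It only reports; it does not install anything. The vector name required_packages is an assumption, and the suggested install.packages() call assumes the packages are available from your configured repository.

```r
# Packages named in this README (devtools is needed to install cptools).
required_packages <- c("devtools", "sf", "raster", "standardize",
                       "laeken", "foreign")

# Compare against the packages already present in the local library.
not_installed <- setdiff(required_packages, rownames(installed.packages()))

if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "),
          "\nTry install.packages() on these, assuming your repository carries them.")
} else {
  message("All listed packages are installed.")
}
```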

I would also recommend using the RStudio IDE unless you really love the command line.

Running the scripts

The scripts are a snapshot of a point in our research process. Some of the variables that are calculated are not used in this first paper, but removing those calculations is more trouble than it's worth. Once you have obtained the CP data, the easiest way to proceed is to run 01_master.R under Scripts.

Scripts contains the full numbered pipeline that eventually generates the files on which we ran our calculations. The subfolder create_inputs contains the scripts that turn some of the raw inputs cited above (which are not provided) into the objects in Input_data.

Erratum

We have noticed a data entry error in the Input_data/st_gov_by_year.csv file: in Bihar in 2010, the RJD has been miscoded as JD(U). Correcting it makes our finding regarding the "clientelist logic" lose statistical significance, but we had already flagged this finding as particularly weak and unreliable.

We are also aware of some minor discrepancies in the "expected hours" values (stemming from calculations on older maps), but the findings regarding the other logics remain robust.
