WorldDevelopmentIndicators Code Update (#964)
* WDI Code Update

* WDI Readme Update

* Code modification and Auto Refresh

* Auto refresh modification

* Readme

* Addressed PR Comments

* Test Script

* Lint

* Code modification

* Core team PR comment

* Removed unwanted lines

* Cron Schedule
saanikaaa authored Dec 10, 2024
1 parent 9a0994d commit f88e1c0
Showing 8 changed files with 690 additions and 152 deletions.
19 changes: 19 additions & 0 deletions scripts/world_bank/wdi/README.md
@@ -127,6 +127,25 @@ To generate `output/WorldBank_StatisticalVariables.mcf`,
python3 worldbank.py --indicatorSchemaFile=<DESIRED INDICATOR CSV FILE> --fetchFromSource=<true TO RE-FETCH FROM WDI WEBSITE INSTEAD OF USING CHECKED-IN PREPROCESSED CSVS ELSE false>
```

#### Processing Steps for Refreshing Data

To generate `output/WorldBank_StatisticalVariables.mcf`,
`output/WorldBank.tmcf`, and `output/WorldBank.csv`, run:

```bash
python3 worldbank.py
```

To run only the processing step (without downloading), run:
```bash
python3 preprocess.py --mode=process
```

To run only the download step (without processing), run:
```bash
python3 preprocess.py --mode=download
```

We highly recommend using the import validation tool for this import, which
you can find at
https://github.com/datacommonsorg/tools/tree/master/import-validation-helper.
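The README describes a single `--mode` flag that restricts a refresh to one stage. The sketch below shows one way such a dispatcher could be wired with `argparse`. It is not the actual `preprocess.py`: the stage functions `download()` and `process()` are hypothetical placeholders, and the assumption that omitting `--mode` runs both stages in order is ours, not stated in the README.

```python
import argparse


# Hypothetical stage functions; the real script may name or split these differently.
def download():
    """Placeholder for fetching the raw WDI files from the World Bank."""
    return "download"


def process():
    """Placeholder for producing output/WorldBank.csv, .tmcf and .mcf."""
    return "process"


def run(mode):
    """Run only the requested stage, or both in order when mode is None."""
    steps = []
    if mode in (None, "download"):
        steps.append(download())
    if mode in (None, "process"):
        steps.append(process())
    return steps


def main(argv=None):
    parser = argparse.ArgumentParser(description="WDI refresh (sketch)")
    parser.add_argument(
        "--mode",
        choices=["download", "process"],
        default=None,
        help="Run only one stage; omit to run both (assumed behavior).",
    )
    args = parser.parse_args(argv)
    return run(args.mode)


if __name__ == "__main__":
    print(main())
```

With this layout, `--mode=download` and `--mode=process` each call exactly one stage, and `argparse` rejects any other value before anything runs.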
22 changes: 22 additions & 0 deletions scripts/world_bank/wdi/manifest.json
@@ -0,0 +1,22 @@
{
"import_specifications": [
{
"import_name": "WorldDevelopmentIndicators",
"curator_emails": [
"[email protected]"
],
"provenance_url": "https://datacatalog.worldbank.org/dataset/world-development-indicators/",
"provenance_description": "Variables related to demographics, energy, health, labor, etc. from the World Bank",
"scripts": [
"worldbank.py"
],
"import_inputs": [
{
"template_mcf": "output/WorldBank.tmcf",
"cleaned_csv": "output/WorldBank.csv"
}
],
"cron_schedule": "0 11 * * 2"
}
]
}
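The manifest's `"cron_schedule": "0 11 * * 2"` uses the standard five cron fields (minute, hour, day of month, month, day of week), so it fires at 11:00 every Tuesday (cron counts Sunday as 0). The timezone depends on the scheduler that reads the manifest and is not stated here. The sketch below computes the next run time for simple expressions that use only `*` or a single integer per field; it is a hypothetical helper for checking the schedule, not part of the import tooling, and it does not implement ranges, lists, steps, or cron's OR rule when both day fields are restricted.

```python
from datetime import datetime, timedelta


def _matches(field, value):
    """True if a cron field ('*' or a single integer) accepts value."""
    return field == "*" or int(field) == value


def next_run(expr, after):
    """Return the first whole minute strictly after `after` that matches a
    5-field cron expression using only '*' or single integers.

    Simplification: all fields are ANDed together; real cron ORs the
    day-of-month and day-of-week fields when both are restricted.
    """
    minute, hour, dom, month, dow = expr.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # Scan minute by minute for up to one (leap) year.
    for _ in range(366 * 24 * 60):
        cron_dow = (t.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
        if (_matches(minute, t.minute) and _matches(hour, t.hour)
                and _matches(dom, t.day) and _matches(month, t.month)
                and _matches(dow, cron_dow)):
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no match within a year for {expr!r}")
```

For example, `next_run("0 11 * * 2", datetime(2024, 12, 10, 9, 0))` returns 11:00 on Tuesday, 10 December 2024, and asking again from exactly 11:00 that day moves on to the following Tuesday.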
171 changes: 171 additions & 0 deletions scripts/world_bank/wdi/output/WorldBank_StatisticalVariables.mcf
@@ -634,3 +634,174 @@ statType: dcs:measuredValue
measuredProperty: dcs:amount
transferType: dcs:OutwardRemittance


Node: dcid:WorldBank/VC_IHR_PSRC_P5
name: "Intentional homicides (per 100,000 people)"
description: "Intentional homicides are estimates of unlawful homicides purposely inflicted as a result of domestic disputes, interpersonal violence, violent conflicts over land resources, intergang violence over turf or control, and predatory violence and killing by armed groups. Intentional homicide does not include all intentional killing; the difference is usually in the organization of the killing. Individuals or small groups usually commit homicide, whereas killing in armed conflict is usually committed by fairly cohesive groups of up to several hundred members and is thus usually excluded. UN Office on Drugs and Crime's International Homicide Statistics database."
typeOf: dcs:StatisticalVariable
populationType: dcs:CriminalActivities
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person
crimeType: dcs:MurderAndNonNegligentManslaughter


Node: dcid:WorldBank/SH_DYN_MORT
name: "Mortality rate, under-5 (per 1,000 live births)"
description: "Under-five mortality rate is the probability per 1,000 that a newborn baby will die before reaching age five, if subject to age-specific mortality rates of the specified year. Estimates developed by the UN Inter-agency Group for Child Mortality Estimation (UNICEF, WHO, World Bank, UN DESA Population Division) at www.childmortality.org."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:mortalityRate
measurementDenominator: dcs:Count_BirthEvent_LiveBirth
age: dcs:YearsUpto4


Node: dcid:WorldBank/SH_PRV_SMOK
name: "Smoking prevalence, total (ages 15+)"
description: "Prevalence of smoking is the percentage of men and women ages 15 and over who currently smoke any tobacco product on a daily or non-daily basis. It excludes smokeless tobacco use. The rates are age-standardized. World Health Organization, Global Health Observatory Data Repository (http://apps.who.int/ghodata/)."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person_15OrMoreYears
healthBehavior: dcs:Smoking
age: dcs:Years15Onwards


Node: dcid:WorldBank/SH_PRV_SMOK_FE
name: "Smoking prevalence, females (% of adults)"
description: "Prevalence of smoking, female is the percentage of women ages 15 and over who currently smoke any tobacco product on a daily or non-daily basis. It excludes smokeless tobacco use. The rates are age-standardized. World Health Organization, Global Health Observatory Data Repository (http://apps.who.int/ghodata/)."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person_15OrMoreYears_Female
healthBehavior: dcs:Smoking
age: dcs:Years15Onwards
gender: dcs:Female


Node: dcid:WorldBank/SH_PRV_SMOK_MA
name: "Smoking prevalence, males (% of adults)"
description: "Prevalence of smoking, male is the percentage of men ages 15 and over who currently smoke any tobacco product on a daily or non-daily basis. It excludes smokeless tobacco use. The rates are age-standardized. World Health Organization, Global Health Observatory Data Repository (http://apps.who.int/ghodata/)."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person_15OrMoreYears_Male
healthBehavior: dcs:Smoking
age: dcs:Years15Onwards
gender: dcs:Male


Node: dcid:WorldBank/SH_STA_DIAB_ZS
name: "Diabetes prevalence (% of population ages 20 to 79)"
description: "Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes. International Diabetes Federation, Diabetes Atlas."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person_20To79Years
healthOutcome: dcs:Diabetes
age: dcs:Years20To79


Node: dcid:WorldBank/SP_DYN_CBRT_IN
name: "Birth rate, crude (per 1,000 people)"
description: "Crude birth rate indicates the number of live births occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration. (1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Report (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme."
typeOf: dcs:StatisticalVariable
populationType: dcs:BirthEvent
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person
medicalStatus: dcs:LiveBirth


Node: dcid:WorldBank/SP_DYN_LE00_FE_IN
name: "Life expectancy at birth, female (years)"
description: "Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life. (1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Report (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:lifeExpectancy
gender: dcs:Female


Node: dcid:WorldBank/SP_DYN_LE00_MA_IN
name: "Life expectancy at birth, male (years)"
description: "Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life. (1) United Nations Population Division. World Population Prospects: 2019 Revision. (2) Census reports and other statistical publications from national statistical offices, (3) Eurostat: Demographic Statistics, (4) United Nations Statistical Division. Population and Vital Statistics Report (various years), (5) U.S. Census Bureau: International Database, and (6) Secretariat of the Pacific Community: Statistics and Demography Programme."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:lifeExpectancy
gender: dcs:Male


Node: dcid:WorldBank/EG_ELC_FOSL_ZS
name: "Electricity production from oil, gas and coal sources (% of total)"
description: "Sources of electricity refer to the inputs used to generate electricity. Oil refers to crude oil and petroleum products. Gas refers to natural gas but excludes natural gas liquids. Coal refers to all coal and brown coal, both primary (including hard coal and lignite-brown coal) and derived fuels (including patent fuel, coke oven coke, gas coke, coke oven gas, and blast furnace gas). Peat is also included in this category. IEA Statistics OECD/IEA 2014 (http://www.iea.org/stats/index.asp), subject to https://www.iea.org/t&c/termsandconditions/"
typeOf: dcs:StatisticalVariable
populationType: dcs:Production
statType: dcs:measuredValue
measuredProperty: dcs:amount
measurementDenominator: dcs:Amount_Production_Energy
producedThing: dcs:ElectricityFromOilGasOrCoalSources


Node: dcid:WorldBank/EG_ELC_NUCL_ZS
name: "Electricity production from nuclear sources (% of total)"
description: "Sources of electricity refer to the inputs used to generate electricity. Nuclear power refers to electricity produced by nuclear power plants. IEA Statistics OECD/IEA 2014 (http://www.iea.org/stats/index.asp), subject to https://www.iea.org/t&c/termsandconditions/"
typeOf: dcs:StatisticalVariable
populationType: dcs:Production
statType: dcs:measuredValue
measuredProperty: dcs:amount
measurementDenominator: dcs:Amount_Production_Energy
producedThing: dcs:ElectricityFromNuclearSources


Node: dcid:WorldBank/EG_FEC_RNEW_ZS
name: "Renewable energy consumption (% of total final energy consumption)"
description: "Renewable energy consumption is the share of renewable energy in total final energy consumption. World Bank, Sustainable Energy for All (SE4ALL) database from the SE4ALL Global Tracking Framework led jointly by the World Bank, International Energy Agency, and the Energy Sector Management Assistance Program."
typeOf: dcs:StatisticalVariable
populationType: dcs:Consumption
statType: dcs:measuredValue
measuredProperty: dcs:amount
measurementDenominator: dcs:Amount_Consumption_Energy
consumedThing: dcs:RenewableEnergy


Node: dcid:WorldBank/EN_POP_EL5M_ZS
name: "Population living in areas where elevation is below 5 meters (% of total population)"
description: "Population below 5m is the percentage of the total population living in areas where the elevation is 5 meters or less. Center for International Earth Science Information Network (CIESIN)/Columbia University. 2013. Urban-Rural Population and Land Area Estimates Version 2. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://sedac.ciesin.columbia.edu/data/set/lecz-urban-rural-population-land-area-estimates-v2."
typeOf: dcs:StatisticalVariable
populationType: dcs:Person
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person
residenceCharacteristic: dcs:LessThan5MetersAboveSeaLevel


Node: dcid:WorldBank/IT_CEL_SETS_P2
name: "Mobile cellular subscriptions (per 100 people)"
description: "Mobile cellular telephone subscriptions are subscriptions to a public mobile telephone service that provide access to the PSTN using cellular technology. The indicator includes (and is split into) the number of postpaid subscriptions, and the number of active prepaid accounts (i.e. that have been used during the last three months). The indicator applies to all mobile cellular subscriptions that offer voice communications. It excludes subscriptions via data cards or USB modems, subscriptions to public mobile data services, private trunked mobile radio, telepoint, radio paging and telemetry services. International Telecommunication Union, World Telecommunication/ICT Development Report and database."
typeOf: dcs:StatisticalVariable
populationType: dcs:Product
statType: dcs:measuredValue
measuredProperty: dcs:count
measurementDenominator: dcs:Count_Person
productType: dcs:MobileCellularSubscription


Node: dcid:WorldBank/SE_XPD_TERT_ZS
name: "Expenditure on tertiary education (% of government expenditure on education)"
description: "Expenditure on tertiary education is expressed as a percentage of total general government expenditure on education. General government usually refers to local, regional and central governments. UNESCO Institute for Statistics (http://uis.unesco.org/)"
typeOf: dcs:StatisticalVariable
populationType: dcs:EconomicActivity
statType: dcs:measuredValue
measuredProperty: dcs:amount
measurementDenominator: dcs:Amount_EconomicActivity_ExpenditureActivity_EducationExpenditure_Government
activitySource: dcs:ExpenditureActivity
expenditureType: dcs:TertiaryEducationExpenditure
remunerator: dcs:Government
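Each StatisticalVariable added above is an MCF node: a block of `property: value` lines, with nodes separated by blank lines. When reviewing a diff like this, a quick way to spot-check the nodes is to load them into dictionaries. The parser below is a minimal sketch under that assumption only; it is not the Data Commons MCF parser, it keeps the last value when a property repeats, it does not split comma-separated multi-values, and the name `parse_mcf` is hypothetical.

```python
def parse_mcf(text):
    """Split MCF text into nodes (lists of 'property: value' lines separated
    by blank lines) and return them as a list of dicts."""
    nodes, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current node.
            if current:
                nodes.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # skip comment lines
        # Split on the first colon only, so 'Node: dcid:WorldBank/...' keeps its value intact.
        prop, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'property: value' line: {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop surrounding quotes from text values
        current[prop.strip()] = value
    if current:
        nodes.append(current)
    return nodes
```

Run over the added file, this makes checks such as "every node has a `populationType` and `measuredProperty`" a one-line loop over the returned list.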

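The `WorldBank/SP_DYN_CBRT_IN` description defines the crude birth rate as live births during the year per 1,000 people in the midyear population, and notes that subtracting the crude death rate from it gives the rate of natural increase. The sketch below spells out that arithmetic; the function names are hypothetical, and the same `base` parameter covers "per 100,000" indicators such as `WorldBank/VC_IHR_PSRC_P5`.

```python
def rate_per(events, population, base=1000):
    """Events per `base` people, e.g. births per 1,000 midyear population."""
    return events * base / population


def natural_increase(crude_birth_rate, crude_death_rate):
    """Rate of natural increase: crude birth rate minus crude death rate,
    both expressed per 1,000 people."""
    return crude_birth_rate - crude_death_rate
```

With purely illustrative numbers (20,000 live births and 8,000 deaths in a midyear population of 1,000,000), the crude birth rate is 20.0, the crude death rate is 8.0, and the rate of natural increase is 12.0 per 1,000.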