Update WorldBank WDI scripts #963

Draft
wants to merge 5 commits into
base: master

Contributor

@ajaits ajaits commented Jan 8, 2024

Update WorldBank WDI scripts with the following:

  • add units for SVs that are 'per 1000 *'
  • changes for deprecated df.append() calls


Node: dcid:WorldBank/VC_IHR_PSRC_P5
name: "Intentional homicides (per 100,000 people)"
description: "Intentional homicides are estimates of unlawful homicides purposely inflicted as a result of domestic disputes, interpersonal violence, violent conflicts over land resources, intergang violence over turf or control, and predatory violence and killing by armed groups. Intentional homicide does not include all intentional killing; the difference is usually in the organization of the killing. Individuals or small groups usually commit homicide, whereas killing in armed conflict is usually committed by fairly cohesive groups of up to several hundred members and is thus usually excluded. UN Office on Drugs and Crime's International Homicide Statistics database."
Contributor

Any ideas why these show up?

observationPeriod: "P1Y"
observationAbout: C:WorldBank->ISO3166Alpha3
value: C:WorldBank->Value1
unit: C:WorldBank->unit
Contributor

It could be that we don't want to check in the full .CSV?

@@ -124,8 +124,9 @@ def read_worldbank(iso3166alpha3, fetchFromSource):
    if df is None:
        df = pd.DataFrame(columns=cols)
    else:
        df = df.append(pd.DataFrame([cols], columns=df.columns),
                       ignore_index=True)
        # df = df.append(pd.DataFrame([cols], columns=df.columns),
Contributor

Add a comment?

Contributor Author

Removed the df.append() calls, as the API is deprecated.

Contributor

Should we delete the old line instead of commenting it out?

Contributor Author

Yes, deleted now. It was commented out for testing.
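
For reference, `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0; `pd.concat` is the usual replacement. The following is a minimal sketch of how the pattern in the hunk above might be rewritten, not the code that was actually committed in this PR. The helper name `add_row` is hypothetical, and it assumes, as the hunk suggests, that the first row supplies the column names.

```python
import pandas as pd


def add_row(df, cols):
    """Hypothetical helper: add one row to df without DataFrame.append()."""
    if df is None:
        # First row read supplies the column names, mirroring the hunk above.
        return pd.DataFrame(columns=cols)
    new_row = pd.DataFrame([cols], columns=df.columns)
    if df.empty:
        # Skip concatenating with an empty frame; newer pandas versions
        # warn about how empty frames affect the resulting dtypes.
        return new_row
    # pd.concat replaces df.append(new_row, ignore_index=True).
    return pd.concat([df, new_row], ignore_index=True)


# Illustrative values only.
df = add_row(None, ["country", "year"])
df = add_row(df, ["USA", 2020])
df = add_row(df, ["IND", 2021])
print(df)
```

Because `pd.concat` copies the data on every call, growing a frame row by row stays quadratic; collecting rows in a list and building the frame once is generally cheaper when the reading loop allows it.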

@ajaits ajaits marked this pull request as draft January 8, 2024 20:43
Contributor Author

ajaits commented Jan 8, 2024

Working on more script changes, as the pandas update requires further changes.

@pradh pradh requested a review from jehangiramjad January 9, 2024 19:31
Contributor

@jehangiramjad jehangiramjad left a comment

Thanks! Added some comments. The main point of concern is that some new SVs are being introduced but they already seem to exist. So I am a bit unsure what's going on?

What confuses me as a result is that none of the DCIDs in this file (`scripts/world_bank/wdi/output/WorldBank_StatisticalVariables.mcf`) can be found, while I do find the corresponding SVs here (`third_party/datacommons/schema/stat_vars/manual_wdi_stat_vars.mcf`). So I am not sure what's going on with these DCID mappings.
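
One way to make this concern concrete is to diff the `Node:` DCIDs of the generated MCF against the checked-in one. The sketch below is an illustration only, not part of the WDI scripts: the function names are hypothetical, and it assumes each node starts with a line of the form `Node: dcid:<id>`, as in the hunks quoted in this review.

```python
import re

# Matches lines such as "Node: dcid:WorldBank/SH_DYN_MORT".
NODE_RE = re.compile(r"^Node:\s*(?:dcid:)?(\S+)\s*$", re.MULTILINE)


def node_dcids(mcf_text):
    """Return the set of DCIDs declared by Node: lines in MCF text."""
    return set(NODE_RE.findall(mcf_text))


def new_dcids(generated_mcf, checked_in_mcf):
    """Return DCIDs present in the generated MCF but not in the checked-in one."""
    return sorted(node_dcids(generated_mcf) - node_dcids(checked_in_mcf))


# Small inline samples; in practice the text would be read from the two files.
generated = """Node: dcid:WorldBank/SH_DYN_MORT
age: dcs:YearsUpto4

Node: dcid:WorldBank/SH_PRV_SMOK
"""
checked_in = """Node: dcid:MortalityRate_Person_Upto4Years_AsFractionOf_Count_BirthEvent_LiveBirth
age: dcs:YearsUpto4
"""
print(new_dcids(generated, checked_in))
```

This compares DCIDs only, so an SV that was renamed but otherwise unchanged still shows up as new; spotting that case requires comparing the property values of the nodes as well.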

@@ -53,7 +53,6 @@ CM.MKT.LCAP.CD,,,Market capitalization of listed domestic companies (current US$
BX.TRF.PWKR.DT.GD.ZS,,,"Personal remittances, received (% of GDP)","Personal remittances comprise personal transfers and compensation of employees. Personal transfers consist of all current transfers in cash or in kind made or received by resident households to or from nonresident households. Personal transfers thus include all current transfers between resident and nonresident individuals. Compensation of employees refers to the income of border, seasonal, and other short-term workers who are employed in an economy where they are not resident and of residents employed by nonresident entities. Data are the sum of two items defined in the sixth edition of the IMF's Balance of Payments Manual: personal transfers and compensation of employees.","World Bank staff estimates based on IMF balance of payments data, and World Bank and OECD GDP estimates.",Remittance,measuredValue,amount,transferType,InwardRemittance,,,,,Amount_EconomicActivity_GrossDomesticProduction_Nominal,100,,WorldBankEstimate,
BX.TRF.PWKR.CD.DT,,,"Personal remittances, received (current US$)","Personal remittances comprise personal transfers and compensation of employees. Personal transfers consist of all current transfers in cash or in kind made or received by resident households to or from nonresident households. Personal transfers thus include all current transfers between resident and nonresident individuals. Compensation of employees refers to the income of border, seasonal, and other short-term workers who are employed in an economy where they are not resident and of residents employed by nonresident entities. Data are the sum of two items defined in the sixth edition of the IMF's Balance of Payments Manual: personal transfers and compensation of employees. Data are in current U.S. dollars.",World Bank staff estimates based on IMF balance of payments data.,Remittance,measuredValue,amount,transferType,InwardRemittance,,,,,,,,WorldBankEstimate,USDollar
BM.TRF.PWKR.CD.DT,,,"Personal remittances, paid (current US$)","Personal remittances comprise personal transfers and compensation of employees. Personal transfers consist of all current transfers in cash or in kind made or received by resident households to or from nonresident households. Personal transfers thus include all current transfers between resident and nonresident individuals. Compensation of employees refers to the income of border, seasonal, and other short-term workers who are employed in an economy where they are not resident and of residents employed by nonresident entities. Data are the sum of two items defined in the sixth edition of the IMF's Balance of Payments Manual: personal transfers and compensation of employees. Data are in current U.S. dollars.","World Bank staff estimates based on IMF balance of payments data, and World Bank and OECD GDP estimates.",Remittance,measuredValue,amount,transferType,OutwardRemittance,,,,,,,,WorldBankEstimate,USDollar
VC.IHR.PSRC.P5,,,"Intentional homicides (per 100,000 people)","Intentional homicides are estimates of unlawful homicides purposely inflicted as a result of domestic disputes, interpersonal violence, violent conflicts over land resources, intergang violence over turf or control, and predatory violence and killing by armed groups. Intentional homicide does not include all intentional killing; the difference is usually in the organization of the killing. Individuals or small groups usually commit homicide, whereas killing in armed conflict is usually committed by fairly cohesive groups of up to several hundred members and is thus usually excluded.",UN Office on Drugs and Crime's International Homicide Statistics database.,CriminalActivities,measuredValue,count,crimeType,MurderAndNonNegligentManslaughter,,,,,Count_Person,,100000,,
Contributor

Do we know why this was deleted?

@@ -634,3 +634,163 @@ statType: dcs:measuredValue
measuredProperty: dcs:amount
transferType: dcs:OutwardRemittance


Node: dcid:WorldBank/SH_DYN_MORT
Contributor

What's the reason for these additions across the file? See a couple of comments below where it seems that the SV DCIDs are getting renamed but the SV and its contents are the same. Any idea what happened here? We should keep the SV DCIDs as checked in under google3/third_party/datacommons/schema/stat_vars/manual_wdi_stat_vars.mcf

Contributor

Isn't this the same as the existing SV, dcid:MortalityRate_Person_Upto4Years_AsFractionOf_Count_BirthEvent_LiveBirth? If so, why the change?

age: dcs:YearsUpto4


Node: dcid:WorldBank/SH_PRV_SMOK
Contributor

Same comment as above for this one. The existing SV is this: dcid:Count_Person_15OrMoreYears_Smoking_AsFractionOf_Count_Person_15OrMoreYears

You can find this with code search in the file third_party/datacommons/schema/stat_vars/manual_wdi_stat_vars.mcf

'measurementMethod', 'measurementDenominator', 'scalingFactor',
'sourceScalingFactor', 'unit'
'measurementMethod',
#'measurementDenominator',
Contributor

If we are commenting these out, wouldn't it be better to just remove/delete the lines?

worldbank_dataframe = worldbank_dataframe.append(country_df)
# Add new table to main dataframe.
wb_dfs.append(country_df)
# COmbine tables to get a single dataframe
Contributor

nit: "combine" (small "o")
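
The hunk above replaces repeated `worldbank_dataframe.append(country_df)` calls with a plain Python list that is combined once at the end. A minimal sketch of that pattern follows, assuming the per-country tables share the same columns. The function name `combine_country_tables` and the sample data are hypothetical; `wb_dfs` and `country_df` are the names used in the diff.

```python
import pandas as pd


def combine_country_tables(country_tables):
    """Collect per-country frames in a list and concatenate them once."""
    wb_dfs = []
    for country_df in country_tables:
        # list.append, not the removed DataFrame.append.
        wb_dfs.append(country_df)
    # Combine tables to get a single dataframe.
    return pd.concat(wb_dfs, ignore_index=True)


# Illustrative data only.
usa = pd.DataFrame({"ISO3166Alpha3": ["USA", "USA"], "Value1": [1.0, 2.0]})
ind = pd.DataFrame({"ISO3166Alpha3": ["IND"], "Value1": [3.0]})
worldbank_dataframe = combine_country_tables([usa, ind])
print(worldbank_dataframe)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so a caller that may fetch no tables at all would need to handle that case separately.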

3 participants