
Indian Districts for COVID19 #328

Closed
wants to merge 25 commits into from

Conversation

Contributor

@edumorlom edumorlom commented Oct 26, 2020

Import for Indian Districts and States from covid19india.org.
Each state has its own API.
Place names are resolved by wikidataId.



@google-cla google-cla bot added the cla: yes label Oct 26, 2020
Contributor

@tjann tjann left a comment

I'll review the script after getting a better idea from the current comments!

"Arunachal Pradesh":"Q1162",
"Andhra Pradesh":"Q1159",
"Andaman and Nicobar Islands":"Q40888"
}
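For context, the fragment above is the tail of the State-to-wikidataId map. A minimal sketch of how such a map can be turned into place dcids, assuming the usual Data Commons `wikidataId/` dcid prefix for wikidata-resolved places (`state_dcid` is a hypothetical helper, not code from this PR):

```python
from typing import Optional

# Illustrative fragment of the State -> wikidataId map from this PR.
STATES = {
    "Arunachal Pradesh": "Q1162",
    "Andhra Pradesh": "Q1159",
    "Andaman and Nicobar Islands": "Q40888",
}


def state_dcid(state_name: str) -> Optional[str]:
    """Return the Data Commons dcid for a state, or None if unmapped."""
    qid = STATES.get(state_name)
    return f"wikidataId/{qid}" if qid else None
```

For example, `state_dcid("Andhra Pradesh")` yields `"wikidataId/Q1159"`, while an unmapped name returns `None` so the caller can skip it.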
Contributor

How did you come up with this map?

Contributor Author

The places' dcids are resolved by wikidataId.
The map of State->District->wikidataId was generated as follows:

  1. The wikidataId for each place was queried using the place_resolver.go script.
  2. A script was run against wikidata.org/wiki/${wikidataId} to verify that each place is both a District and part of India.
  3. A manual check was performed to ensure that the names match.
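Step 2 above could be sketched as follows. This is a hedged illustration, not the PR's actual script: it assumes a wbgetentities-style entity payload from the Wikidata API, where Q668 is India and P17 is the "country" property; the fetch itself is omitted so the check is pure logic.

```python
from typing import Any, Dict, Iterator

INDIA_QID = "Q668"  # the Wikidata item for India


def _claim_qids(entity: Dict[str, Any], prop: str) -> Iterator[str]:
    """Yield the item QIDs asserted for a property in a wbgetentities-style entity."""
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            yield value["id"]


def belongs_to_india(entity: Dict[str, Any]) -> bool:
    """True if the entity's country (property P17) is India."""
    return INDIA_QID in _claim_qids(entity, "P17")
```

A check for "is a District" would work the same way against the instance-of claims, and the results can be written to a CSV for the manual pass in step 3.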

Contributor

@tjann tjann Nov 25, 2020

  1. I see. Does the data have any better IDs? This is fine if it's name only.
  2. Do you have that code too? Should we check it in?
  3. How many places did you manually check? Ballpark is fine.

Contributor

@pradh do you have any opinions about manifesting this import to prod if we are using place name resolver to get the Wikidata IDs of Indian districts?

Contributor Author

@tjann I have added the script that checks against wikidata to ensure each wikidataId is correct.
It goes through all the wikidataIds and exports a CSV with the wikidata name and whether the place belongs to India.
That makes it easy to manually verify that all of them are correct. I added a README.md too.

Contributor

@pradh pradh Dec 17, 2020

This dataset does not have any IDs, right? So the approach of mapping to wikidataId via some heuristics (including the place name resolver) plus a manual check sounds reasonable...

Review comments (now resolved) on:
  scripts/covid19indiaORG/Config.py
  scripts/covid19indiaORG/README.md
@edumorlom edumorlom requested a review from tjann November 25, 2020 21:49
@edumorlom
Contributor Author

Ready for re-review!

  1. Added 3 unit tests; instead of making API calls, they read JSON from a file.
  2. Minor modifications to the code to allow testing.
  3. Modifications to README.md.
  4. I no longer use the State name, only the State isoCode and district name.
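One way the "read JSON from a file instead of calling the API" pattern in point 1 could look, as a minimal sketch (the function and field names here are hypothetical, not the PR's actual code): the parser takes any file-like object, so a test can feed it an `io.StringIO` fixture rather than a live endpoint.

```python
import io
import json


def parse_state_payload(fp) -> dict:
    """Parse a covid19india-style district payload from a file-like object.

    Because the function takes a file object rather than a URL, unit tests
    can pass a fixture file or an io.StringIO instead of making an API call.
    """
    data = json.load(fp)
    return {d["district"]: d["confirmed"] for d in data["districts"]}


# Example: feeding the parser an in-memory fixture.
fixture = io.StringIO(
    '{"districts": [{"district": "Nicobars", "confirmed": 5}]}')
counts = parse_state_payload(fixture)
```

In a real test the fixture would come from a checked-in JSON file, keeping the tests hermetic.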

@edumorlom edumorlom requested a review from tjann December 2, 2020 21:38
Comment on lines 67 to 75
# Read the CSV files and generate DataFrames from them.
actual_df = pd.read_csv(output_path)
expected_df = pd.read_csv(expected_path)

# Assert that both dataframes are equal, regardless of column order and dtype.
pd.testing.assert_frame_equal(
    actual_df.sort_index(axis=1),
    expected_df.sort_index(axis=1),
    check_dtype=False)
Contributor

You could just do string comparison instead of reading into a pd DataFrame.

Contributor Author

I can do that, but then I'd have to ensure the column order is the same.

Contributor

Oh, I see. This is a result of using pd DataFrames in the library? We can keep this, then.

Contributor Author

Yeah, it is, but I changed it: it now always exports the columns in alphabetical order, so the CSV files should be identical.
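The alphabetical-column export described here can be sketched in a couple of lines with pandas (`export_csv` is a hypothetical helper name; the PR's actual function may differ):

```python
import io

import pandas as pd


def export_csv(df: pd.DataFrame, path_or_buf) -> None:
    """Write a DataFrame with columns sorted alphabetically.

    Sorting the columns before export makes the output CSV deterministic,
    so two exports of the same data compare equal byte-for-byte regardless
    of the order in which the columns were assembled.
    """
    df.sort_index(axis=1).to_csv(path_or_buf, index=False)


# Example: columns were built as b, a but are exported as a, b.
buf = io.StringIO()
export_csv(pd.DataFrame({"b": [2], "a": [1]}), buf)
```

With deterministic column order, a test can compare raw CSV strings instead of parsing both files back into DataFrames.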

downloaded_data: Dict[str, Dict] = _download_data(data_source)

# If there is no wikidataId for the state, skip it.
if iso_code not in STATES:
Contributor

Can you tell me roughly how often this happens? Maybe also leave a note in the README.

Contributor Author

It doesn't really happen at all; it's just an edge case in case they add some other form of state that we don't have in the hashmap.

@tjann tjann requested a review from shifucun December 4, 2020 19:17
@edumorlom edumorlom requested a review from tjann December 7, 2020 05:39
@edumorlom
Contributor Author

Ready for re-review!

  1. Added the script that checks against wikidata to ensure the wikidataIds are correct; it includes a README.md.
  2. Removed default parameters.
  3. Tests now use StringIO.
  4. Minor changes to code and documentation.

Thanks

@edumorlom edumorlom requested a review from pradh December 18, 2020 06:34
@google-cla google-cla bot added cla: no and removed cla: yes labels Oct 18, 2023
@edumorlom edumorlom closed this Jun 7, 2024