Check language metadata #385
Conversation
…e/Scribe-Data into check_language_metadata
Thank you for the pull request! The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist
LANGUAGE_DATA_EXTRACTION_DIR = Path(__file__).parent.parent / "language_data_extraction"

LANGUAGE_METADATA_FILE = (
    Path(__file__).parent.parent / "resources" / "language_metadata.json"
)

DATA_TYPE_METADATA_FILE = (
    Path(__file__).parent.parent / "resources" / "data_type_metadata.json"
)
@catreedle you can get these from src/scribe_data/cli/cli_utils.py; they already exist there
try:
    with LANGUAGE_METADATA_FILE.open("r", encoding="utf-8") as file:
        language_metadata = json.load(file)
        languages_in_metadata = {
            lang["language"]: {"iso": lang["iso"], "qid": lang["qid"]}
            for lang in language_metadata["languages"]
        }  # current language metadata

        # languages_in_metadata = {  # proposed language metadata
        #     key.lower(): value for key, value in language_metadata.items()
        # }  # Normalize keys to lowercase for case-insensitive comparison.

except (IOError, json.JSONDecodeError) as e:
    print(f"Error reading language metadata: {e}")

try:
    with DATA_TYPE_METADATA_FILE.open("r", encoding="utf-8") as file:
        data_type_metadata = json.load(file)
        all_data_types = tuple(data_type_metadata.keys())

except (IOError, json.JSONDecodeError) as e:
Also, cli_utils.py has already loaded the language_metadata.json and data_type_metadata.json files for us. You could skip loading them here.
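As a minimal, self-contained sketch of the load-and-flatten pattern being discussed (the sample file and its contents are hypothetical stand-ins, not the actual Scribe-Data resources):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical stand-in for resources/language_metadata.json.
SAMPLE_METADATA = {
    "languages": [
        {"language": "english", "iso": "en", "qid": "Q1860"},
        {"language": "french", "iso": "fr", "qid": "Q150"},
    ]
}


def load_metadata(path: Path) -> dict:
    """Load and parse a metadata JSON file, letting errors propagate to the caller."""
    with path.open("r", encoding="utf-8") as file:
        return json.load(file)


with TemporaryDirectory() as tmp:
    metadata_file = Path(tmp) / "language_metadata.json"
    metadata_file.write_text(json.dumps(SAMPLE_METADATA), encoding="utf-8")

    language_metadata = load_metadata(metadata_file)
    # Index by language name for easy lookups, as in the PR's snippet.
    languages_in_metadata = {
        lang["language"]: {"iso": lang["iso"], "qid": lang["qid"]}
        for lang in language_metadata["languages"]
    }

print(languages_in_metadata["french"]["qid"])  # Q150
```

In the actual code the idea is the same, except the already-loaded objects from cli_utils.py can simply be imported instead of re-reading the files.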
Does it mean we can use it directly? How?
Yes, just call it directly. You can experiment and see for yourself 😄
Thanks.. I found it 😊
if __name__ == "__main__":
    check_language_metadata()
If this is for a test, then it's fine, but we will be calling this in the check_project_metadata.yaml file, so there's no need for it :)
thank you! good to know 😊
One minor thing here, @catreedle: Could we get consistent function docstrings, and ones similar to what we have in this PR?
Making the docstring like that means we can use autodoc in the documentation 😊
hey @catreedle, I'm sorry if I have confused you a bit :(

if __name__ == "__main__":
    check_language_metadata()

I just checked and the code you added earlier is quite important if we are gonna be using this file for our workflow. Can you please add it back? Ty!
on it :)
sure. no worries :)
updated the docstring. let me know if anything's amiss @andrewtavis :)
Quick note being sent to all the testing PRs, if updates are needed now that #402 has been merged, then it'd be great to get those updates to the branch :) If no updates are needed, then let me know 😊
    If any missing languages or properties are found, the function exits the script with a status code of 1.
    """
    languages_in_metadata = {key.lower(): value for key, value in _languages.items()}
Does this line

languages_in_metadata = {key.lower(): value for key, value in _languages.items()}

give all the languages in _languages, including the sublanguages?
Yes, but it doesn't lowercase all the sublanguages if any is mistakenly written, I just noticed.
This is great. There is also a helper function called list_all_languages at the bottom of utils.py that lists all the languages in the JSON file. You can make use of it here to check for sublanguages; it would be helpful.
Thanks for the suggestion! Could you explain further how best to use it in this context? I want to include parent languages, so I convert the languages directory to match the format of language_metadata.json for easier comparison. Since list_all_languages doesn't show the parent languages, I'm unsure how to use it effectively for my comparison.
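For reference, a helper like the one being discussed could be sketched as follows; this is a hypothetical reimplementation assuming a nested shape where parent entries carry a sub_languages mapping, not the actual code in utils.py:

```python
def list_all_languages(language_metadata: dict) -> list:
    """Collect every selectable language, descending into sub-languages.

    Parent entries (e.g. "norwegian" below) contribute only their
    sub-languages, which is why parents don't appear in the output.
    """
    languages = []
    for name, details in language_metadata.items():
        if "sub_languages" in details:
            languages.extend(details["sub_languages"].keys())
        else:
            languages.append(name)
    return sorted(lang.lower() for lang in languages)


# Hypothetical sample mirroring the assumed metadata shape.
sample = {
    "english": {"iso": "en", "qid": "Q1860"},
    "norwegian": {
        "sub_languages": {
            "bokmål": {"iso": "nb", "qid": "Q25167"},
            "nynorsk": {"iso": "nn", "qid": "Q25164"},
        }
    },
}

print(list_all_languages(sample))  # ['bokmål', 'english', 'nynorsk']
```

This also illustrates the catch raised here: the flat list drops the parent languages, so a comparison that needs parents has to walk the nested structure itself.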
I'm bringing this down and integrating the checks for isos and qids into the other workflows 😊 Thanks, @catreedle! :) Really great to have the support on the checks 🚀
Thank you! @andrewtavis
Contributor checklist
Description
This PR introduces check_language_metadata.py with the functionality:

- Check that the languages in languages_metadata.json and language_data_extraction match.
- Check that each language in language_metadata.json has the properties qid and iso.

This code has been tested locally.
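The two checks described above could be sketched as a pure function over a directory listing and a metadata mapping; the function name matches the PR, but the signature and sample data here are illustrative assumptions, not the PR's actual implementation:

```python
def check_language_metadata(extraction_dirs: set, metadata: dict) -> list:
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    # 1) Directories and metadata entries must match one-to-one.
    for lang in sorted(extraction_dirs - metadata.keys()):
        problems.append(f"{lang}: directory exists but has no metadata entry")
    for lang in sorted(metadata.keys() - extraction_dirs):
        problems.append(f"{lang}: metadata entry exists but has no directory")
    # 2) Every language needs both 'iso' and 'qid' properties.
    for lang, props in sorted(metadata.items()):
        for required in ("iso", "qid"):
            if required not in props:
                problems.append(f"{lang}: missing required property '{required}'")
    return problems


# Illustrative inputs: 'german' has a directory but no metadata entry,
# 'french' has metadata but no directory and also lacks a qid.
metadata = {"english": {"iso": "en", "qid": "Q1860"}, "french": {"iso": "fr"}}
dirs = {"english", "german"}

problems = check_language_metadata(dirs, metadata)
for problem in problems:
    print(problem)
# In the workflow script, a non-empty list would trigger sys.exit(1).
```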
Related issue