Add Script to Check Consistency Between Data Types in Directories and Metadata #390
Thank you for the pull request! The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)
If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!
Maintainer checklist
if extra_data_types:
    discrepancies.append(f"Extra in directory for '{meta_language}': {extra_data_types}")
I think the correct terms would be "Extra in metadata" or "Missing in directory"?
This is the result for English:
Extra in directory for 'english': {'conjunctions', 'pronouns', 'articles', 'postpositions', 'personal_pronouns', 'autosuggestions', 'prepositions'}
But the English directory doesn't have them.
Thanks @catreedle, but "extra" here denotes any data type that is present in the language folder but not in the metadata file.
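To make the two terms in this thread concrete, here is a minimal sketch of the set arithmetic being discussed. The function name `compare_data_types` and the sample data are hypothetical and are not taken from the PR; the sketch only illustrates the definition given above, where "extra" means present in the directory but absent from the metadata, and "missing" is the reverse.

```python
def compare_data_types(dir_types, meta_types):
    """Return (extra_in_directory, missing_in_directory) as sets.

    Hypothetical helper for illustration only; it is not the PR's code.
    """
    dir_types, meta_types = set(dir_types), set(meta_types)
    # In the language folder, but not listed in the metadata file.
    extra_in_directory = dir_types - meta_types
    # Listed in the metadata file, but with no folder for this language.
    missing_in_directory = meta_types - dir_types
    return extra_in_directory, missing_in_directory


# Made-up example: the folder lacks two data types that the metadata lists.
extra, missing = compare_data_types(
    dir_types={"nouns", "verbs"},
    meta_types={"nouns", "verbs", "pronouns", "articles"},
)
print(sorted(extra))    # []
print(sorted(missing))  # ['articles', 'pronouns']
```

Under this definition, data types that the metadata lists but the folder lacks would be reported as missing in the directory, not as extra. If the operands of the difference were swapped, the two labels would be swapped as well.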
sub_lang_dir = language / 'sub-languages'
if sub_lang_dir.exists():
    discrepancies.extend(check_language_subdirs(sub_lang_dir, meta_language))
hmm...🤔
My thought here is how #402 will affect this PR...🤔
I see... sub_lang_dir = language / 'sub-languages' is written to also support the new flow coming from #402, yeah? @catreedle
Yeah, it was done keeping the generalisation in mind
I don't get it. Does this mean the directory structure will change?
Norwegian
--sub-languages
----Nynorsk
----Bokmal
I don't get it. Does this mean the directory structure will change? Norwegian --sub-languages ----Nynorsk ----Bokmal
No, @catreedle, it will remain as is.
does this mean the directory structure will change?
No, the directory structure will remain the same, but the scripts will also be able to check the data types under sub-languages.
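As a rough illustration of the point above (the layout stays the same, and the check simply descends into `sub-languages` when it exists), here is a sketch of a recursive traversal. The function `collect_data_types` and the constant `SUB_LANGUAGES_DIR` are hypothetical names for illustration; only the `sub-languages` folder name and the Norwegian layout come from this thread, and the PR's actual `check_language_subdirs` may work differently.

```python
from pathlib import Path

SUB_LANGUAGES_DIR = "sub-languages"  # folder name taken from the snippet under review


def collect_data_types(language_dir):
    """Map each language (and sub-language) name to its data type folder names.

    Hypothetical sketch: assumes every subdirectory other than
    'sub-languages' is a data type folder.
    """
    language_dir = Path(language_dir)
    children = [p for p in language_dir.iterdir() if p.is_dir()]
    found = {
        language_dir.name: {p.name for p in children if p.name != SUB_LANGUAGES_DIR}
    }
    sub_lang_dir = language_dir / SUB_LANGUAGES_DIR
    if sub_lang_dir.exists():
        # Recurse so that each sub-language is checked like a regular language.
        for sub_language in sorted(p for p in sub_lang_dir.iterdir() if p.is_dir()):
            found.update(collect_data_types(sub_language))
    return found
```

Calling `collect_data_types` on a `Norwegian` directory laid out as above would return one entry for `Norwegian` itself plus one per sub-language, each of which could then be compared against the metadata.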
Co-authored-by: Akindele Michael <[email protected]>
Quick note being sent to all the testing PRs: if updates are needed now that #402 has been merged, then it'd be great to get those updates to the branch :) If no updates are needed, then let me know 😊
Bringing this down and integrating it into the other checks. Thanks, @KesharwaniArpita!
Contributor checklist
Description
This PR introduces a new function, check_data_type_metadata, which ensures that data type subdirectories within language directories are accurately reflected in the data_type_metadata.json file. It accounts for meta-languages and compares the data types found in the file system against those in the metadata, flagging any discrepancies such as missing or extra data types.
Key Changes:
New Functionality:
check_data_type_metadata(output_file): This function traverses the LANGUAGE_DATA_EXTRACTION_DIR to validate the consistency of data type directories against data_type_metadata.json.
Helper Function:
check_language_subdirs: A recursive function to handle meta-languages and sub-language directories, ensuring all subdirectories are accounted for during validation.
Discrepancy Reporting:
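For readers who want a feel for the overall flow, here is a minimal end-to-end sketch of such a check and its discrepancy report. It is not the PR's implementation: the function name `check_data_types`, its parameters, and the metadata layout (a JSON object whose keys are data type names) are all assumptions made for illustration, and sub-language handling is left out to keep it short.

```python
import json
from pathlib import Path


def check_data_types(language_root, metadata_file, output_file):
    """Compare data type folders under language_root with the metadata keys.

    Hypothetical sketch; writes one line per discrepancy to output_file
    and also returns the lines.
    """
    # Assumption: the metadata is a JSON object keyed by data type name.
    meta_types = set(json.loads(Path(metadata_file).read_text(encoding="utf-8")))
    discrepancies = []
    for lang_dir in sorted(p for p in Path(language_root).iterdir() if p.is_dir()):
        dir_types = {p.name for p in lang_dir.iterdir() if p.is_dir()}
        missing = sorted(meta_types - dir_types)  # in metadata, no folder
        extra = sorted(dir_types - meta_types)    # folder, not in metadata
        if missing:
            discrepancies.append(f"Missing in directory for '{lang_dir.name}': {missing}")
        if extra:
            discrepancies.append(f"Extra in directory for '{lang_dir.name}': {extra}")
    Path(output_file).write_text("\n".join(discrepancies), encoding="utf-8")
    return discrepancies
```

Sorting both the language folders and the data type names keeps the report stable between runs, which makes it easier to diff in CI.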
Future Scope:
Related issue