Update languages metadata file and use of it throughout project #293

Closed
2 tasks done
andrewtavis opened this issue Oct 9, 2024 · 16 comments
Labels
feature New feature or request hacktoberfest Included as a part of Hacktoberfest help wanted Extra attention is needed

Comments

@andrewtavis
Member

Terms

Description

As of now, the Scribe-Data CLI options are determined by the language_metadata.json file. To make the package easier to maintain, it would be great if the CLI options were instead determined by the directory structure of src/scribe_data/language_data_extraction, so that the code doesn't need to be updated each time new queries are added.

It's also important that the CLI options allow for dialects, so for Norwegian we'd like to see Norwegian - Bokmål and Norwegian - Nynorsk, for example. How this will be achieved is open for discussion!
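One way the directory-based idea could look is sketched below. This is only an illustration, not the project's code: the function name `discover_language_options` is hypothetical, and it assumes that every query is a `.sparql` file sitting inside a data-type directory, so that the directories above the data-type directory name the language (one level) or the language and its dialect (two levels).

```python
from pathlib import Path


def discover_language_options(root: Path) -> list[str]:
    """Derive CLI language options from the directory layout under root.

    Assumed layout (hypothetical, for illustration only):
        root/<Language>/<data_type>/<query>.sparql
        root/<Language>/<Dialect>/<data_type>/<query>.sparql
    """
    options = set()
    for query_file in root.rglob("*.sparql"):
        # Drop the file name and its data-type directory;
        # whatever is left names the language (and dialect, if any).
        language_parts = query_file.relative_to(root).parts[:-2]
        if 1 <= len(language_parts) <= 2:
            options.add(" - ".join(language_parts))
    return sorted(options)
```

With this layout, a dialect directory such as Norwegian/Nynorsk would show up as the option "Norwegian - Nynorsk", while a language without dialects keeps its directory name.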

Contribution

Happy to discuss how best to read in dialect sub directories and review the changes here when the PR is up!

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed hacktoberfest Included as a part of Hacktoberfest labels Oct 9, 2024
@OmarAI2003
Contributor

I'm interested in this issue. 😃

@andrewtavis
Member Author

This would be a really good one for you, @OmarAI2003 😊 Let us know if you have any questions!

@OmarAI2003
Contributor

Replacing the dependency on language_metadata.json for getting the language names by using the language_data_extraction folder structure seems feasible. However, I’m not sure how to handle the other properties in the JSON file, like iso, qid, remove-words, etc. Would it make sense to include these properties somehow, or should we consider another approach? I'm not sure if this is right, but I would love to get more input!

@andrewtavis
Member Author

In talking about this a bit, @OmarAI2003, we might not be able to do this. @SethiShreya and I were talking and, as you said, we need the QIDs too so that the CLI can make calls based on them. Without a central store of languages and their QIDs, maybe it can't work?

@OmarAI2003
Contributor

Maybe we could use the directory structure just for language names, but still keep language_metadata.json for properties like QIDs? Not sure if this would help, but happy to hear your thoughts!

@andrewtavis
Member Author

It's an interesting idea, but say we rely on the directory structure and a QID doesn't get added for a language: then some functionality is broken 🤔
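To make the concern concrete, here is a minimal sketch of a check that would catch a language directory with no QID in the metadata. The helper name `find_languages_missing_qid` is hypothetical, and the sketch assumes the metadata maps lower-case language names to objects with a "qid" key; neither is taken from the project.

```python
def find_languages_missing_qid(directory_languages, metadata):
    """Return the directory-derived languages that have no QID in the metadata.

    Assumptions (illustrative only): metadata is keyed by lower-case
    language name, and each value is a dict that may hold a "qid" key.
    """
    return sorted(
        language
        for language in directory_languages
        if not metadata.get(language.lower(), {}).get("qid")
    )


metadata = {
    "english": {"iso": "en", "qid": "Q1860"},
    "french": {"iso": "fr"},  # QID forgotten
}
missing = find_languages_missing_qid(["English", "French", "German"], metadata)
print(missing)  # ['French', 'German']
```

A check like this could run in the tests, but it still leaves two sources of truth to keep in sync, which is the maintenance risk raised above.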

@OmarAI2003
Contributor

So will this issue be closed, or is there anything that still needs to be addressed?

@andrewtavis
Member Author

I'm thinking that for this one we can convert the functionality of the languages metadata file? I don't think we need the header key for it or the "languages" key where all the languages are? You can remove the header and put all the language objects at the top level. You can also remove all of the keys that aren't the language name, iso and qid? Then from there we need to rework the references to this metadata file throughout the project and fix the tests 😇
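A rough sketch of that conversion, for discussion only. The old file shape used here (header keys next to a "languages" list whose entries carry "language", "iso", "qid" and extra keys such as "remove-words") is an assumption based on the keys mentioned in this thread, and `flatten_language_metadata` is a hypothetical name.

```python
def flatten_language_metadata(old_metadata):
    """Move language objects to the top level, keeping only iso and qid.

    Assumed input shape (not confirmed by the project):
        {"<header keys>": ..., "languages": [{"language": ..., "iso": ..., "qid": ..., ...}]}
    """
    return {
        entry["language"]: {"iso": entry["iso"], "qid": entry["qid"]}
        for entry in old_metadata["languages"]
    }


old_metadata = {
    "description": "header text that would be dropped",
    "languages": [
        {"language": "english", "iso": "en", "qid": "Q1860", "remove-words": ["of"]},
        {"language": "french", "iso": "fr", "qid": "Q150", "remove-words": []},
    ],
}
new_metadata = flatten_language_metadata(old_metadata)
print(new_metadata["french"])  # {'iso': 'fr', 'qid': 'Q150'}
```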

How does this sound, @OmarAI2003? :)

@andrewtavis andrewtavis changed the title Update Scribe-Data CLI to read in options from the directory structure Update languages metadata file and use of it throughout project Oct 11, 2024
@OmarAI2003
Contributor

Sounds nice @andrewtavis, but I will need to engage in several discussions here and there along the way to make sure I'm on the same page.

@andrewtavis
Member Author

Sure thing, @OmarAI2003! Just start with getting the file down to just objects with languages, ISO-2s and QIDs at the base level, and then we can discuss from there. Happy to help as needed!

@catreedle
Contributor

> I'm thinking that for this one we can convert the functionality of the languages metadata file? I don't think we need the header key for it or the "languages" key where all the languages are? You can remove the header and put all the language objects at the top level. You can also remove all of the keys that aren't the language name, iso and qid? Then from there we need to rework the references to this metadata file throughout the project and fix the tests 😇
>
> How does this sound, @OmarAI2003? :)

Hi @andrewtavis, with the languages header removed, what will the full language_metadata file look like?

@andrewtavis
Member Author

@OmarAI2003, can you send along a snippet of the current version of the file so we can all take a look? :)

@OmarAI2003
Contributor

OmarAI2003 commented Oct 15, 2024

This is the current version of the JSON file. Don't worry about the file paths for sub-languages: there will be a format_sublanguage_name function in utils.py that returns a language's name relative to its directory. For example, for a Norwegian sub-language like 'Bokmål', calling format_sublanguage_name('Bokmål', language_metadata) will return the capitalized directory path Norwegian/Bokmål, while regular languages will be returned as they are, but capitalized. There will also be a list_all_languages function for listing all queryable languages and sub-languages.
language_metadata.json
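For readers without the attachment, the behaviour described above could be sketched roughly as follows. This is not the code from utils.py: the metadata shape (lower-case keys, with dialects nested under a "sub_languages" key) is assumed for illustration, and only the two function names come from the comment.

```python
# Hypothetical metadata shape; the real file may differ.
language_metadata = {
    "english": {"iso": "en", "qid": "Q1860"},
    "norwegian": {
        "sub_languages": {
            "bokmål": {"iso": "nb", "qid": "Q25167"},
            "nynorsk": {"iso": "nn", "qid": "Q25164"},
        }
    },
}


def format_sublanguage_name(lang, language_metadata):
    """Return 'Main/Sub' for a sub-language, or the capitalized name otherwise."""
    for main_lang, data in language_metadata.items():
        if lang.lower() == main_lang:
            return lang.capitalize()
        for sub_lang in data.get("sub_languages", {}):
            if lang.lower() == sub_lang:
                return f"{main_lang.capitalize()}/{sub_lang.capitalize()}"
    raise ValueError(f"{lang} is not in the language metadata")


def list_all_languages(language_metadata):
    """List every queryable language, replacing a parent by its sub-languages."""
    names = []
    for main_lang, data in language_metadata.items():
        sub_languages = data.get("sub_languages")
        names.extend(sub_languages if sub_languages else [main_lang])
    return sorted(names)


print(format_sublanguage_name("Bokmål", language_metadata))  # Norwegian/Bokmål
print(list_all_languages(language_metadata))  # ['bokmål', 'english', 'nynorsk']
```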

@catreedle
Contributor

> language_metadata.json

Thank you. Sounds great 😃

@andrewtavis
Member Author

Closed by #402 :) Thanks for the great work @OmarAI2003 and for the great conversation all!

@github-project-automation github-project-automation bot moved this from Todo to Done in Scribe Board Oct 18, 2024
@OmarAI2003
Contributor

> Closed by #402 :) Thanks for the great work @OmarAI2003 and for the great conversation all!

You're welcome! It was a great experience working on this, and I appreciate all the valuable feedback and discussions.


3 participants