Add Feature to Extract and Verify All Grammatical Features for a Data Type in a Given Language #513

OmarAI2003 · 2024-11-22T15:15:51Z

Terms

I have searched open and closed feature requests
I agree to follow Scribe-Data's Code of Conduct

Description

The language_data_extraction directory organizes supported languages into folders, with each language folder containing subfolders for supported data types (e.g., nouns, verbs, adverbs). Within these subfolders, SPARQL files are used to fetch lexical data for grammatical features. One way to enhance the data extraction process is to implement a mechanism that tracks the forms for each data type directly from Wikidata.
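As a rough illustration of the layout described above, the sketch below walks a `language_data_extraction`-style tree and groups the query files by language and data type. The `.sparql` extension and the helper name `list_query_files` are assumptions made for this example, not confirmed details of the Scribe-Data codebase.

```python
from pathlib import Path


def list_query_files(root: Path) -> dict[tuple[str, str], list[Path]]:
    """Map (language, data_type) to the SPARQL files found beneath it.

    Assumes the layout <root>/<language>/<data_type>/<file>.sparql.
    """
    queries: dict[tuple[str, str], list[Path]] = {}
    for path in sorted(root.glob("*/*/*.sparql")):
        # The two directories directly above the file name the language and data type.
        language, data_type = path.parts[-3], path.parts[-2]
        queries.setdefault((language, data_type), []).append(path)
    return queries
```

Calling `list_query_files(Path("language_data_extraction"))` would then give one entry per language and data type pair that has at least one query file.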

Problem Statement

Currently, we face two key challenges:

1. Listing all possible grammatical features for a given data type in a specific language (e.g., all forms that nouns or verbs can take).
2. Verifying that our SPARQL queries account for all of these grammatical features. Any feature that a query misses leads to incomplete or inconsistent data extraction.

Addressing these challenges is essential for accurately capturing all forms of a data type across languages, ultimately improving data quality and consistency.
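One possible way to approach the second challenge is to read the variables a query returns and compare them with the forms expected for that data type. The sketch below is only an assumption about how this could look: it treats each projected variable as a form label, supports plain variables and `(expression AS ?alias)` projections, and the names `selected_variables` and `missing_forms` are hypothetical.

```python
import re


def selected_variables(query: str) -> set[str]:
    """Return the variable names projected by the first SELECT clause."""
    match = re.search(r"\bSELECT\b(.*?)\bWHERE\b", query, re.S | re.I)
    if match is None:
        return set()
    head = match.group(1)
    # For "(expression AS ?alias)" projections, only the alias is an output variable.
    aliases = set(re.findall(r"\bAS\s+\?(\w+)\s*\)", head, re.I))
    # Strip parenthesised expressions, innermost first, so inner variables are ignored.
    previous = None
    while previous != head:
        previous = head
        head = re.sub(r"\([^()]*\)", " ", head)
    plain = set(re.findall(r"\?(\w+)", head))
    return aliases | plain


def missing_forms(query: str, expected: set[str]) -> set[str]:
    """Return the expected form labels that the query does not select."""
    return set(expected) - selected_variables(query)
```

The set of expected forms would have to come from Wikidata itself. This sketch takes it as an input rather than guessing how it is obtained.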

Contribution

No response


andrewtavis · 2024-11-22T18:25:05Z

Thanks for making the issue, @OmarAI2003! I'll have more information on this in the coming weeks :)

andrewtavis · 2025-01-05T23:34:41Z

@axif0, now that we have the all forms functionality for Wikidata lexeme dumps, do you want to start working on the check for this? Basically we'd want a check that gets all the forms for all languages and then compares them against what we have in the queries. If the queries are missing forms, then we'd throw an error 😊 Ideally we'd have this also be able to be triggered manually.
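A minimal sketch of such a check, assuming both sides have already been reduced to sets of form labels keyed by language and data type. The exception and function names here are hypothetical, and how the forms are read from the lexeme dump is left out.

```python
class MissingFormsError(Exception):
    """Raised when one or more queries do not cover every available form."""


FormsByKey = dict[tuple[str, str], set[str]]


def find_missing_forms(available: FormsByKey, queried: FormsByKey) -> FormsByKey:
    """Return, per (language, data_type), the forms that no query covers."""
    missing: FormsByKey = {}
    for key, forms in available.items():
        gap = set(forms) - set(queried.get(key, set()))
        if gap:
            missing[key] = gap
    return missing


def check_forms(available: FormsByKey, queried: FormsByKey) -> None:
    """Raise MissingFormsError listing every gap, or return None if there is none."""
    missing = find_missing_forms(available, queried)
    if missing:
        lines = [
            f"{language}/{data_type}: {', '.join(sorted(forms))}"
            for (language, data_type), forms in sorted(missing.items())
        ]
        raise MissingFormsError("Queries are missing forms:\n" + "\n".join(lines))
```

Raising an exception rather than printing a warning means the same function can fail an automated run or be called by hand from a script.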

axif0 · 2025-01-06T15:20:08Z

If the queries are missing forms, then we'd throw an error

Thank you for bringing this up! We can start working on the check for this functionality. To clarify, are you suggesting that if any forms are missing in the queries, we should throw an error rather than just issuing a warning?

andrewtavis · 2025-01-06T15:38:57Z

I would say that ideally what would come from this is a GitHub workflow that would actually error and, on error, open a PR with a corrected query that includes the missing forms. That way the work of actually writing the queries is taken care of for us and we can just review when the queries are written 😊
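The query-correcting step could be sketched roughly as follows, assuming each form is fetched through an `OPTIONAL` block that matches on `wikibase:grammaticalFeature`. The helper names, the `?<label>Form` variable convention, and the idea of inserting blocks before the final closing brace are all assumptions for illustration. Opening the PR itself would be left to the workflow.

```python
def render_optional_block(label: str, feature_qids: list[str]) -> str:
    """Build an OPTIONAL block that binds ?<label> for the given grammatical features."""
    form_var = f"?{label}Form"
    features = ", ".join(f"wd:{qid}" for qid in feature_qids)
    return (
        "  OPTIONAL {\n"
        f"    ?lexeme ontolex:lexicalForm {form_var} .\n"
        f"    {form_var} ontolex:representation ?{label} ;\n"
        f"      wikibase:grammaticalFeature {features} .\n"
        "  }\n"
    )


def add_missing_forms(query: str, missing: dict[str, list[str]]) -> str:
    """Insert one OPTIONAL block per missing form before the query's final closing brace."""
    closing = query.rfind("}")
    if closing == -1:
        raise ValueError("query has no closing brace")
    blocks = "".join(
        render_optional_block(label, qids) for label, qids in sorted(missing.items())
    )
    return query[:closing] + blocks + query[closing:]
```

This sketch only touches the `WHERE` body. A complete correction would also need to add each new variable to the `SELECT` clause, which is omitted here.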

axif0 · 2025-01-06T15:57:32Z

Automating the process with a GitHub workflow that not only identifies the missing forms but also opens a PR with the corrected queries would indeed save a lot of time and effort. Working on it!

OmarAI2003 added the "feature" label Nov 22, 2024

andrewtavis added this to Scribe Board Nov 22, 2024

andrewtavis moved this to Todo in Scribe Board Nov 22, 2024

andrewtavis mentioned this issue Dec 15, 2024 in "Added download cli cmd #528" (merged)

andrewtavis added the "help wanted" label Jan 5, 2025

andrewtavis mentioned this issue Jan 5, 2025 in "Develop a method to ignore certain OPTIONAL selections in query checks #543" (open)

andrewtavis assigned axif0 Jan 6, 2025
