
complete workflow to check sparql queries #396

Merged: 65 commits into main from fix/adjust-check-query-workflow, Oct 19, 2024
ad54e29
complete workflow to check sparql queries
DeleMike Oct 16, 2024
5faa2f4
add function call to check queries
DeleMike Oct 16, 2024
c9c50d9
update check_query_identifiers workflow file: activate virtual enviro…
DeleMike Oct 16, 2024
1e04e4b
add working directory
DeleMike Oct 16, 2024
97f3243
update workflow: fix file path
DeleMike Oct 16, 2024
2ee16bb
reduce dependencies
DeleMike Oct 16, 2024
92e4ad9
add pythonpath dependencies
DeleMike Oct 16, 2024
042958e
add workflow fix
DeleMike Oct 17, 2024
ac4a2ba
Add Finnish verbs query
Ebeleokolo Oct 17, 2024
ee5b034
Updates to Finnish verbs query
andrewtavis Oct 17, 2024
3b9a61a
throw error if invalid QIDs are found
DeleMike Oct 17, 2024
10e7a50
post comment if workflow fails
DeleMike Oct 17, 2024
1d6668b
fix async block in workflow
DeleMike Oct 17, 2024
2cdcc01
give gh actions write access
DeleMike Oct 17, 2024
eb0e3f2
remove pr comment steps
DeleMike Oct 17, 2024
0a2d574
Added Swedish Adjectives
GicharuElvis Oct 17, 2024
8f3425a
Create query_verbs.sparql
Otom-obhazi Oct 17, 2024
5ffafb0
Add Igbo to the languages check
andrewtavis Oct 17, 2024
cac8dd6
Remove label service from adjectives query
andrewtavis Oct 17, 2024
34d84d2
Update query_adverbs.sparql
Otom-obhazi Oct 17, 2024
b5be3e6
Remove forms that were accidentally added
andrewtavis Oct 17, 2024
ca119c9
Minor changes to unicode setup docs
andrewtavis Oct 17, 2024
3ee79ab
Minor header change to unicode docs headers
andrewtavis Oct 17, 2024
6620ec5
Simplified language metadata JSON by removing unnecessary nesting and…
OmarAI2003 Oct 12, 2024
8666c02
Refactored _load_json function to handle simplified JSON structure.
OmarAI2003 Oct 12, 2024
3dce46d
Refactor language metadata structure: Include all languages with Norw…
OmarAI2003 Oct 12, 2024
5b51483
Refactor _find function to handle languages with sub-languages
OmarAI2003 Oct 12, 2024
a68b08c
Update get_scribe_languages to handle sub-languages in JSON structure
OmarAI2003 Oct 12, 2024
d447698
Remove get_language_words_to_remove and get_language_words_to_ignore …
OmarAI2003 Oct 13, 2024
86cd59d
Refactor language_map and language_to_qid generation to handle new JS…
OmarAI2003 Oct 13, 2024
d53ce37
Fix: Update language extraction to match new JSON structure by removi…
OmarAI2003 Oct 13, 2024
e8d82d0
Refactor language extraction to use direct keys from language_metadata.
OmarAI2003 Oct 13, 2024
5cd6087
Added format_sublanguage_name function to format sub-language names a…
OmarAI2003 Oct 14, 2024
74d7f47
Refactor: Apply format_sublanguage_name to handle sub-language
OmarAI2003 Oct 14, 2024
51e847d
Removed dependency on the 'languages' key based on the old json struc…
OmarAI2003 Oct 14, 2024
4c8fe1e
Add function to list all languages from language metadata loaded json
OmarAI2003 Oct 14, 2024
1fdb703
Refactor to use list_all_languages function for language extraction
OmarAI2003 Oct 14, 2024
4e50cbb
Enhance language handling by importing utility functions
OmarAI2003 Oct 14, 2024
761f8ee
Update get_language_iso function:
OmarAI2003 Oct 14, 2024
bc65e0d
Handle sub-languages in language table generation
OmarAI2003 Oct 14, 2024
47ff4f8
adding new languages and their dialects to the language_metadata.json…
OmarAI2003 Oct 14, 2024
f1f8928
Modified the loop that searches languages in the list_data_types func…
OmarAI2003 Oct 14, 2024
5a4f721
Capitalize the languages returned by the function 'format_sublanguage…
OmarAI2003 Oct 14, 2024
eaf89e4
Implemented minor fixes by utilizing the format_sublanguage_name func…
OmarAI2003 Oct 14, 2024
661d723
Updated the instance variable self.languages in ScribeDataConfig to u…
OmarAI2003 Oct 15, 2024
dffb9f7
adding mandarin as a sub language under chinese and updating some qids
OmarAI2003 Oct 16, 2024
4a204c0
Update test_list_languages to match updated output format
OmarAI2003 Oct 16, 2024
0249c96
removing .capitalize method since it's already implemented inside lag…
OmarAI2003 Oct 16, 2024
a584749
Updating test cases in test_list.py file to match newly added languages
OmarAI2003 Oct 16, 2024
4ef0c22
Update test cases to include sub-languages
OmarAI2003 Oct 16, 2024
775fb24
Updated the get_language_from_iso function to depend on the JSON file…
OmarAI2003 Oct 16, 2024
0b75b4e
Add unit tests for language formatting and listing:
OmarAI2003 Oct 16, 2024
ad61c66
Edits to language metadata and supporting functions + pr checklist
andrewtavis Oct 18, 2024
3fe5528
Refactor language_map and language_to_qid generation to handle new JS…
OmarAI2003 Oct 13, 2024
efb1f64
removing .capitalize method since it's already implemented inside lag…
OmarAI2003 Oct 16, 2024
048c84f
adjust is_valid_language function to suit new JSON structure
DeleMike Oct 18, 2024
1f8c9da
Refactor language_map and language_to_qid generation to handle new JS…
OmarAI2003 Oct 13, 2024
f1e227f
Refactor language_map and language_to_qid generation to handle new JS…
OmarAI2003 Oct 13, 2024
18d0747
Merge branch 'main' into fix/adjust-check-query-workflow
DeleMike Oct 18, 2024
d814ecb
fix failing tests and update docs
DeleMike Oct 18, 2024
c8214ff
fix failing workflow: add languages to workflow and update failing te…
DeleMike Oct 19, 2024
6517ffe
fix failing tests
DeleMike Oct 19, 2024
2b9b1e1
Merge branch 'main' into fix/adjust-check-query-workflow
andrewtavis Oct 19, 2024
8586625
Add Latvian to language metadata file
andrewtavis Oct 19, 2024
a975a6b
Add spacing and Latvian to testing
andrewtavis Oct 19, 2024
44 changes: 23 additions & 21 deletions .github/workflows/check_query_identifiers.yaml
@@ -22,24 +22,26 @@ jobs:
name: Run Check Query Identifiers

steps:
- name: Checkout
uses: actions/checkout@v3

# - name: Set up Python ${{ matrix.python-version }}
# uses: actions/setup-python@v4
# with:
# python-version: ${{ matrix.python-version }}

# - name: Install dependencies
# run: |
# python -m pip install --upgrade uv
# uv venv
# uv pip install -r requirements.txt

# - name: Activate virtualenv
# run: |
# . .venv/bin/activate
# echo PATH=$PATH >> $GITHUB_ENV

# - name: Run Python script
# run: python src/scribe_data/check/check_query_identifiers.py
- name: Checkout repository
uses: actions/checkout@v4

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

- name: Add project root to PYTHONPATH
run: echo "PYTHONPATH=$(pwd)/src" >> $GITHUB_ENV

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt

- name: Run check_query_identifiers.py
working-directory: ./src/scribe_data/check
run: python check_query_identifiers.py

- name: Post-run status
if: failure()
run: echo "Project SPARQL queries check failed. Please fix the reported errors."
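The `Add project root to PYTHONPATH` step writes `PYTHONPATH=$(pwd)/src` into `$GITHUB_ENV` so later steps can import `scribe_data` without installing the package. A minimal sketch of what the interpreter does with that variable at startup, simulated in-process here purely for illustration:

```python
import os
import sys

# At startup, Python splits PYTHONPATH on os.pathsep and prepends each
# entry to sys.path; the workflow step points it at the repo's src/ dir.
os.environ["PYTHONPATH"] = os.path.join(os.getcwd(), "src")

for entry in os.environ["PYTHONPATH"].split(os.pathsep):
    if entry and entry not in sys.path:
        sys.path.insert(0, entry)

# Any package under src/ (e.g. scribe_data) is now resolvable by import.
```

This is why the commented-out virtualenv activation steps could be dropped: the checker script only needs the project sources on the import path plus the pinned requirements.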
50 changes: 33 additions & 17 deletions src/scribe_data/check/check_query_identifiers.py
@@ -25,6 +25,7 @@
"""

import re
import sys
from pathlib import Path

from scribe_data.cli.cli_utils import (
@@ -50,6 +51,11 @@ def extract_qid_from_sparql(file_path: Path, pattern: str) -> str:
-------
str
The extracted QID if found, otherwise None.

Raises
------
FileNotFoundError
If the specified file does not exist.
"""
try:
with open(file_path, "r", encoding="utf-8") as file:
@@ -63,7 +69,7 @@ def extract_qid_from_sparql(file_path: Path, pattern: str) -> str:
return None
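A self-contained sketch of the `extract_qid_from_sparql` helper as it reads after this hunk. The real regex patterns are defined elsewhere in the module, so the pattern used in the comment below is an assumption for illustration only:

```python
import re
from pathlib import Path


def extract_qid_from_sparql(file_path: Path, pattern: str) -> str:
    """Return the QID matched by `pattern` in a SPARQL file, or None.

    Minimal sketch: assumes the match ends in the QID token itself,
    e.g. a pattern like r"dct:language wd:Q\d+" matching "wd:Q1860".
    """
    try:
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()

        if match := re.search(pattern, content):
            # Keep only the trailing "Q…" token of the match.
            return match.group(0).split("wd:")[-1]

    except OSError:
        # Covers FileNotFoundError and other I/O failures.
        pass

    return None
```

Returning `None` (rather than raising) lets `check_queries` treat an unreadable or pattern-free file the same as a file with a wrong QID and collect all failures before exiting.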


def check_queries():
def check_queries() -> None:
"""
Validates SPARQL queries in the specified directory to check for correct language
and data type QIDs.
@@ -92,14 +98,14 @@ def check_queries():
for file in incorrect_languages:
print(f"- {file}")

print("\n----------------------------------------------------------------\n")

if incorrect_data_types:
print("Incorrect Data Type QIDs found in the following files:")
for file in incorrect_data_types:
print(f"- {file}")

print("\n----------------------------------------------------------------\n")
# Exit with an error code if any incorrect QIDs are found.
if incorrect_languages or incorrect_data_types:
sys.exit(1)


def is_valid_language(query_file: Path, lang_qid: str) -> bool:
@@ -117,24 +123,30 @@ def is_valid_language(query_file: Path, lang_qid: str) -> bool:
-------
bool
True if the language QID is valid, otherwise False.

Example
-------
>>> is_valid_language(Path("path/to/query.sparql"), "Q123456")
True
"""
lang_directory_name = query_file.parent.parent.name.lower()
languages = language_metadata.get(
"languages"
) # might not work since language_metadata file is not fully updated
language_entry = next(
(lang for lang in languages if lang["language"] == lang_directory_name), None
)
language_entry = language_metadata.get(lang_directory_name)

if not language_entry:
# Look for sub-languages
for lang, details in language_metadata.items():
if "sub_languages" in details:
sub_language_entry = details["sub_languages"].get(lang_directory_name)
if sub_language_entry:
language_entry = sub_language_entry
break

if not language_entry:
return False

expected_language_qid = language_entry["qid"]

if lang_qid != expected_language_qid:
return False

return True
return lang_qid == expected_language_qid
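The refactored lookup can be exercised against a tiny, hypothetical metadata dict; the entries below are illustrative stand-ins, not the project's full `language_metadata.json`:

```python
# Hypothetical slice of the new flat metadata structure.
language_metadata = {
    "norwegian": {
        "sub_languages": {
            "bokmål": {"iso": "nb", "qid": "Q25167"},
        },
    },
    "english": {"iso": "en", "qid": "Q1860"},
}


def find_language_entry(lang_directory_name: str):
    """Top-level lookup first, then fall back to scanning sub_languages."""
    entry = language_metadata.get(lang_directory_name)

    if entry is None:
        for details in language_metadata.values():
            if "sub_languages" in details:
                entry = details["sub_languages"].get(lang_directory_name)
                if entry is not None:
                    break

    return entry
```

`is_valid_language` then reduces to comparing the query's extracted `lang_qid` against `entry["qid"]`, which is exactly what the final `return lang_qid == expected_language_qid` does.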


def is_valid_data_type(query_file: Path, data_type_qid: str) -> bool:
@@ -152,13 +164,17 @@ def is_valid_data_type(query_file: Path, data_type_qid: str) -> bool:
-------
bool
True if the data type QID is valid, otherwise False.

Example
-------
>>> is_valid_data_type(Path("path/to/query.sparql"), "Q654321")
True
"""
directory_name = query_file.parent.name # e.g., "nouns" or "verbs"
expected_data_type_qid = data_type_metadata.get(directory_name)

return data_type_qid == expected_data_type_qid
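`is_valid_data_type` resolves the expected QID purely from the query file's parent directory name. A runnable sketch with an illustrative two-entry mapping (the project loads the real `data_type_metadata` from its JSON resources):

```python
from pathlib import Path

# Illustrative subset of the data type mapping.
data_type_metadata = {"nouns": "Q1084", "verbs": "Q24905"}


def is_valid_data_type(query_file: Path, data_type_qid: str) -> bool:
    """Check a query's data-type QID against its directory name."""
    directory_name = query_file.parent.name  # e.g. "nouns" or "verbs"
    return data_type_qid == data_type_metadata.get(directory_name)
```

Because `dict.get` returns `None` for an unknown directory, a query placed in an unrecognized folder fails validation rather than raising.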


# Run the check_queries function
# MARK: TODO: Remove Call
# check_queries()
if __name__ == "__main__":
check_queries()
1 change: 0 additions & 1 deletion src/scribe_data/cli/cli_utils.py
@@ -54,7 +54,6 @@
except (IOError, json.JSONDecodeError) as e:
print(f"Error reading data type metadata: {e}")


language_map = {}
language_to_qid = {}

36 changes: 18 additions & 18 deletions src/scribe_data/cli/list.py
@@ -31,6 +31,7 @@
get_language_iso,
get_language_qid,
list_all_languages,
list_languages_with_metadata_for_data_type,
)


@@ -132,28 +133,27 @@ def list_languages_for_data_type(data_type: str) -> None:
The data type to check for.
"""
data_type = correct_data_type(data_type=data_type)
all_languages = list_all_languages(language_metadata)
available_languages = []
for lang in all_languages:
lang = format_sublanguage_name(lang, language_metadata)
language_dir = LANGUAGE_DATA_EXTRACTION_DIR / lang
if language_dir.is_dir():
dt_path = language_dir / data_type
if dt_path.exists():
available_languages.append(lang)

available_languages.sort()
table_header = f"Available languages: {data_type}"
table_line_length = max(
len(table_header), max(len(lang) for lang in available_languages)
)
all_languages = list_languages_with_metadata_for_data_type(language_metadata)

# Set column widths for consistent formatting.
language_col_width = max(len(lang["name"]) for lang in all_languages) + 2
iso_col_width = max(len(lang["iso"]) for lang in all_languages) + 2
qid_col_width = max(len(lang["qid"]) for lang in all_languages) + 2

table_line_length = language_col_width + iso_col_width + qid_col_width

# Print table header.
print()
print(table_header)
print(
f"{'Language':<{language_col_width}} {'ISO':<{iso_col_width}} {'QID':<{qid_col_width}}"
)
print("-" * table_line_length)

for lang in available_languages:
print(f"{lang}")
# Iterate through the list of languages and format each row.
for lang in all_languages:
print(
f"{lang['name'].capitalize():<{language_col_width}} {lang['iso']:<{iso_col_width}} {lang['qid']:<{qid_col_width}}"
)

print("-" * table_line_length)
print()
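The width computation above pads each column to the longest value plus two spaces. A compact sketch of the same formatting logic, using made-up rows:

```python
def format_language_table(rows):
    """Build fixed-width table lines from language metadata dicts."""
    name_w = max(len(r["name"]) for r in rows) + 2
    iso_w = max(len(r["iso"]) for r in rows) + 2
    qid_w = max(len(r["qid"]) for r in rows) + 2

    lines = [f"{'Language':<{name_w}} {'ISO':<{iso_w}} {'QID':<{qid_w}}"]
    lines.append("-" * (name_w + iso_w + qid_w))

    for r in rows:
        # Capitalize for display, matching the PR's table output.
        lines.append(
            f"{r['name'].capitalize():<{name_w}} {r['iso']:<{iso_w}} {r['qid']:<{qid_w}}"
        )

    return lines


# Illustrative rows only; the real data comes from language_metadata.json.
rows = [
    {"name": "english", "iso": "en", "qid": "Q1860"},
    {"name": "norwegian/bokmål", "iso": "nb", "qid": "Q25167"},
]
for line in format_language_table(rows):
    print(line)
```

Deriving the widths from the data keeps the three columns aligned no matter how long the longest language, ISO code, or QID happens to be.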
8 changes: 8 additions & 0 deletions src/scribe_data/resources/language_metadata.json
@@ -95,6 +95,10 @@
"iso": "ja",
"qid": "Q5287"
},
"korean": {
"iso": "ko",
"qid": "Q9176"
},
"kurmanji": {
"iso": "kmr",
"qid": "Q36163"
@@ -103,6 +107,10 @@
"iso": "la",
"qid": "Q397"
},
"latvian": {
"iso": "lv",
"qid": "Q9078"
},
"malay": {
"iso": "ms",
"qid": "Q9237"
33 changes: 33 additions & 0 deletions src/scribe_data/utils.py
@@ -546,3 +546,36 @@ def list_all_languages(language_metadata=_languages):
current_languages.append(lang_key)

return sorted(current_languages)


def list_languages_with_metadata_for_data_type(language_metadata=_languages):
"""
Returns a sorted list of languages and their metadata (name, iso, qid) for a specific data type.
The list includes sub-languages where applicable.
"""
current_languages = []

# Iterate through the language metadata.
for lang_key, lang_data in language_metadata.items():
# Check if there are sub-languages.
if "sub_languages" in lang_data:
# Add the sub-languages to current_languages with metadata.
for sub_key, sub_data in lang_data["sub_languages"].items():
current_languages.append(
{
"name": f"{lang_data.get('name', lang_key)}/{sub_data.get('name', sub_key)}",
"iso": sub_data.get("iso", ""),
"qid": sub_data.get("qid", ""),
}
)
else:
# If no sub-languages, add the main language with metadata.
current_languages.append(
{
"name": lang_data.get("name", lang_key),
"iso": lang_data.get("iso", ""),
"qid": lang_data.get("qid", ""),
}
)

return sorted(current_languages, key=lambda x: x["name"])
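Run against a tiny illustrative metadata dict, the new helper flattens sub-languages into `parent/child` names and sorts the result; the QIDs below are placeholders for illustration:

```python
# Illustrative metadata; the project loads _languages from its JSON resources.
_languages = {
    "chinese": {
        "sub_languages": {"mandarin": {"iso": "zh", "qid": "Q727694"}},
    },
    "basque": {"iso": "eu", "qid": "Q8752"},
}


def list_languages_with_metadata_for_data_type(language_metadata=_languages):
    """Flatten languages (and sub-languages) into sorted metadata dicts."""
    current_languages = []

    for lang_key, lang_data in language_metadata.items():
        if "sub_languages" in lang_data:
            # Sub-languages get "parent/child" display names.
            for sub_key, sub_data in lang_data["sub_languages"].items():
                current_languages.append(
                    {
                        "name": f"{lang_data.get('name', lang_key)}/{sub_data.get('name', sub_key)}",
                        "iso": sub_data.get("iso", ""),
                        "qid": sub_data.get("qid", ""),
                    }
                )
        else:
            current_languages.append(
                {
                    "name": lang_data.get("name", lang_key),
                    "iso": lang_data.get("iso", ""),
                    "qid": lang_data.get("qid", ""),
                }
            )

    return sorted(current_languages, key=lambda x: x["name"])
```

Using `dict.get` with the key as the fallback name means entries without an explicit `"name"` field (the common case in the simplified JSON) still render correctly in the CLI table.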
2 changes: 2 additions & 0 deletions tests/load/test_update_utils.py
@@ -154,8 +154,10 @@ def test_list_all_languages():
"indonesian",
"italian",
"japanese",
"korean",
"kurmanji",
"latin",
"latvian",
"malay",
"malayalam",
"mandarin",