Add python script to update static tech docs #3933
base: master
Important: Review skipped. Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
Walkthrough: The pull request introduces a new script.
@coderabbitai review
Actions performed: Review triggered.
Actionable comments posted: 5
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- modules/mediawiki/files/bin/techdocs.py (1 hunks)
Additional context used
Ruff
modules/mediawiki/files/bin/techdocs.py
44-44: Loop control variable page_id not used within loop body. Rename unused page_id to _page_id (B007)
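A minimal sketch of the B007 fix, for illustration only. The loop at line 44 of techdocs.py is not shown in this review, so the `pages_by_id` mapping and both function names below are hypothetical stand-ins for whatever the script iterates over.

```python
def page_titles(pages_by_id):
    """Collect titles; the key is unused, so it gets a leading underscore."""
    titles = []
    # Renaming page_id to _page_id signals that the key is intentionally ignored.
    for _page_id, page in pages_by_id.items():
        titles.append(page['title'])
    return titles


def page_titles_from_values(pages_by_id):
    """Alternative: iterate over values() so no unused variable exists at all."""
    return [page['title'] for page in pages_by_id.values()]
```

Either form should satisfy the linter; iterating over `values()` is usually the tidier choice when the key is never needed.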
Additional comments not posted (1)
modules/mediawiki/files/bin/techdocs.py (1)
52-66: Function fetch_page_content implemented correctly
The function properly retrieves wikitext content for a given page title using the MediaWiki API, handling HTTP errors appropriately.
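For readers without the diff at hand, a rough sketch of what such a function might look like. This is not the PR's code: the `API_URL` endpoint, the helper names, and the use of `urllib` (chosen to stay standard-library only) are all assumptions. The query parameters follow the MediaWiki Action API's `prop=revisions` module with `formatversion=2`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; the review only shows https://meta.miraheze.org/wiki/ links.
API_URL = 'https://meta.miraheze.org/w/api.php'


def build_query(title):
    """Build the query string asking for the latest revision's wikitext."""
    return urlencode({
        'action': 'query',
        'prop': 'revisions',
        'rvprop': 'content',
        'rvslots': 'main',
        'titles': title,
        'format': 'json',
        'formatversion': '2',
    })


def extract_wikitext(payload):
    """Pull the wikitext out of a formatversion=2 response."""
    page = payload['query']['pages'][0]
    if page.get('missing'):
        raise LookupError(f"Page not found: {page.get('title')}")
    return page['revisions'][0]['slots']['main']['content']


def fetch_page_content(title, api_url=API_URL):
    """Fetch wikitext; urlopen raises HTTPError on 4xx/5xx responses."""
    with urlopen(f'{api_url}?{build_query(title)}', timeout=30) as response:
        return extract_wikitext(json.load(response))
```

Splitting query building and response parsing from the network call keeps both testable offline.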
            no_space_after = (',', ';', ':', '(', '[', '{', '-', '_')

            if not current_line.endswith(no_space_after) and not current_line.endswith(' '):
                current_line += ' '

            anchor_replacements = {
                '.28': '',
                '.29': '',

                '(': '',
                ')': '',
                '?': '',
                '!': '',
                ':': '',
                '/': '',
                '.': '',
                ',': '',
                '"': '',
                "'": '',
                '_': '-',
                ' ': '-',
            }

            if target.startswith('#'):
                # Anchor link
                anchor = target
                for old, new in anchor_replacements.items():
                    anchor = anchor.replace(old, new)
                anchor = anchor.lower()
                current_line += f'[{label}]({anchor})'
            else:
                if '#' in target:
                    # Anchor link
                    base_title, anchor = target.split('#', 1)
                else:
                    base_title, anchor = target, None

                formatted_title = base_title.replace(' ', '_').replace('/', '-')
                local_file_path = os.path.join(ensure_sub_directory(), f'{formatted_title}.md')

                # Case-insensitive file existence check
                if file_exists_case_insensitive(local_file_path):
                    # If the local file exists, use the local markdown link
                    formatted_url = f"/tech-docs/{formatted_title.replace(':', '').lower()}"

                    # Append anchor if it exists
                    if anchor:
                        for old, new in anchor_replacements.items():
                            anchor = anchor.replace(old, new)
                        anchor = anchor.lower()
                        formatted_url += f'#{anchor}'

                    # Add the final link
                    current_line += f'[{label}]({formatted_url})'
                else:
                    # Otherwise, link to Miraheze
                    if is_category:
                        # Use a new section with a list for categories
                        if is_first_category:
                            if current_line:
                                markdown_lines.append(current_line.strip())
                                current_line = ''
                            markdown_lines.append('## Categories\n')
                            is_first_category = False
                        current_line += '* '
                    current_line += f"[{label}](https://meta.miraheze.org/wiki/{target.replace(' ', '_')})"

        elif isinstance(node, mwparserfromhell.nodes.Comment):
            if current_line:
                markdown_lines.append(current_line.strip())
                current_line = ''

            if '<!--T:' not in str(node):
                markdown_lines.append(f'\n{str(node).strip()}\n')

        elif isinstance(node, mwparserfromhell.nodes.Template):
            if current_line:
                markdown_lines.append(current_line.strip())
                current_line = ''

            # Keep templates intact in Markdown
            if '\n' in str(node):
                markdown_lines.append(f'```\n{{{{ {node} }}}}\n```')
            elif 'tech navigation' in str(node).lower() or 'hatnote' in str(node).lower():
                markdown_lines.append(f'\n`{{{{ {node} }}}}`')
            else:
                current_line += f'`{{{{ {node} }}}}`'

        elif isinstance(node, mwparserfromhell.nodes.HTMLEntity):
            current_line += str(node)

        else:
            current_line += str(node)

    # Flush any remaining inline content to markdown_lines
    if current_line:
        markdown_lines.append(current_line.strip())

    return clean_markdown(markdown_lines)
Consider refactoring convert_wikitext_to_markdown for improved maintainability
The convert_wikitext_to_markdown function is lengthy and complex, spanning nearly 300 lines. Refactoring it into smaller, modular functions can enhance readability, make it easier to test, and improve maintainability. Separating the handling of different node types or processing steps into individual functions would adhere to the Single Responsibility Principle.
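As one small, concrete step in that direction, the anchor-normalisation loop appears twice in the hunk above and could be pulled into a helper. The sketch below copies the replacement table from the diff; the names `ANCHOR_REPLACEMENTS` and `slugify_anchor` are hypothetical, not part of the PR.

```python
# Replacement table copied from the reviewed hunk; order matters because
# '.28' and '.29' must be removed before the bare '.' rule runs.
ANCHOR_REPLACEMENTS = {
    '.28': '',
    '.29': '',
    '(': '',
    ')': '',
    '?': '',
    '!': '',
    ':': '',
    '/': '',
    '.': '',
    ',': '',
    '"': '',
    "'": '',
    '_': '-',
    ' ': '-',
}


def slugify_anchor(anchor):
    """Normalise a wikitext anchor into the lower-case, hyphenated form."""
    for old, new in ANCHOR_REPLACEMENTS.items():
        anchor = anchor.replace(old, new)
    return anchor.lower()
```

Both call sites would then reduce to `slugify_anchor(target)` and `slugify_anchor(anchor)`, and the helper can be unit-tested without constructing any parser nodes.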
def update_local_repo():
    if not os.path.exists(LOCAL_REPO_PATH):
        Repo.clone_from(GITHUB_REPO_URL, LOCAL_REPO_PATH)
    else:
        repo = Repo(LOCAL_REPO_PATH)
        repo.git.pull()
Enhance error handling for Git operations
The update_local_repo function performs Git operations without error handling. If repo.git.pull() encounters issues such as merge conflicts or network errors, the script may terminate unexpectedly. Adding try-except blocks around Git commands can help handle exceptions gracefully and improve the script's robustness.
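One possible shape for this, sketched with the Git dependency passed in so the example runs without GitPython installed. The parameters `repo_cls` and `git_errors` are assumptions made for illustration: in the real script `repo_cls` would presumably be GitPython's `Repo` and `git_errors` something like `(GitCommandError,)` from `git.exc`.

```python
import os


def update_local_repo(repo_cls, repo_url, repo_path, git_errors=(Exception,)):
    """Clone the repository if absent, otherwise pull; report failures clearly.

    repo_cls   -- a Repo-like class (assumed: git.Repo)
    git_errors -- exception types to translate (assumed: git.exc.GitCommandError)
    """
    try:
        if not os.path.exists(repo_path):
            repo_cls.clone_from(repo_url, repo_path)
            return 'cloned'
        repo_cls(repo_path).git.pull()
        return 'pulled'
    except git_errors as exc:
        # Re-raise with context so the caller sees which repository failed.
        raise RuntimeError(f'Could not update {repo_path}: {exc}') from exc
```

Re-raising with context, rather than swallowing the error, keeps the failure visible while still giving the caller one predictable exception type to catch.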
def commit_and_push_changes():
    repo = Repo(LOCAL_REPO_PATH)
    repo.git.add(A=True)  # Add all changes
    commit_message = f'Auto-update Tech namespace pages {datetime.now()}'
    repo.index.commit(commit_message)
    origin = repo.remote(name='origin')
    origin.push()
Add error handling for Git operations
Currently, the Git operations in commit_and_push_changes lack error handling. If the push fails (e.g., due to non-fast-forward updates or network errors), the script may crash without a clear error message. Consider adding try-except blocks around Git commands to handle exceptions gracefully and provide informative feedback.
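A related point worth checking: as far as the GitPython documentation describes it, `Remote.push()` reports rejected refs through the flags on the returned `PushInfo` objects (and an empty result on total failure) rather than by raising, so a try-except alone may not catch a rejected push. The sketch below inspects those results; `push_or_raise` is a hypothetical helper, and the `repo` argument only needs to behave like a GitPython `Repo`.

```python
def push_or_raise(repo, remote_name='origin'):
    """Push and raise RuntimeError if any ref was rejected or nothing was reported."""
    results = repo.remote(name=remote_name).push()

    # An empty result list is documented as a completely failed push.
    if not results:
        raise RuntimeError(f'Push to {remote_name} failed: no refs were reported')

    # Each PushInfo carries bit flags; ERROR is set for rejected or failed refs.
    rejected = [info.summary.strip() for info in results if info.flags & info.ERROR]
    if rejected:
        raise RuntimeError(f'Push to {remote_name} was rejected: ' + '; '.join(rejected))

    return len(results)
```

Newer GitPython releases also appear to offer `raise_if_error()` on the returned list, which may be a shorter route to the same check if the installed version supports it.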
def mirror_tech_pages_to_github():
    print('Fetching Tech namespace pages...')
    pages = fetch_tech_pages()
    update_local_repo()  # Clone or pull latest changes
    delete_files_not_in_pages(pages)  # Delete old files not present in API response
    for page in pages:
        title = page['title']
        print(f'Processing page: {title}')
        content = fetch_page_content(title)
        markdown_content = convert_wikitext_to_markdown(content)
        write_content_to_file(title, markdown_content)
    commit_and_push_changes()
    print('Successfully updated GitHub repository.')
Implement exception handling in the mirroring process
To enhance the robustness of the script, consider adding exception handling within the mirror_tech_pages_to_github function. If an error occurs while processing a page (e.g., network issue, parsing error), handling the exception can prevent the entire script from terminating prematurely and allow processing of the remaining pages.
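A minimal sketch of per-page isolation, assuming the fetch, convert, and write steps are bundled into a single callable. The names `process_pages` and `process_page`, and the logger name, are hypothetical and not taken from the PR.

```python
import logging

logger = logging.getLogger('techdocs')


def process_pages(pages, process_page):
    """Run process_page(title) for each page; return titles that failed."""
    failed = []
    for page in pages:
        title = page['title']
        try:
            process_page(title)
        except Exception as exc:
            # Deliberately broad: one bad page should not stop the rest.
            logger.warning('Skipping %s: %s', title, exc)
            failed.append(title)
    return failed
```

The caller can then decide what to do with the returned list, for example skipping the commit or mentioning the failed titles in the commit message when it is not empty.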
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>