Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proofing] search and replace matching strings across all pages in a project #476

Merged
merged 29 commits into from
Mar 24, 2023

Conversation

kvchitrapu
Copy link
Collaborator

Feature

Search and replace a string works. May need stylesheet changes to pretty it up. Completes #455.

Closed #407. Created a new clean branch and as the base branch for #470 pulled an existing commit.

Tests

Screenshot 2023-03-16 at 8 28 50 PM

Screenshot 2023-03-16 at 8 29 37 PM
Screenshot 2023-03-16 at 8 30 06 PM

@kvchitrapu kvchitrapu requested a review from akprasad March 18, 2023 00:32
@kvchitrapu kvchitrapu closed this Mar 18, 2023
@kvchitrapu kvchitrapu reopened this Mar 18, 2023
@kvchitrapu kvchitrapu requested review from shreevatsa and suhasm March 18, 2023 20:46
@kvchitrapu kvchitrapu mentioned this pull request Mar 18, 2023
5 tasks
@kvchitrapu
Copy link
Collaborator Author

@shreevatsa or @suhasm , can you try search+replace interface and give me feedback? I think to try locally this PR you can git fetch origin pull/476/head:issue-470-search-replace. @akprasad feel free to try once you feel better.

I deployed it on my staging env: http://147.182.220.33:5001/proofing/satya/replace

kvchitrapu and others added 5 commits March 20, 2023 23:45
* search and replace matching strings across all pages in a project

* added regex query support

* fixed lint issue
Copy link
Contributor

@akprasad akprasad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a first pass it looks good to me. I'd like to experiment with it in staging -- can you help me use your staging deploy? Otherwise I'll try it on dev this weekend.

"line_num": line_num,
}
)
except TimeoutError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the use case for regex search? If it has a big risk surface we can disable it for now.

Copy link
Collaborator Author

@kvchitrapu kvchitrapu Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suhasm asked for it. The use case is real where Vision api might have incorrectly extracted a pattern of words and the proofer wants to correct them across the book.

It increases the risk surface, but I also acknowledge the fact that these proofers are registered users who are committed to proofreading and helping out. Also, the implementation limits the surface to one line at a time.

replace_form_key = f"match{page_.slug}-{line_num}-replace"

if selected_keys.get(form_key) == "selected":
LOG.debug(f"{__name__}: {form_key}: {selected_keys.get(form_key)}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than explicit __name__ in these logs, we should try to either configure the logger or use a helper function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I punted that work and just forgot about it. I find logging extremely helpful but I didn't invest time to fix it systematically.

@@ -105,6 +105,71 @@ def test_search__bad_project(rama_client):
assert resp.status_code == 404


def test_replace(moderator_client):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the tests!

lambda m: Markup(f"<mark>{escape(m.group(0))}</mark>"), line
)
marked_replace = query_pattern.sub(
Markup(f"<mark>{escape(replace)}</mark>"), line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than build HTML directly, let's move all of this logic into the template. Then marked_replace can be a simple list of strings.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than build HTML directly, let's move all of this logic into the template. Then marked_replace can be a simple list of strings.

HTML and js take time for me to get right. I'll try to find a way to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, no problem -- I can clean it up later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that it's desirable to move HTML markup into Jinja, but it seems to complicate the solution somewhat.

When using query_patten.sub(), the matches are marked in-place with as they are found. If an unmarked line is passed to Jinja, we need to either send a list of matches in the line or re-run the regex in Jinja code. Additionally, to mark the replaced substrings accurately, they should be passed as a list. Therefore, marking these substrings in _replace_text() seems to strike a good balance between clean language separation and the simplicity of the solution.

for key, value in request.form.items()
if key.startswith("match") and not key.endswith("replace")
}
render = _select_changes(project_, selected_keys, query=query, replace=replace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very thoughtful! I was imaging a simple batch search/replace, but being able to review all changes makes this really special.

Copy link
Collaborator Author

@kvchitrapu kvchitrapu Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still more work to be done. Ideally, pagination should be added to the review page. We also need to catch runaway searches such as 'ॐ' or 'the' to ensure memory consumption is kept under control. However, I am relying on users to be responsible in their searches.

@kvchitrapu
Copy link
Collaborator Author

http://147.182.220.33:5001/proofing/satya/replace

Please above link with "http://" instead of "https://" as some browsers suggest it to be.

@akprasad akprasad merged commit 515b046 into ambuda-org:main Mar 24, 2023
@kvchitrapu kvchitrapu deleted the issue-470-search-replace branch March 31, 2023 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants