Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pyspelling html filter #795

Merged

Conversation

thomassedlmayer
Copy link
Contributor

I noticed that adding new hyperlinks to the proto files requires to add any character sequences from the hyperlink as exceptions to the spellchecker (see #790). So I found that it is possible to add a html filter that excludes such character sequences.

Here I added a exemplary <a>-tag with a hyperlink that contains strings that should trigger a spellchecker fail. With the new html filter the hyperlink does not trigger the fail anymore but now it fails because it now detects a lot more strings that contain mutated vowels which were not detected before. I guess the html filter converts those special characters so that the spellchecker can also check those words.

@ClemensLinnhoff Could you please check if this type of filter does not screw things up? Was there any reason why this was not added before? If it works, we should probably add the "new" mutated vowel-words as exceptions but remove any exceptions that were previously added because they are contained inside hyperlinks (or also html tags/attribute names like cellspacing, href, colspan etc.). Also, wouldn't it make sense to add German as secondary language so that we don't have to add every German word to the exception list?

@thomassedlmayer thomassedlmayer added the Quality Quality improvements. label Mar 22, 2024
@thomassedlmayer thomassedlmayer added this to the V3.7.0 milestone Mar 22, 2024
@ClemensLinnhoff
Copy link
Contributor

Was there any reason why this was not added before?

I was not aware of this filter functionality. So good catch!

Also, wouldn't it make sense to add German as secondary language so that we don't have to add every German word to the exception list?

I agree, that would make sense for the current OSI definitions. But if I recall correctly, I tried this and found out that aspell does not support multiple languages. However, there seem to be some workarounds.
On the other hand I would generally question, why there are links in a non-english language in this international standard. If we would add french, spanish and chinese traffic signs to OSI as well, spell checking in all these languages might get messy.

@ClemensLinnhoff
Copy link
Contributor

but now it fails because it now detects a lot more strings that contain mutated vowels which were not detected before.

Seems like it. So either we add all these words with the Umlauts to the spelling_custom_words_en_US.txt or we need to figure out how to check multiple languages. I would opt to add them to the custom words and avoid using non-english words in the future.

@thomassedlmayer thomassedlmayer force-pushed the fix/hyperlink-spellcheck branch from 18ffb9d to b565376 Compare March 25, 2024 13:13
Signed-off-by: Thomas Sedlmayer <[email protected]>
@thomassedlmayer thomassedlmayer force-pushed the fix/hyperlink-spellcheck branch 2 times, most recently from ea49282 to 73e133b Compare March 25, 2024 13:32
Signed-off-by: Thomas Sedlmayer <[email protected]>
@thomassedlmayer thomassedlmayer force-pushed the fix/hyperlink-spellcheck branch from 73e133b to 213f4de Compare March 25, 2024 13:37
@thomassedlmayer thomassedlmayer marked this pull request as ready for review March 25, 2024 13:43
@thomassedlmayer
Copy link
Contributor Author

I see. If there is no straightforward way of adding another language to the checker, let's maybe postpone this for now.

I added the necessary exceptions to the whitelist so that the spellchecker does not fail anymore. There may still be some words in the white list that are not required with the html filter in place, like html-tags. We can also maybe get rid of those.

@ClemensLinnhoff
Copy link
Contributor

Yes, there are also a lot of half words, as before, umlauts were not recognized, e.g. Parkfl for Parkflächen, Parkst for Parkstände, Rei for Reißverschluss etc. We might be able to also get rid of those.

@thomassedlmayer thomassedlmayer added the ReadyForCCBReview Indicates that this MR is ready for a final review and merge by the CCB. label Mar 28, 2024
@pmai pmai self-assigned this Apr 4, 2024
Copy link
Contributor

@pmai pmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCB 2024-04-04: Merge as-is.

@pmai pmai added ReadyToMerge This PR has been approved to merge and will be merged by a member of the CCB. and removed ReadyForCCBReview Indicates that this MR is ready for a final review and merge by the CCB. labels Apr 4, 2024
@pmai pmai merged commit 41cccf3 into OpenSimulationInterface:master Apr 4, 2024
5 checks passed
Copy link
Contributor

@jdsika jdsika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed for v3.7.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Quality Quality improvements. ReadyToMerge This PR has been approved to merge and will be merged by a member of the CCB.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants