Add pyspelling html filter #795

thomassedlmayer · 2024-03-22T11:58:02Z

I noticed that adding new hyperlinks to the proto files requires adding any character sequences from the hyperlink as exceptions to the spellchecker (see #790). It turns out that it is possible to add an HTML filter that excludes such character sequences.

Here I added an example <a> tag with a hyperlink containing strings that should make the spellchecker fail. With the new HTML filter, the hyperlink no longer triggers a failure. However, the check now fails for a different reason: it detects many more strings containing umlauts that were not detected before. I guess the HTML filter converts those special characters so that the spellchecker can check those words as well.
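
To illustrate the guess above, here is a minimal sketch of what an HTML-aware filter does conceptually. It uses Python's standard `html.parser`, not pyspelling's actual filter, and the sample string and URL are made up. It also assumes that umlauts are written as HTML entities such as `&auml;`, which is not confirmed here.

```python
from html.parser import HTMLParser


class TextOnlyExtractor(HTMLParser):
    """Collects only text nodes; tag names and attribute values are dropped."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &auml; into "ä"
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(markup):
    """Return the text a spellchecker would see after stripping the markup."""
    parser = TextOnlyExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


# Hypothetical comment text; the URL is invented for illustration.
sample = 'See <a href="https://example.org/xyzzy/qwrtz.htm">Parkfl&auml;chen</a> sign.'
print(visible_text(sample))  # the href value is gone, the entity is decoded
```

In this sketch the URL fragments never reach the checker, while the decoded word "Parkflächen" does, which would match both effects described above.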

@ClemensLinnhoff Could you please check that this type of filter does not break anything? Was there any reason why this was not added before? If it works, we should probably add the newly detected umlaut words as exceptions and remove any exceptions that were previously added only because they appear inside hyperlinks (or as HTML tag and attribute names such as cellspacing, href, colspan, etc.). Also, wouldn't it make sense to add German as a secondary language so that we don't have to add every German word to the exception list?

ClemensLinnhoff · 2024-03-25T12:31:50Z

> Was there any reason why this was not added before?

I was not aware of this filter functionality. So good catch!

> Also, wouldn't it make sense to add German as a secondary language so that we don't have to add every German word to the exception list?

I agree, that would make sense for the current OSI definitions. But if I recall correctly, I tried this and found out that aspell does not support multiple languages. However, there seem to be some workarounds.
On the other hand, I would generally question why there are links in a non-English language in this international standard. If we added French, Spanish, and Chinese traffic signs to OSI as well, spell checking in all these languages might get messy.
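
As a rough sketch of what a multi-language check could look like in principle: a word is only flagged if no dictionary accepts it. This is not an aspell or pyspelling feature and not necessarily one of the workarounds mentioned above; the function name and the toy word sets are assumptions for illustration only.

```python
def misspelled(words, dictionaries):
    """Return the words that none of the given dictionaries accept."""
    return [
        word
        for word in words
        if not any(word.lower() in dictionary for dictionary in dictionaries)
    ]


# Toy word sets standing in for real dictionaries (assumption).
english = {"parking", "sign", "for"}
german = {"parkflächen", "reißverschluss"}

words = ["Parking", "Parkflächen", "Reißverschluss", "sgin"]
print(misspelled(words, [english, german]))  # → ['sgin']
```

Each additional language enlarges the set of accepted words, so genuine typos that happen to be valid in another language would slip through, which is one way the check could get messy.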

ClemensLinnhoff · 2024-03-25T12:35:50Z

> but now it fails because it now detects a lot more strings that contain umlauts, which were not detected before.

Seems like it. So either we add all these words with umlauts to spelling_custom_words_en_US.txt, or we need to figure out how to check multiple languages. I would opt to add them to the custom words and avoid using non-English words in the future.

thomassedlmayer · 2024-03-25T13:45:50Z

I see. If there is no straightforward way of adding another language to the checker, let's maybe postpone this for now.

I added the necessary exceptions to the whitelist so that the spellchecker no longer fails. The whitelist may still contain some words that are not required with the HTML filter in place, such as HTML tags. We may be able to get rid of those as well.

ClemensLinnhoff · 2024-03-25T14:28:59Z

Yes, there are also a lot of partial words, because umlauts were not recognized before, e.g. Parkfl for Parkflächen, Parkst for Parkstände, Rei for Reißverschluss, etc. We might be able to get rid of those as well.
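
A small sketch of how such partial words could be spotted automatically. It assumes the full words are now also in the whitelist, and it treats an entry as partial if another entry continues it with a non-ASCII character. The function name and the sample list are hypothetical, not taken from the repository.

```python
def find_partial_words(whitelist):
    """Return entries that another entry extends with a non-ASCII character."""
    entries = set(whitelist)
    partial = set()
    for short in entries:
        for full in entries:
            if (
                len(full) > len(short)
                and full.startswith(short)
                # the character right after the prefix is e.g. "ä" or "ß"
                and not full[len(short)].isascii()
            ):
                partial.add(short)
    return sorted(partial)


# Sample entries taken from the examples above, plus one unrelated entry.
whitelist = ["Parkfl", "Parkflächen", "Parkst", "Parkstände", "Rei", "Reißverschluss", "href"]
print(find_partial_words(whitelist))  # → ['Parkfl', 'Parkst', 'Rei']
```

The result is only a list of candidates; each one would still need a manual check before being removed from the whitelist.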

pmai

CCB 2024-04-04: Merge as-is.

jdsika

Reviewed for v3.7.0

thomassedlmayer added the Quality label Mar 22, 2024

thomassedlmayer added this to the V3.7.0 milestone Mar 22, 2024

thomassedlmayer requested review from jdsika, PhRosenberger and ClemensLinnhoff March 22, 2024 11:58

thomassedlmayer force-pushed the fix/hyperlink-spellcheck branch from 18ffb9d to b565376 March 25, 2024 13:13

Add pyspelling html filter

560469c

Signed-off-by: Thomas Sedlmayer <[email protected]>

thomassedlmayer force-pushed the fix/hyperlink-spellcheck branch 2 times, most recently from ea49282 to 73e133b March 25, 2024 13:32

Update spellchecker whitelist

213f4de

Signed-off-by: Thomas Sedlmayer <[email protected]>

thomassedlmayer force-pushed the fix/hyperlink-spellcheck branch from 73e133b to 213f4de March 25, 2024 13:37

thomassedlmayer marked this pull request as ready for review March 25, 2024 13:43

Remove partial words from spellchecker whitelist

66abe12

Signed-off-by: Thomas Sedlmayer <[email protected]>

thomassedlmayer added the ReadyForCCBReview label Mar 28, 2024

pmai self-assigned this Apr 4, 2024

pmai approved these changes Apr 4, 2024

pmai added the ReadyToMerge label and removed the ReadyForCCBReview label Apr 4, 2024

pmai merged commit 41cccf3 into OpenSimulationInterface:master Apr 4, 2024
5 checks passed

jdsika reviewed Apr 24, 2024
