-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pyspelling html filter #795
Add pyspelling html filter #795
Conversation
I was not aware of this filter functionality. So good catch!
I agree, that would make sense for the current OSI definitions. But if I recall correctly, I tried this and found out that aspell does not support multiple languages. However, there seem to be some workarounds. |
Seems like it. So either we add all these words with the Umlauts to the spelling_custom_words_en_US.txt or we need to figure out how to check multiple languages. I would opt to add them to the custom words and avoid using non-english words in the future. |
18ffb9d
to
b565376
Compare
Signed-off-by: Thomas Sedlmayer <[email protected]>
ea49282
to
73e133b
Compare
Signed-off-by: Thomas Sedlmayer <[email protected]>
73e133b
to
213f4de
Compare
I see. If there is no straightforward way of adding another language to the checker, let's maybe postpone this for now. I added the necessary exceptions to the whitelist so that the spellchecker does not fail anymore. There may still be some words in the white list that are not required with the html filter in place, like html-tags. We can also maybe get rid of those. |
Yes, there are also a lot of half words, as before, umlauts were not recognized, e.g. Parkfl for Parkflächen, Parkst for Parkstände, Rei for Reißverschluss etc. We might be able to also get rid of those. |
Signed-off-by: Thomas Sedlmayer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CCB 2024-04-04: Merge as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed for v3.7.0
I noticed that adding new hyperlinks to the proto files requires to add any character sequences from the hyperlink as exceptions to the spellchecker (see #790). So I found that it is possible to add a html filter that excludes such character sequences.
Here I added a exemplary
<a>
-tag with a hyperlink that contains strings that should trigger a spellchecker fail. With the new html filter the hyperlink does not trigger the fail anymore but now it fails because it now detects a lot more strings that contain mutated vowels which were not detected before. I guess the html filter converts those special characters so that the spellchecker can also check those words.@ClemensLinnhoff Could you please check if this type of filter does not screw things up? Was there any reason why this was not added before? If it works, we should probably add the "new" mutated vowel-words as exceptions but remove any exceptions that were previously added because they are contained inside hyperlinks (or also html tags/attribute names like
cellspacing
,href
,colspan
etc.). Also, wouldn't it make sense to add German as secondary language so that we don't have to add every German word to the exception list?