feat: add swiss german as a language #164
Open
Hello
I added Swiss German as another language.
To do that, I had to move the training files into a subfolder named after the ISO 639-3 code, because the ISO 639-1 code is not unique between German and Swiss German. For the same reason, I also had to rename the test files.
If this change is not OK, I am open to suggestions on how to "fix" this problem :)
The accuracy is not that great, but this was somewhat expected, as Swiss German is quite similar to German. Better training data might fix this. However, because of the "grouping" by ISO 639-1 code, it is probably possible to have a prediction for Swiss German and German simultaneously and thus "improve" the accuracy, as far as I understand.
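To illustrate what I mean by "grouping", here is a minimal sketch in Python. It is not the project's actual code: the mapping `ISO3_TO_ISO1`, the function `group_scores`, and the score values are all hypothetical, and I am assuming the detector produces one score per ISO 639-3 code.

```python
from collections import defaultdict

# Hypothetical mapping from ISO 639-3 codes to the ISO 639-1 code
# they are grouped under (Swiss German shares "de" with German here).
ISO3_TO_ISO1 = {"deu": "de", "gsw": "de", "fra": "fr"}


def group_scores(scores):
    """Sum per-language scores that share the same ISO 639-1 code."""
    grouped = defaultdict(float)
    for iso3, score in scores.items():
        grouped[ISO3_TO_ISO1[iso3]] += score
    return dict(grouped)


# Made-up scores for a single input text.
example = {"deu": 0.5, "gsw": 0.25, "fra": 0.25}
print(group_scores(example))
```

With grouping like this, a text scored partly as German and partly as Swiss German would still count fully towards "de", which is why the grouped accuracy can look better than the per-language accuracy.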
I got all data from here. I used the 2021 Wikipedia 100k for the training and the 2017 Web 100k for the test.
Thanks for your feedback :)