Multiple dictionaries per file #1069

Closed
forthrin opened this issue Jan 20, 2016 · 29 comments

Comments

@forthrin

I use Sublime Text to write fiction in which the same text file may contain several languages, especially in lines spoken by characters of different nationalities, for example Norwegian, English and Japanese.

When using spell checking, it seems I must settle on a single language, which means text in the other languages is flagged as "incorrectly spelled" and marked in red. Annoying and confusing.

What would be the best way to overcome this? Can support be added for multiple dictionaries per file, for example?

I do most of this writing with the Fountainhead package, in case that makes it easier to come up with a solution (such as handling spoken lines in a particular way).

@titoBouzout
Collaborator

The issue with this is that there is no API to spell-check, but basically everything else (selecting lines, highlighting, changing words) is possible.

@FichteFoll changed the title from "Multiple dictionaries per text file" to "Multiple dictionaries per file" on Jan 21, 2016

@oliva

oliva commented Aug 10, 2016

The same happens for me when programming or writing translation files where the variables are in English and the strings are in a different language.

@evandrocoan

As I write in both English and Portuguese, I combined the English dictionary with the Portuguese dictionary, so now I have spell checking in both languages. You can find this dictionary here:

  1. https://github.com/evandrocoan/SublimeTextStudio/tree/develop/MultiLingual%20Dictionary

@Dxhs

Dxhs commented Nov 2, 2016

@evandrocoan I just did this with my Norwegian and English dictionaries: 322044 lines + 470122. It took me 5 minutes to copy and paste it with Kate on Linux. Lol

@titoBouzout
Collaborator

Can you please briefly but precisely describe how you did that? I'm interested. Thanks!

@evandrocoan

evandrocoan commented Nov 2, 2016

Steps to merge two dictionary files:

  1. Download new dictionaries from: https://github.com/titoBouzout/Dictionaries
  2. Duplicate the file EN_US.txt as EN_US_MY_LANG.txt
  3. Duplicate the file EN_US.aff as EN_US_MY_LANG.aff
  4. Duplicate the file EN_US.dic as EN_US_MY_LANG.dic
  5. Open your MY_LANG.txt and append its contents to EN_US_MY_LANG.txt.
  6. Open your MY_LANG.aff and merge its contents into EN_US_MY_LANG.aff by hand, using your own judgment.
  7. Open your MY_LANG.dic, append its contents to EN_US_MY_LANG.dic, and update the first line of EN_US_MY_LANG.dic with the correct number of words in the new file.
  8. Set EN_US_MY_LANG as your default spell-check language.

Now you have spell checking in two languages. But there are some downsides:

  1. When merging the .aff files, you need to be careful how you do it; otherwise it may crash Sublime Text.
  2. The spelling suggestions will often be inaccurate, because two distinct languages are now bound by the same affix and suggestion rules.
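
For illustration, the .dic part of the steps above (appending the entries and fixing the count on the first line) could be scripted roughly as follows. This is only a sketch, not what anyone in this thread actually ran: `merge_dic` and its parameters are hypothetical names, and it assumes both .dic files use the same encoding and that the .aff files have already been merged by hand so the affix flags keep their meaning.

```python
from pathlib import Path


def merge_dic(first: Path, second: Path, merged: Path, encoding: str = "utf-8") -> int:
    """Concatenate the entries of two Hunspell .dic files and rewrite the count line.

    Assumption: both inputs share `encoding` and a compatible, already merged .aff file.
    """
    entries = []
    for path in (first, second):
        lines = path.read_text(encoding=encoding).splitlines()
        # The first line of a .dic file holds the entry count, so skip it.
        entries.extend(line for line in lines[1:] if line.strip())
    # Write the new count first, then every entry from both files.
    merged.write_text("\n".join([str(len(entries)), *entries]) + "\n", encoding=encoding)
    return len(entries)
```

A call such as `merge_dic(Path("EN_US.dic"), Path("MY_LANG.dic"), Path("EN_US_MY_LANG.dic"))` would return the number of entries written.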

@Dxhs

Dxhs commented Nov 2, 2016

@titoBouzout On Linux, I used the cat command with shell redirection (>) into my output file:
cat english.dic norwegian.dic > output.dic
If I use my standard text editor, it crashes. Lshell did it in milliseconds. I suggest using Lshell or Cmd.

@titoBouzout
Collaborator

Thanks for the explanation! I'll try :)

@gustavobittencourt

Thank you, @evandrocoan!

I'll try to install your EN_PT dictionary here!

@Kristinita

I cannot merge the Russian and English dictionaries; I get bad results. See this part of an answer from a Hunspell contributor:

Bilingual spellchecker is not supported, at least not in reliable way. Merging dictionaries should be out of question.

Thanks.

@evandrocoan

evandrocoan commented Nov 15, 2016

@dimztimz is correct on this:

Instead, at API level you can instantiate multiple objects of the spellchecker with different languages. Then you can check the word in each object. This is the most reliable way for now.

Therefore, this has to be done by the Sublime Text spell-checking core, so we need to wait for them to implement this feature to get the best functionality.

For now, I get good results, with some disadvantages, from merging the EN and PT dictionaries. However, these two languages are fairly similar. For English and Russian, merging should not be easy, if it is possible at all.

@Kristinita

Many programs besides Sublime Text use Hunspell, so you can try to find an extension made for another program. I tried the Firefox Russian-English Bilingual add-on in Sublime Text, and it worked for me.

Kristina

But for single-language spell checking, the LanguageTool package is the best solution, with many nice features.

Thanks.

@ghost

ghost commented Jul 12, 2017

And I thought I had it with

{
    "dictionary":
    [
        "Packages/Language - English/en_US.dic",
        "Packages/Language - Other/Portuguese (European).dic"
    ]
}

Alas, no.

@BenjaminSchaaf
Member

Fixed in build 4123. "dictionary" can now be provided a list.

@eugenesvk

Fixed in build 4123. "dictionary" can now be provided a list.

This doesn't work very reliably. Please see below a quick check of different dictionary combinations, taken from here, and note how en_US.dic is a spoiler, though only for the Russian one :)
(changing the order of dictionaries doesn't seem to matter)

This is my complete settings file in a new portable Sublime Text on Windows; scroll horizontally to see the results for the different dictionary combinations.

{
"ignored_packages":["Vintage",],
"spell_check": true,
"dictionary": [
"Packages/Language - English/en_US.dic",       // 1
"Packages/Language - English/en_GB.dic",       // 2
"Packages/User/Dictionaries/German_de_DE.dic", // 3
"Packages/User/Dictionaries/Russian.dic",      // 4
]
}
// Ln Text                                	1US	2GB	3DE	4Ru	1+2	1+3	1+4	2+3	2+4	3+4	1+3+4	2+3+4	1+2+3+4
// US "Is htis in colors?  That's insane!"	+  	+  	*  	*  	+  	+  	+  	+  	+  	*  	+    	+    	+
// GB "Is htis in colours? That's insane!"	+  	+  	*  	*  	+  	+  	+  	+  	+  	*  	+    	+    	+
// De "Rechtschreibe/stylistsische Fehler"	*  	*  	+  	*  	*  	+  	*  	+  	*  	+  	+    	+    	+
// Ru "Превед, медвед, как дела"          	-  	*  	*  	+  	-  	-  	-  	*  	+  	+  	-    	+    	-

// + highlights spellchecking errors in the Row's language (from the perspective of the Column's dictionary/ies)
// * same as +, but for a mismatching language (~the whole line is highlighted)
// - nothing is highlighted, even spellchecking errors

@BenjaminSchaaf
Member

It's working fine here with the same dictionaries:

Screenshot from 2021-12-10 16-06-14

@eugenesvk

And by "working fine" you mean that it's not highlighting any spelling mistakes in the Russian line? All the example lines contain spelling errors, so no line should ever be free from red no matter the dictionary/ies

@eugenesvk

eugenesvk commented Dec 10, 2021

I think it's due to the wrong first line in the dictionary's affix file, SET ISO8859-1; it should be UTF-8.
I'm not sure whether the dictionary has to be regenerated. It doesn't seem to be saved in UTF-8, so likely yes, though a simple text replacement seems to fix the surface issue of no highlights in the Ru line (I haven't otherwise tested how well any of these combos work).
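
To see which encoding an affix file declares without opening it in an editor, something like the following could be used. It is a rough sketch under assumptions: `declared_encoding` is a hypothetical helper name, and it only reports the SET line; it does not check whether the file's bytes actually match that encoding.

```python
import codecs
from pathlib import Path
from typing import Optional


def declared_encoding(aff_path: Path) -> Optional[str]:
    """Return the encoding named by the SET directive of a Hunspell .aff file, if any."""
    # Read raw bytes so the check works whatever encoding the file is saved in.
    for raw in aff_path.read_bytes().splitlines():
        if raw.startswith(codecs.BOM_UTF8):
            raw = raw[len(codecs.BOM_UTF8):]  # ignore a leading UTF-8 byte order mark
        parts = raw.split()
        if len(parts) >= 2 and parts[0] == b"SET":
            return parts[1].decode("ascii", errors="replace")
    return None  # no SET directive found
```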

Out of curiosity, are you combining the language files behind the scenes (having to deal with different affix schemes), or are you using a simpler API, sending each text to each dic and then combining the results somehow?

@BenjaminSchaaf
Member

Ah yes, the en_US dictionary seems to be wrongly configured. It's unrelated to this issue, but I'll put in a fix.

Out of curiosity, are you combining the language files behind the scenes (having to deal with different affix schemes), or are you using a simpler API, sending each text to each dic and then combining the results somehow?

There's no way to combine the languages (easily), so we just check if a sub-word is spelt correctly according to any of the listed dictionaries.
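
The "correct according to any of the listed dictionaries" rule can be illustrated with a small sketch. This is not Sublime Text's code: the toy word sets stand in for real Hunspell dictionaries, and `misspelled_words`, `Checker`, and the simple word-splitting regex are all assumptions made for the example.

```python
import re
from typing import Callable, Iterable, List

# A "checker" is any callable that says whether one dictionary accepts a word.
Checker = Callable[[str], bool]


def misspelled_words(text: str, checkers: Iterable[Checker]) -> List[str]:
    """Return the words that none of the checkers accepts."""
    checkers = list(checkers)
    # Runs of Unicode letters, optionally joined by apostrophes (e.g. "That's").
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    # A word is flagged only if every dictionary rejects it.
    return [w for w in words if not any(check(w) for check in checkers)]


# Toy word sets standing in for an English and a Russian dictionary.
english = {"is", "this", "in", "colors"}
russian = {"как", "дела"}
checkers = [lambda w: w.lower() in english, lambda w: w.lower() in russian]
```

With these toy sets, `misspelled_words("Is htis in colors? как дела", checkers)` flags only `htis`, and the result does not depend on the order of the checkers.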

@BenjaminSchaaf
Member

Upon further inspection we just seem to be handling the encoding wrong.

@MarllonMenezes

As I write in both English and Portuguese, I combined the English dictionary with the Portuguese dictionary, so now I have spell checking in both languages. You can find this dictionary here:

  1. https://github.com/evandrocoan/SublimeTextStudio/tree/develop/MultiLingual%20Dictionary

Do you still have this dictionary?

@evandrocoan

@stdedos

stdedos commented May 25, 2022

Is this considered solved? Because I still have problems with it.

I have also tried with a UTF8 dic, no luck 😕

@BenjaminSchaaf
Member

@stdedos the encoding problem was fixed in build 4125, so yes this should all be working.

@stdedos

stdedos commented May 26, 2022

I am using https://github.com/titoBouzout/Dictionaries/blob/master/Greek.dic with

	"dictionary": [
		"Packages/Language - English/en_US.dic", // 1
		"Packages/Language - English/en_GB.dic", // 2
		"Packages/Greek.dic",                    // 3
		"Packages/Greek_UTF8.dic",               // 3
		"Packages/User/Greek.dic",               // 3
	],

and text

What is Lorem Ipsum?
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Γιατί το χρησιμοποιούμε;
Είναι πλέον κοινά παραδεκτό ότι ένας αναγνώστης αποσπάται από το περιεχόμενο που διαβάζει, όταν εξετάζει τη διαμόρφωση μίας σελίδας. Η ουσία της χρήσης του Lorem Ipsum είναι ότι έχει λίγο-πολύ μία ομαλή κατανομή γραμμάτων, αντίθετα με το να βάλει κανείς κείμενο όπως 'Εδώ θα μπει κείμενο, εδώ θα μπει κείμενο', κάνοντάς το να φαίνεται σαν κανονικό κείμενο. Πολλά λογισμικά πακέτα ηλεκτρονικής σελιδοποίησης και επεξεργαστές ιστότοπων πλέον χρησιμοποιούν το Lorem Ipsum σαν προκαθορισμένο δείγμα κειμένου, και η αναζήτησ για τις λέξεις 'lorem ipsum' στο διαδίκτυο θα αποκαλύψει πολλά web site που βρίσκονται στο στάδιο της δημιουργίας. Διάφορες εκδοχές έχουν προκύψει με το πέρασμα των χρόνων, άλλες φορές κατά λάθος, άλλες φορές σκόπιμα (με σκοπό το χιούμορ και άλλα συναφή).

Where does it come from?
Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.

The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham.

Που μπορώ να βρω μερικές;
Υπάρχουν πολλές εκδοχές των αποσπασμάτων του Lorem Ipsum διαθέσιμες, αλλά η πλειοψηφία τους έχει δεχθεί κάποιας μορφής αλλοιώσεις, με ενσωματωμένους αστεεισμούς, ή τυχαίες λέξεις που δεν γίνονται καν πιστευτές. Εάν πρόκειται να χρησιμοποιήσετε ένα κομμάτι του Lorem Ipsum, πρέπει να είστε βέβαιοι πως δεν βρίσκεται κάτι προσβλητικό κρυμμένο μέσα στο κείμενο. Όλες οι γεννήτριες Lorem Ipsum στο διαδίκτυο τείνουν να επαναλαμβάνουν προκαθορισμένα κομμάτια του Lorem Ipsum κατά απαίτηση, καθιστώνας την παρούσα γεννήτρια την πρώτη πραγματική γεννήτρια στο διαδίκτυο. Χρησιμοποιεί ένα λεξικό με πάνω από 200 λατινικές λέξεις, συνδυασμένες με ένα εύχρηστο μοντέλο σύνταξης προτάσεων, ώστε να παράγει Lorem Ipsum που δείχνει λογικό. Από εκεί και πέρα, το Lorem Ipsum παραμένει πάντα ανοιχτό σε επαναλήψεις, ενσωμάτωση χιούμορ, μη κατανοητές λέξεις κλπ.

Half of it lights up like a Christmas tree.

@BenjaminSchaaf
Member

Screenshot from 2022-05-26 16-05-46

It's working fine here with the linked dictionary. I suggest double checking you've correctly installed that dictionary - both the aff and dic files are required and must not have their encodings modified.
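
A quick way to double-check the "both files are required" part is to look for an .aff file next to each .dic file. The sketch below is only an illustration: `missing_aff_files` is a hypothetical helper, and it assumes you pass real filesystem paths (for example inside your Packages folder), not the `Packages/...` resource paths used in the settings file.

```python
from pathlib import Path
from typing import Iterable, List


def missing_aff_files(dic_paths: Iterable[Path]) -> List[Path]:
    """Return the .aff paths that should sit next to each .dic file but do not exist."""
    missing = []
    for dic in map(Path, dic_paths):
        aff = dic.with_suffix(".aff")  # same folder and base name, .aff extension
        if not aff.is_file():
            missing.append(aff)
    return missing
```

An empty list means every listed dictionary has its affix file beside it.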

@ghost

ghost commented Sep 15, 2022

Hi @BenjaminSchaaf

Can you tell me what is wrong with the Ukrainian dictionary?

When I add it to the dictionary list, the English one stops working.

{
	"dictionary": [
		"Packages/Language - English/en_US.dic",
		"Packages/User/Dictionaries/uk_UA.dic",
	],
}

Thanks!

Sublime Text v4134
Windows 10

uk_UA.zip

@BenjaminSchaaf
Member

@ihor-oleks I suggest making a separate issue, thanks.

@ghost

ghost commented Sep 15, 2022

Done, thanks.
