Improve Dictionary Grouping for English #390

Open
ghost opened this issue Jul 10, 2024 · 12 comments

Comments

ghost commented Jul 10, 2024

First of all, thank you for your hard work and for adding English support. Jidoujisho has been a great help to me.

The problem I'm facing is that some of my dictionary definitions are being detected as separate results, which pretty much messes up my workflow since the frequencies only appear with certain dictionaries. I guess this is due to some dictionaries having the word's spelling also as the reading, whereas some have the reading field empty.

[screenshot]

Yomitan has a feature for choosing the result grouping mode (Group term-reading pairs, Group related terms, and no grouping). I reckon it would be useful to add something like that to Jidoujisho, or at least some kind of customization or enhancement for the way the dictionary entries are displayed.

Edit: this seems to be related to #369
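
To illustrate the guess above, here is a minimal Python sketch of grouping by term-reading pair where an empty reading is treated as identical to the term's spelling. This is not Jidoujisho's or Yomitan's actual code; the entry layout (dicts with `term`, `reading` and `dictionary` keys) and the function names are assumptions made up for the example.

```python
from collections import defaultdict


def grouping_key(term: str, reading: str) -> tuple[str, str]:
    """Build the key used to group results.

    Assumption: a dictionary that leaves the reading empty means
    "the reading is the spelling itself", so fall back to the term.
    """
    return (term, reading or term)


def group_entries(entries: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group hypothetical entry dicts by their normalized term-reading pair."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for entry in entries:
        key = grouping_key(entry["term"], entry.get("reading", ""))
        groups[key].append(entry)
    return dict(groups)


entries = [
    {"term": "run", "reading": "run", "dictionary": "A"},
    {"term": "run", "reading": "", "dictionary": "B"},
]
groups = group_entries(entries)
```

With this normalization both entries end up under the single key `("run", "run")`, so a frequency attached to one dictionary's entry would sit in the same result as the other dictionary's definition.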

avc1657 commented Jul 11, 2024

See if things work better in version 2.8.9. Version 2.9.0 is buggy.

ghost commented Jul 12, 2024

They do, but 2.8.9 doesn't support Yomitan structured content, which most of my dictionary collection uses. I guess I'll just have to wait.

avc1657 commented Jul 12, 2024

You say most of your English dictionaries use structured content? All my English dicts are in plain text. As for Japanese, I also have a good selection in plain text, including 3 dicts I converted from structured to plain text.

ghost commented Jul 12, 2024

Yeah, I'm talking about that. How exactly did you convert the dictionaries from structured content to plain text? Do you mind sharing the script or whatever you used?

avc1657 commented Jul 12, 2024

To convert them you need a script, and the script can vary depending on the dictionary. I just asked ChatGPT to write Python scripts for me.

I basically prompted something like:

Write me a Python script that runs in ./ and modifies all .json files. The script is for converting a dictionary from structured content to plain text only.

For example [paste a block of the JSON showing its structured content]:

But I want it to look like this [convert the block to plain text yourself so ChatGPT knows what you're talking about, then paste it here]:

All blocks of the JSON files should look pretty much the same. I want the script to be generic, which means I want all blocks converted to plain text, and so on.

That's pretty much it. You just need to tell ChatGPT in detail what you need. Or you can just write your own code if you want.
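
For anyone who prefers to write it by hand, here is a rough Python sketch of that kind of conversion. It is not the script used in this thread. It assumes the Yomitan term bank layout as I understand it: each entry is an array whose index 5 holds the list of definitions, and a structured definition is an object with `"type": "structured-content"` whose `content` is a string, a list, or an object with a `tag`. The set of tags treated as line-ending and the choice to drop images and ruby annotations are my own assumptions, and real dictionaries usually need extra per-dictionary tweaks. The script overwrites files in place, so run it on a copy of the unpacked dictionary.

```python
import json
from pathlib import Path

# Assumption: these tags should end with a line break in the plain-text output.
BLOCK_TAGS = {"div", "li", "tr", "details", "summary"}
# Assumption: images and ruby annotations are simply dropped.
SKIPPED_TAGS = {"img", "rt", "rp"}


def flatten(node) -> str:
    """Recursively turn a structured-content node into plain text."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(flatten(child) for child in node)
    if isinstance(node, dict):
        tag = node.get("tag")
        if tag == "br":
            return "\n"
        if tag in SKIPPED_TAGS:
            return ""
        text = flatten(node.get("content"))
        if tag in BLOCK_TAGS and text and not text.endswith("\n"):
            text += "\n"
        return text
    return ""  # None or any unexpected value


def definition_to_text(definition):
    """Convert one definition; plain strings and non-dict values pass through."""
    if isinstance(definition, dict):
        kind = definition.get("type")
        if kind == "structured-content":
            return flatten(definition.get("content")).strip()
        if kind == "text":
            return definition.get("text", "")
        return None  # e.g. image definitions are dropped
    return definition


def convert_entry(entry: list) -> list:
    """Return a copy of a term bank entry with plain-text definitions."""
    converted = list(entry)
    texts = (definition_to_text(d) for d in entry[5])
    converted[5] = [t for t in texts if t not in (None, "")]
    return converted


def convert_file(path: Path) -> None:
    """Rewrite one term bank file in place."""
    entries = json.loads(path.read_text(encoding="utf-8"))
    converted = [convert_entry(entry) for entry in entries]
    path.write_text(json.dumps(converted, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    # Only term banks are touched; index.json and tag banks are left alone.
    for bank in sorted(Path(".").glob("term_bank_*.json")):
        convert_file(bank)
```

Unlike the prompt above, the sketch only touches `term_bank_*.json` rather than every .json file, because the other files in a dictionary archive do not contain definitions. After converting, the folder has to be zipped again before importing.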

I'll share my list of dicts here, just so you can see that you can get very decent coverage with plain-text stuff alone.

[screenshot of the dictionary list]

ghost commented Jul 12, 2024

You were right. ChatGPT made a script that also deletes the unnecessary entries, and I just converted a couple of dictionaries. Thank you!
The import speed is also better in 2.8.9.

avc1657 commented Jul 12, 2024

Which dictionaries did you manage to convert to plain text?

nacho00112 commented

Could you please send the script or the complete prompt? I wasn't able to get it working.

avc1657 commented Jul 17, 2024

What dictionaries are you trying to convert?

nacho00112 commented

The en-en and ja-en dictionaries from https://github.com/themoeway/kaikki-to-yomitan/blob/master/downloads.md. The ja-en one loads, but not completely. With the en-en one the app crashes, probably because the dictionary is too big for the RAM.

nacho00112 commented

Using 2.9.0 preview 2 solved the problem. I hope I don't run into the bugs you talked about.

avc1657 commented Jul 18, 2024

2.9.0 preview 2 solves some problems and comes bundled with a few new ones, haha. As the name says, 2.9.0 is still in a pre-release state, so it is expected to contain bugs.
