Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only show recents from the current set of characters #173

Closed
wants to merge 2 commits into from
Closed

Only show recents from the current set of characters #173

wants to merge 2 commits into from

Conversation

will-lynas
Copy link

@will-lynas will-lynas commented Feb 1, 2024

Before this change, when using rofimoji with different sets of characters, the recent characters may include characters which aren't available in the current set.

image
image

After this change:

image
image

@fdw
Copy link
Owner

fdw commented Feb 2, 2024

Hey, that's a very valid point, thank you!

I haven't looked too deeply into the PR yet, but do I read it right that there is no limit anymore in the recent file? Instead (if I didn't miss something, which is very possible), we'd read all lines from there, compare each with the current character set and then take the first n from there? That does sound a bit inefficient, doesn't it? And I'd like to avoid a file growing limitless, if possible.

Unfortunately, I don't have a good solution now, but I'm wondering if it would help if we split the "recent" file up per character set. What do you think?

@will-lynas
Copy link
Author

Yes, that's correct, and yes it is quite inefficient. We could split up the "recent" file into separate files based on the hashes of the character sets. Is that an acceptable overhead for each run?

Also always store 10 recents, regardless of the limit set.
This means that if number of recents is increased, the whole list is
already populated.
@will-lynas
Copy link
Author

If this change is included, is it worth adding something to convert the old recents file to the new format?

@fdw
Copy link
Owner

fdw commented Feb 10, 2024

The way I read it (and please correct me if I'm wrong), it's now hashing the whole character set and using that as a filename. I think the general idea is good, but the hashing itself probably takes some time as well, and it breaks when the data changes (which is not that seldom).

I'm wondering if we could write one recents file per data file, so for example recents/emojis_smileys_emotion. That way, it's easy to pick the recent characters for the current run. The problem is finding out which file the current selection should go to...

If we do something like this, and if it's possible, it would be quite nice to migrate the old file. But it's not necessary, I'd say.

@will-lynas
Copy link
Author

I've run some tests, and with the biggest character set included in rofimoji (~42000 characters) hashing only takes ~2ms, so I don't think that on its own is an issue.

I did think about just using the filenames, which would probably be fine for the internal data files (I assume their contents don't change too much?), but someone using their own file may change it significantly. Just using the filenames means that we have to check the characters in the recents file each time, to check if they are still part of the character set, which adds some amount of complexity.

There is also the question that you bring up of whether a character should be considered recent with respect to the whole character set or wrt the file it came from. Keeping track of which file each character comes from is tricky. And I presume there is nothing stopping a character being defined in multiple files, if someone is inclined to do so.

My preference would be to keep the simplicity of hashing whole character sets, given that performance doesn't seem to be an issue. And accept the tradeoff that the recents list will occasionally go blank, and need repopulating.

@fdw
Copy link
Owner

fdw commented Feb 17, 2024

Thanks for the test regarding performance 🙂 That does help.

The other problem I have with hashing the characters is that at least the distributed data files are updated more or less regularly with new characters. Meaning the old hash doesn't apply anymore, the user starts with a blank one and we have a file lying around that won't be used again. I find that at least as bad as showing the wrong recent characters.

I'm not sure there is a good solution... maybe an acceptable alternative would be to let the user specify the recent file and let them take care of it?

@will-lynas
Copy link
Author

Ok, in that case I think it is best to just leave it as it is then.

@will-lynas will-lynas closed this Mar 16, 2024
@fdw
Copy link
Owner

fdw commented Mar 16, 2024

Hey, I'm sorry I vanished there; I never meant to ghost you! I just had a lot of other things to do.

But I'm still interested in the PR. Just had no time to think about what to do next.

fdw added a commit that referenced this pull request Mar 24, 2024
So far, the recently chosen characters were shared among all instances
of `rofimoji`, no matter which files were actually shown.

Now, we show and save them per fileset, which is an improvement.
However, if one file is used in multiple filesets, their usage is not
shared, but separate. A solution for this seems to complicated for what
we try to achieve here.

The old `recents` file will be migrated on startup, if necessary.

Many thanks to @will-lynas for reporting this and working with me to
find a good solution.

Issue: #173
@fdw
Copy link
Owner

fdw commented Mar 24, 2024

Your closing this PR finally made me work on this problem 🙈

Storing one "recents" file per character set is not feasible, because we can't merge them - was the character from this set more recently used than the first one from that set?
Hashing the queried characters and using that as an identifier, while it's only small overhead (as you've shown), is impractical because the contents of one character set can and do change every once in a while, and then you'd have no recent characters, but a file that isn't used anymore.
Instead, I've now opted to have one file for each "fileset", i.e. a list of all the passed files. This still has the problem that the information isn't shared between different filesets with some shared characters.
However, solving that would require us to have a complete list of the recently used characters of all sets and comparing each one with the currently queried set, and that is slow. I've wondered if a real SQLite DB would help with this (and it probably would solve this part of the problem), but it's another dependency for a quite simple program, and it smells like over-engineering to me.

Hope you like it 🙂 If you'd like me to list you as co-author on this (because you've done some work here), or if you have any other ideas, feel free to write something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants