Only show recents from the current set of characters #173

will-lynas · 2024-02-01T17:57:20Z

Before this change, when using rofimoji with different sets of characters, the recent characters may include characters which aren't available in the current set.

After this change:

fdw · 2024-02-02T15:04:24Z

Hey, that's a very valid point, thank you!

I haven't looked too deeply into the PR yet, but do I read it right that there is no limit anymore in the recent file? Instead (if I didn't miss something, which is very possible), we'd read all lines from there, compare each with the current character set and then take the first n from there? That does sound a bit inefficient, doesn't it? And I'd like to avoid a file growing limitless, if possible.

Unfortunately, I don't have a good solution now, but I'm wondering if it would help if we split the "recent" file up per character set. What do you think?

will-lynas · 2024-02-02T17:26:43Z

Yes, that's correct, and yes it is quite inefficient. We could split up the "recent" file into separate files based on the hashes of the character sets. Is that an acceptable overhead for each run?

Also always store 10 recents, regardless of the limit set. This means that if number of recents is increased, the whole list is already populated.

will-lynas · 2024-02-04T12:52:32Z

If this change is included, is it worth adding something to convert the old recents file to the new format?

fdw · 2024-02-10T09:22:46Z

The way I read it (and please correct me if I'm wrong), it's now hashing the whole character set and using that as a filename. I think the general idea is good, but the hashing itself probably takes some time as well, and it breaks when the data changes (which is not that seldom).

I'm wondering if we could write one recents file per data file, so for example recents/emojis_smileys_emotion. That way, it's easy to pick the recent characters for the current run. The problem is finding out which file the current selection should go to...

If we do something like this, and if it's possible, it would be quite nice to migrate the old file. But it's not necessary, I'd say.

will-lynas · 2024-02-11T14:19:44Z

I've run some tests, and with the biggest character set included in rofimoji (~42000 characters) hashing only takes ~2ms, so I don't think that on its own is an issue.

I did think about just using the filenames, which would probably be fine for the internal data files (I assume their contents don't change too much?), but someone using their own file may change it significantly. Just using the filenames means that we have to check the characters in the recents file each time, to check if they are still part of the character set, which adds some amount of complexity.

There is also the question that you bring up of whether a character should be considered recent with respect to the whole character set or wrt the file it came from. Keeping track of which file each character comes from is tricky. And I presume there is nothing stopping a character being defined in multiple files, if someone is inclined to do so.

My preference would be to keep the simplicity of hashing whole character sets, given that performance doesn't seem to be an issue. And accept the tradeoff that the recents list will occasionally go blank, and need repopulating.

fdw · 2024-02-17T18:25:22Z

Thanks for the test regarding performance 🙂 That does help.

The other problem I have with hashing the characters is that at least the distributed data files are updated more or less regularly with new characters. Meaning the old hash doesn't apply anymore, the user starts with a blank one and we have a file lying around that won't be used again. I find that at least as bad as showing the wrong recent characters.

I'm not sure there is a good solution... maybe an acceptable alternative would be to let the user specify the recent file and let them take care of it?

will-lynas · 2024-02-25T22:05:35Z

Ok, in that case I think it is best to just leave it as it is then.

fdw · 2024-03-16T08:29:50Z

Hey, I'm sorry I vanished there; I never meant to ghost you! I just had a lot of other things to do.

But I'm still interested in the PR. Just had no time to think about what to do next.

@will-lynas

So far, the recently chosen characters were shared among all instances of `rofimoji`, no matter which files were actually shown. Now, we show and save them per fileset, which is an improvement. However, if one file is used in multiple filesets, their usage is not shared, but separate. A solution for this seems to complicated for what we try to achieve here. The old `recents` file will be migrated on startup, if necessary. Many thanks to @will-lynas for reporting this and working with me to find a good solution. Issue: #173

fdw · 2024-03-24T15:56:52Z

Your closing this PR finally made me work on this problem 🙈

Storing one "recents" file per character set is not feasible, because we can't merge them - was the character from this set more recently used than the first one from that set?
Hashing the queried characters and using that as an identifier, while it's only small overhead (as you've shown), is impractical because the contents of one character set can and do change every once in a while, and then you'd have no recent characters, but a file that isn't used anymore.
Instead, I've now opted to have one file for each "fileset", i.e. a list of all the passed files. This still has the problem that the information isn't shared between different filesets with some shared characters.
However, solving that would require us to have a complete list of the recently used characters of all sets and comparing each one with the currently queried set, and that is slow. I've wondered if a real SQLite DB would help with this (and it probably would solve this part of the problem), but it's another dependency for a quite simple program, and it smells like over-engineering to me.

Hope you like it 🙂 If you'd like me to list you as co-author on this (because you've done some work here), or if you have any other ideas, feel free to write something.

will-lynas added 2 commits February 4, 2024 12:29

Store recents in separate files based on hash of the character set

1564315

Also always store 10 recents, regardless of the limit set. This means that if number of recents is increased, the whole list is already populated.

Only compute hash once

20f5f0a

will-lynas closed this Mar 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only show recents from the current set of characters #173

Only show recents from the current set of characters #173

will-lynas commented Feb 1, 2024 •

edited

Loading

fdw commented Feb 2, 2024

will-lynas commented Feb 2, 2024

will-lynas commented Feb 4, 2024

fdw commented Feb 10, 2024

will-lynas commented Feb 11, 2024

fdw commented Feb 17, 2024

will-lynas commented Feb 25, 2024

fdw commented Mar 16, 2024

fdw commented Mar 24, 2024

Only show recents from the current set of characters #173

Only show recents from the current set of characters #173

Conversation

will-lynas commented Feb 1, 2024 • edited Loading

fdw commented Feb 2, 2024

will-lynas commented Feb 2, 2024

will-lynas commented Feb 4, 2024

fdw commented Feb 10, 2024

will-lynas commented Feb 11, 2024

fdw commented Feb 17, 2024

will-lynas commented Feb 25, 2024

fdw commented Mar 16, 2024

fdw commented Mar 24, 2024

will-lynas commented Feb 1, 2024 •

edited

Loading