Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Citation key generation on large number of entries not showing up in Background Tasks #7267

Closed
1 task
alfureu opened this issue Dec 28, 2020 · 9 comments · Fixed by #7797
Closed
1 task
Labels
good first issue An issue intended for project-newcomers. Varies in difficulty. keygenerator type: enhancement

Comments

@alfureu
Copy link

alfureu commented Dec 28, 2020

JabRef 5.2--2020-12-24--6a2a512
Windows 10 10.0 amd64
Java 14.0.2

This issue is present on the v5.2 stable.

Steps to reproduce the behavior:

  1. Select a large number of entries
  2. Auto-generate citation keys based on presets
  3. No background tasks shown

Would be great if such extensive tasks would show up in the Background task option.

@tobiasdiez tobiasdiez added the good first issue An issue intended for project-newcomers. Varies in difficulty. label Jan 1, 2021
@tp-1000
Copy link
Contributor

tp-1000 commented Mar 22, 2021

Question for whenever someone gets to it. If I am understanding correctly the intention would be just to use the backgroundtask wrapper and have the task/progress show up in the task window. Does the wrapper also protect task from the being interrupted, It doesn't appear to work this way.

The reason I ask is related to the question below.
Should it take a long time to gen-keys for 10 entries of 100000? It's very fast for same selection size(10) with only 10 entries. The reason it's a concern is because currently the gen process can be interrupted without warning. So unless it's remains very quick and completes before a user can take action, it potentially can be interrupted with no notification nor indication where it left off.

Would it be better to block until complete? I don't know typical working numbers but 100000 entries seems like it would be a rarity, so maybe my understanding of the issues at hand are a bit skewed. But, maybe there is a better solution that addresses the potential for interruption.

I would like to add, someone probably wouldn't be requesting task progress-monitoring for something with a very quick resolution.

I would be interested to hear what someone else thought.

@btut
Copy link
Contributor

btut commented Jun 5, 2021

Hi,
The reason it's a concern is because currently the gen process can be interrupted without warning
Why would that be a problem? If the user wants to interrupt, why not interrupt? The question is if there is a need to revert all the entries that were assigned a new key to the old key in case of an interruption.

@tp-1000
Copy link
Contributor

tp-1000 commented Jun 5, 2021

@btut
I don't disagree, I think the process should be stopped if the user intends to stop it, but only if they intend to stop it.

It's been awhile since I've checked the issue out but what I'm remembering was starting the process and continuing to work (which I think was just clicking on entries to examine). This would stop the gen process and without notification. If more clarification would be helpful let me know and I can revisit to refresh my memory.

IMO:
Reverting sounds like an undo-state save. I have no experience implementing but it seems like it should be doable. There could be performance issues, I didn't dig too deeply, but noticed the time it takes for a set of only 10 to complete (quick) was much longer for that same 10 (slow) if they were included in the gen process for a very large set.

@btut
Copy link
Contributor

btut commented Jun 5, 2021

but only if they intend to stop it
so you suggest something like a confirmation dialogue on cancel?

This would stop the gen process and without notification
I see. I'd have to check that. The thing is I don't have a library large enough to test this. There is no way I'm fast enough to hit that cancel button before the operation is done... Do you know of any huge bib files available for download to test this out? Otherwise I may just generate one with random strings.

Reverting sounds like an undo-state save
exactly, but I don't have any experience there either. Do you think this is necessary? Is it a valid usecase to interrupt the key-generation and expect it to revert to the previous state?

a set of only 10 to complete (quick) was much longer for that same 10 (slow) if they were included in the gen process for a very large set.
I guess the non-linearity comes from JabRef making sure the keys are unique.

@tp-1000
Copy link
Contributor

tp-1000 commented Jun 5, 2021

I think at very least a dialog asking if they want to stop the process. I don't know all conditions that could interrupt and it may make more sense to just block with a dialog? Again, as you found out its pretty quick if you aren't doing very many... but then the question became if its so quick the blocking dialog will not be seen, why bother having it display the progress.

There are some script files you can use to build a lib with, look in the scripts/ directory.
There are also some already made, I haven't looked at them, but they are buried in test/resources/testbib/

I think the undo would have been requested if users wanted to take advantage of something like that. To me, the undoing sounds like issue creep and should be its own separate issue.

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member

a set of only 10 to complete (quick) was much longer for that same 10 (slow) if they were included in the gen process for a very large set.

I guess the non-linearity comes from JabRef making sure the keys are unique.

Yup. The implementation slows down significantly based on the number of entries in the library and the amount of duplicates of the particular key. Is this a problem that needs fixing?

@btut
Copy link
Contributor

btut commented Jun 5, 2021

Yup. The implementation slows down significantly based on the number of entries in the library and the amount of duplicates of the particular key. Is this a problem that needs fixing?
I did not yet experience any performance issues since I never worked with huge libraries. Will try out in the next days.
Didn't yet check the implementation of the key-generator either, but this sounds like the generator checks against all generated keys on each entry, so maybe something like a key -> occurenceCount hashmap would be more efficient.

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member

@btut it checks all keys on each entry and each collision. I think occurenceCount isn't used anymore and the method can be made slightly quicker (and more readable) by using database.getEntryByKey instead. You can use a hash set instead of database.getEntryByKey but I think you'd get a slight performance decrease because we don't expect many citation key collisions.

Additionally, it seems that we are assuming that any entry can be changed at any time during the key generation, which is why it is iterating through all bibentries. Creating a high-performing alternative while avoiding concurrency issues is probably not worth your time, even if we are not doing XP.

@btut
Copy link
Contributor

btut commented Jun 7, 2021

Additionally, it seems that we are assuming that any entry can be changed at any time during the key generation, which is why it is iterating through all bibentries. Creating a high-performing alternative while avoiding concurrency issues is probably not worth your time, even if we are not doing XP.
Ah yes, that makes lots of sense! Let's leave it as-is then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue An issue intended for project-newcomers. Varies in difficulty. keygenerator type: enhancement
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants