proposed fix for #707 #708

Merged: 2 commits into hibernate:main, Apr 16, 2021

Conversation

gavinking
Member

Make sure that queued requests use the right session and entity args.

@gavinking
Member Author

OK, @Sanne, @DavideD, @stuartwdouglas, WDYT of this?

Obviously I have not tested it.

@gavinking linked an issue on Apr 15, 2021 that may be closed by this pull request
@Sanne
Member

Sanne commented Apr 15, 2021

I think we need good tests for this area, not least to check how this behaves when the consumer rate is higher than our ability to produce new ids.

make sure that queued requests use the right session and entity args
@gavinking
Member Author

gavinking commented Apr 15, 2021

I think we need good tests for this area

Well, yeah, clearly. The problem is that we have precisely no tests for concurrent operation in the HR project, and I'm not clear even on the right way to write them.

@gavinking
Member Author

(I force-pushed to remove the useless Queued class and just use a Runnable instead.)

@stuartwdouglas
Contributor

Looks good, but if we have n consumers waiting maybe we should make sure we request at least n IDs? At the moment by default it is one at a time, so this is basically a global lock in terms of persist performance.

@gavinking
Member Author

Looks good, but if we have n consumers waiting maybe we should make sure we request at least n IDs? At the moment by default it is one at a time, so this is basically a global lock in terms of persist performance.

Well the assumption I made was that you always choose a big enough block size that you "rarely" need to request a new block. If that's true, then I think having more requests waiting than the size of the block is an extremely low-probability event.

On the other hand, what I realized earlier was: oops, what about people who don't want blocks and set the block size to 1? We should actually be bypassing all this logic entirely in that case, and we don't. Not quite sure how I overlooked that.
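
For context, the block ("hi/lo"-style) scheme under discussion works roughly like the sketch below. This is an illustration only, with hypothetical names, not the actual BlockingIdentifierGenerator code:

    // Illustrative block allocation: one database round trip yields a new
    // "hi" value, and blockSize ids are then handed out locally.
    class BlockAllocator {

        private final int blockSize;
        private long hi = -1;  // first id of the current block; -1 means no block yet
        private int lo;        // offset of the next id within the current block

        BlockAllocator(int blockSize) {
            this.blockSize = blockSize;
        }

        /** Returns the next id from the current block, or -1 if a new block is needed. */
        synchronized long next() {
            if (hi >= 0 && lo < blockSize) {
                return hi + lo++;
            }
            return -1; // block exhausted: the caller must fetch a new hi value
        }

        /** Installs a freshly fetched hi value and returns the first id of the new block. */
        synchronized long newBlock(long fetchedHi) {
            hi = fetchedHi;
            lo = 1;
            return hi;
        }
    }

With a reasonably large blockSize, only about one request in blockSize needs a database round trip, which is the assumption behind "rarely" needing a new block.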

@gavinking
Member Author

if we have n consumers waiting maybe we should make sure we request at least n IDs?

Also, at the time we request an hi value, we have zero consumers waiting.

@gavinking
Member Author

gavinking commented Apr 15, 2021

Not quite sure how I overlooked that.

Well, hrm, actually I suppose it's not entirely clear what the best behavior here is.

For sequences, yes, I think the right thing to do is not wait for concurrent requests to complete.

For table generation it's not so clear, however.

related to discussion around hibernate#707

@gavinking
Member Author

This seems right for sequences. I'm not sure about tables though.

        //special case where we're not using blocking at all
        return nextHiValue(session);
    }


@Sanne
Member

Sanne commented Apr 15, 2021

FWIW I had written a POC about a year ago for such an optimizer, which would tune the page size in an "adaptive" way.

I abandoned it as I needed to learn how to test such things with Vert.x and never had much time, but it sounds like conceptually you might want to revive the idea.

@gavinking
Member Author

I mean in principle it would not be at all hard to increase the block size if we find ourselves requesting too many new blocks. Not hard at all.
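
As a rough illustration of that idea (purely hypothetical, not part of this PR): grow the block size whenever new blocks are being requested too frequently.

    // Hypothetical "grow on contention" policy: if the previous block drained
    // faster than some threshold, double the next block size, up to a cap.
    class GrowingBlockSize {

        private int blockSize = 50;  // starting allocation size
        private long lastFetchNanos = System.nanoTime();

        synchronized int nextBlockSize() {
            long now = System.nanoTime();
            boolean drainedQuickly = now - lastFetchNanos < 100_000_000L; // under 100 ms
            lastFetchNanos = now;
            if (drainedQuickly) {
                blockSize = Math.min(blockSize * 2, 10_000); // blocks draining fast: grow
            }
            return blockSize;
        }
    }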

@Sanne
Member

Sanne commented Apr 15, 2021

Right, and there are several other smart things one can easily do in the reactive model; for example, we could trigger a block allocation before the last available ID has been handed out to clients.

So by observing the rate at which IDs are being consumed and the time it takes for block allocation tasks to complete, it should be possible to:

  • prefetch ID blocks in advance
  • adapt the block sizes

One caveat of dynamically allocated block sizes might be fragmentation: if there are multiple application nodes, they still need to use a consistent strategy. That should not be a problem when it's based on sequences (atomic), but if it's backed by one of the other models (table?) this is probably not a good idea.
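
A minimal sketch of that observation-based trigger (hypothetical; names and thresholds are made up for illustration):

    // Prefetch the next id block once the remaining ids would be consumed
    // before a new block allocation could complete.
    final class PrefetchHeuristic {

        static boolean shouldPrefetch(int remainingIds,
                                      double idsConsumedPerSecond,
                                      double blockAllocationSeconds) {
            double secondsUntilExhaustion = remainingIds / idsConsumedPerSecond;
            return secondsUntilExhaustion <= blockAllocationSeconds;
        }

        // e.g. shouldPrefetch(10, 5000, 0.005) is true: at 5k ids/s the last
        // 10 ids cover only 2 ms, less than a 5 ms allocation round trip.

        private PrefetchHeuristic() {
        }
    }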

@gavinking
Member Author

@stuartwdouglas can you do us a favor and test this patch in the context of the Quarkus issue?

@DavideD
Member

DavideD commented Apr 15, 2021

I think we need good tests for this area

Well, yeah, clearly. The problem is that we have precisely no tests for concurrent operation in the HR project, and I'm not clear even on the right way to write them.

I think I have an idea; I will start prototyping something.
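
One possible shape for such a test, sketched here against a generic CompletableFuture-based API rather than the actual Hibernate Reactive test infrastructure: persist many entities concurrently and assert that no generated id repeats.

    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.CompletableFuture;
    import java.util.function.Supplier;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    // Hypothetical concurrency check: 'persistAndGetId' stands in for
    // persisting an entity through the reactive session and returning its
    // generated identifier.
    final class ConcurrentIdGenerationCheck {

        static void assertUniqueIds(Supplier<CompletableFuture<Long>> persistAndGetId, int count) {
            List<CompletableFuture<Long>> inserts = IntStream.range(0, count)
                    .mapToObj(i -> persistAndGetId.get())
                    .collect(Collectors.toList());
            CompletableFuture.allOf(inserts.toArray(new CompletableFuture[0])).join();

            Set<Long> ids = inserts.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toSet());
            if (ids.size() != count) {
                throw new AssertionError("duplicate ids generated: expected "
                        + count + " distinct ids, got " + ids.size());
            }
        }

        private ConcurrentIdGenerationCheck() {
        }
    }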

@stuartwdouglas
Contributor

This works now: instead of doing a couple of requests and then hanging, it can handle 3.5k inserts per second (this is not tuned in any way; it could probably do a lot more if I messed with some settings).

@gavinking
Member Author

Thanks 🙏

So perhaps it's worth doing a release just to fix this one thing. Thoughts?

@stuartwdouglas
Contributor

I think so, it's a pretty bad bug.

@stuartwdouglas
Contributor

Actually I just realized that I am only testing the block size == 1 path. How do I configure a larger block size to test the other code path?

@gavinking
Member Author

With the @SequenceGenerator or @TableGenerator annotation just like in regular JPA.
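
For example (illustrative values; at the time the quickstart used the javax.persistence namespace):

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.SequenceGenerator;

    @Entity
    public class Fruit {

        // allocationSize is the block size: with 50, one sequence round trip
        // serves 50 generated ids; allocationSize = 1 disables blocking entirely.
        @Id
        @SequenceGenerator(name = "fruitsSequence",
                           sequenceName = "known_fruits_id_seq",
                           allocationSize = 50)
        @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "fruitsSequence")
        private Long id;

        public Long getId() {
            return id;
        }
    }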

@DavideD
Member

DavideD commented Apr 16, 2021

So perhaps it's worth doing a release just to fix this one thing. Thoughts?

Yes, we should release it. I will work on adding some more tests for the id generation, but considering that @stuartwdouglas has already tested it on Quarkus and that our current test suite passes, releasing this version seems like a clear improvement.

@stuartwdouglas
Contributor

It has been more than 10 years since I last actually 'used' JPA rather than just dealing with integration stuff. It looks like the quickstart sets allocationSize = 1 for some reason; if you remove it then performance increases, but we get the occasional PK violation:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.86ms   17.38ms 219.92ms   98.11%
    Req/Sec     8.11k     1.49k    9.41k    90.86%
  159144 requests in 10.10s, 15.69MB read
  Non-2xx or 3xx responses: 31

Looking at the logs some entities are being given the same PK.

@gavinking
Member Author

we get the occasional PK violation

is this with a sequence generator or a table generator?

@stuartwdouglas
Contributor

    @SequenceGenerator(name = "fruitsSequence", sequenceName = "known_fruits_id_seq", initialValue = 10)

@gavinking
Member Author

gavinking commented Apr 16, 2021

Well that's very strange. Can you tell if it's because they are assigned (a) dupe hi values or (b) dupe lo values?

@gavinking
Member Author

I mean:

  1. a dupe hi value is impossible because we're getting them from a database sequence, but
  2. a dupe lo value also looks impossible because lo values are assigned within a synchronized block.

So ... another silly logical error in my code??

@stuartwdouglas
Contributor

The ids in the errors are a long way apart, e.g.:
5161
6361
8561
13211

So it's not allocating duplicate blocks.

@stuartwdouglas
Contributor

They all seem to end in '1'

@gavinking
Member Author

another silly logical error in my code

Yup it's that, actually. Hold tight I will push a fix.

@gavinking
Member Author

wait, all the ids, or just all the dupe ids?

@gavinking
Member Author

Yup it's that, actually. Hold tight I will push a fix.

No, excuse me. What I thought was wrong is actually fine.

@gavinking
Member Author

wait, all the ids, or just all the dupe ids?

No, you mean the ones in the errors. So by the looks, they're all the "second id generated after fetching a new hi value".

@gavinking
Member Author

they're all the "second id generated after fetching a new hi value".

No, wait, I guess you have initialValue=1, the default, so they're the first ids in their respective blocks.

@stuartwdouglas
Contributor

stuartwdouglas commented Apr 16, 2021 via email

@gavinking
Member Author

Do you think it could be an out-of-order execution thing??

Like, list.forEach(Runnable::run) gets executed before result.complete( next(id) ).

Or am I trying to make this too complicated again?

@gavinking
Member Author

I mean, would moving the line list.forEach(Runnable::run) inside the synchronized block fix it?
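
To make the question concrete, here is a purely hypothetical sketch of that completion-and-drain pattern (not the actual PR code); the question is whether the drain can be reordered relative to completing the triggering request:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    // Hypothetical sketch: when a fresh hi value arrives, the triggering
    // request is completed and the requests queued in the meantime are run.
    // Moving the forEach(Runnable::run) drain inside the synchronized block
    // would change which interleavings with concurrent callers are possible.
    class CompletionOrderSketch {

        private long hi;
        private int lo;
        private final List<Runnable> queued = new ArrayList<>();

        void onNewHiValue(long newHi, CompletableFuture<Long> result) {
            List<Runnable> toRun;
            long id;
            synchronized (this) {
                hi = newHi;
                lo = 1;
                id = hi;  // id for the request that triggered the fetch
                toRun = new ArrayList<>(queued);
                queued.clear();
            }
            result.complete(id);           // complete the triggering request first...
            toRun.forEach(Runnable::run);  // ...then serve the queued requests
        }
    }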

@stuartwdouglas
Contributor

Bugger, it turns out this was user error. I forgot to clean before running quarkus:dev, so it had cached the old Maven dependencies and was not picking up the new version. It looks like this was a problem with the old version when the block size was larger than 1.

Sorry for the noise.

@gavinking
Member Author

No problem. So how does it perform?

@stuartwdouglas
Contributor

Comparatively way better: 13,429 inserts per sec. This is in no way a proper benchmark; it's only using 2 threads and 10 connections.

@gavinking
Member Author

Alright. So that sounds good.

Shall I merge this?

@DavideD
Member

DavideD commented Apr 16, 2021

Sounds good to me

@gavinking merged commit d27f7d7 into hibernate:main on Apr 16, 2021
@DavideD
Member

DavideD commented Apr 16, 2021

I've released it and sent a PR for quarkus: quarkusio/quarkus#16588

@DavideD
Member

DavideD commented Apr 16, 2021

Thanks a lot @stuartwdouglas

@gavinking deleted the 707 branch on June 15, 2023 at 13:41
Development

Successfully merging this pull request may close these issues.

BlockingIdentifierGenerator doesn't use the right session