[improve][pip] PIP-393: Improve performance of Negative Acknowledgement #23601
I added the space complexity analysis of the new data structure; please review it again, thanks. @lhotari @nodece @BewareMyPower @poorbarcode @codelipenghui @dao-jun
Great analysis @thetumbled. Please move the analysis from the PR description to the PIP document itself. One small detail (which doesn't impact the analysis or solution): "Entry id is stored in a Roaring64Bitmap, for simplicity we can replace it with RoaringBitmap, as the max entry id is 49999, which is smaller than 65535."
@thetumbled The title of any PR containing PIP documentation should include
The PIP-393 document should include a high-level plan for avoiding an increase in the size of the Pulsar client by the size of the fastutil jar file. The fastutil jar file is very large, 23MB. We use only a few classes of fastutil. There's the fastutil-core library, which is smaller, about 6MB. However, that is also relatively large, and using fastutil-core will introduce another problem on the broker side, since there's already the fastutil jar, which also includes the fastutil-core classes. It's necessary to design a proper shading solution as part of this PIP design and implementation.
More details in the thread #23600 (comment)
You are right, it is not 65535 but 4294967296 (2**32, i.e. 2 * (Integer.MAX_VALUE + 1)).
Thanks for the review; I added it to the high-level design.
The vote is completed; please review this PR again, thanks. @lhotari @nodece @eolivelli
LGTM, thanks for driving the effort @thetumbled. Great work!
Release labels shouldn't be added to PIP document PRs, since we only maintain PIP documents in the master branch. Release labels are used for cherry-picking.
PIP: 393
Implementation PR: #23600.
Motivation
There are many issues with the current implementation of Negative Acknowledgement in Pulsar:
All of these problems are severe and need to be solved.
Modifications
Refactor the NegativeAcksTracker to solve the above problems.
Space complexity of new data structure
The following theoretical space complexity analysis quantifies how much space the new data structure saves.
Space complexity of ConcurrentLongLongPairHashMap
Before analyzing the new data structure, we need to know how much space the tracker takes before this PIP. We need to store four long fields (ledgerId, entryId, partitionIndex, timestamp) for each entry, which takes 4*8=32 bytes.
As ConcurrentLongLongPairHashMap uses open addressing with linear probing to handle hash collisions, it keeps redundant space to avoid a high collision rate. Two configurations control how much redundant space to reserve: fill factor and idle factor. When the space utilization rate rises to the fill factor, the size of the backing array doubles; when the space utilization rate drops to the idle factor, the size of the backing array is halved.
The default value of fill factor is 0.66 and of idle factor is 0.15, which means the minimum space occupation of ConcurrentLongLongPairHashMap is 32N/0.66 bytes = 48N bytes and the maximum space occupation is 32N/0.15 bytes = 213N bytes, where N is the number of entries.
Some test data to verify this:
There are 1,000,000 entries in the map, which take up 32*1000000/1024/1024 bytes = 30MB, so the space utilization rate is 30/64=0.46, which is in the range [0.15, 0.66].
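The arithmetic above can be reproduced with a small Python sketch. The function and constant names are hypothetical and only model the numbers quoted in this section; this is not code from the Pulsar implementation.

```python
# Rough model of the per-entry footprint of an open-addressing hash map.
# All names here are illustrative assumptions, not Pulsar APIs.

ENTRY_BYTES = 4 * 8   # four long fields: ledgerId, entryId, partitionIndex, timestamp
FILL_FACTOR = 0.66    # default fill factor quoted in the text
IDLE_FACTOR = 0.15    # default idle factor quoted in the text


def hashmap_bytes_per_entry(utilization: float) -> float:
    """Backing-array bytes per stored entry at the given utilization rate."""
    return ENTRY_BYTES / utilization


best_case = hashmap_bytes_per_entry(FILL_FACTOR)    # densest table, about 48 bytes
worst_case = hashmap_bytes_per_entry(IDLE_FACTOR)   # sparsest table, about 213 bytes
raw_mb = ENTRY_BYTES * 1_000_000 / 1024 / 1024      # payload of 1,000,000 entries in MB

print(f"best={best_case:.1f}B worst={worst_case:.1f}B raw={raw_mb:.1f}MB")
```

The best case corresponds to a table that is about to grow (utilization at the fill factor), the worst case to a table that is about to shrink (utilization at the idle factor).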
Space complexity of new data structure
New data structure:
The space used by the new data structure is related to several factors: message rate, the time deviation the user accepts, and the max entries written in one ledger.
managedLedgerMaxEntriesPerLedger=50000 determines the max entries that can be written into one ledger; we use the default value for the analysis.
The time deviation the user accepts: when the user accepts a 1024ms delivery time deviation, we can trim the lower 10 bits of the timestamp in ms, which puts 1024 consecutive timestamps into one bucket.
We will analyze the space used by one bucket, and calculate the average space used by one entry.
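Trimming the lower bits of a millisecond timestamp can be sketched as follows. This is a minimal illustration that rounds down to the start of the bucket; the function name is hypothetical, and the actual implementation may round in a different direction.

```python
def bucket_timestamp(timestamp_ms: int, trim_bits: int) -> int:
    """Clear the lowest `trim_bits` bits so nearby timestamps share one key.

    Assumption: rounding down to the bucket start. With trim_bits=10,
    every run of 1024 consecutive milliseconds maps to the same value.
    """
    mask = (1 << trim_bits) - 1
    return timestamp_ms & ~mask


# Timestamps 1024..2047 all collapse into the bucket keyed by 1024.
keys = {bucket_timestamp(t, 10) for t in range(1024, 2048)}
print(keys)
```

With this scheme a redelivery timestamp can be off by at most 2**trim_bits - 1 ms, which is the delivery time deviation the user has to accept.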
Assuming that the message rate is x msg/ms and we trim y bits of the timestamp, one bucket will span 2**y ms and contain M=x*2**y msgs.
Timestamp: each bucket stores a single trimmed timestamp, which takes 8 bytes.
Ledger id: once a ledger reaches the max number of entries (managedLedgerMaxEntriesPerLedger), the ledger will switch. There are L=ceil(M/50000) ledgers, which take 8*L bytes.
Entry id: as there are L=ceil(M/50000) ledgers, there will be L bitmaps to store, which take L*size(bitmap).
The total space consumed by the new data structure is 8 bytes + 8L bytes + L*size(bitmap).
As size(bitmap) is far greater than 8 bytes, we can ignore the first two terms. Then we get the formula for the space consumed by one bucket: D=L*size(bitmap)=ceil(M/50000)*size(bitmap).
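The bucket quantities M, L and the total space can be written out as a short Python sketch. The helper names are hypothetical, the bitmap size is passed in as a parameter, and the model assumes a constant message rate.

```python
import math

MAX_ENTRIES_PER_LEDGER = 50_000  # default managedLedgerMaxEntriesPerLedger


def bucket_layout(rate_msgs_per_ms: int, trim_bits: int) -> tuple[int, int]:
    """Return (M, L): messages per bucket and ledgers touched by one bucket."""
    bucket_span_ms = 2 ** trim_bits
    m = rate_msgs_per_ms * bucket_span_ms
    ledgers = math.ceil(m / MAX_ENTRIES_PER_LEDGER)
    return m, ledgers


def bucket_bytes(ledgers: int, bitmap_bytes: int) -> int:
    """8 bytes for the timestamp, 8 per ledger id, plus one bitmap per ledger."""
    return 8 + 8 * ledgers + ledgers * bitmap_bytes


m, ledgers = bucket_layout(50, 10)
print(m, ledgers, bucket_bytes(ledgers, 8 * 1024))
```

For an 8KB bitmap the two 8-byte terms contribute well under 1% of the total, which is why the analysis drops them.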
Entry id is stored in a Roaring64Bitmap; for simplicity we can replace it with RoaringBitmap in the analysis, as the max entry id is 49999, which is smaller than 4294967296 (2**32, i.e. 2 * (Integer.MAX_VALUE + 1)), the exclusive upper bound of the values that can be stored in RoaringBitmap. The space consumed by RoaringBitmap depends on how many elements it contains: when the size of the bitmap < 4096, the space is 4N bytes, where N is the number of elements; when the size of the bitmap > 4096, the consumed space is a fixed value of 8KB.
Then we get the final result:
When M>50000: D = ceil(M/50000)*size(bitmap) ~= M/50000 * 8KB = M/50000 * 8 * 1024 bytes = 0.163*M bytes, so each entry takes 0.163 bytes on average.
When 4096<M<50000: D = ceil(M/50000)*size(bitmap) = 1 * 8KB = 8KB, so each entry takes 8*1024/M=8192/M bytes on average.
When M<4096: D = ceil(M/50000)*size(bitmap) = 1 * 4*M bytes = 4*M bytes, so each entry takes 4 bytes on average.
Conclusion
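Before this PIP, the space complexity of ConcurrentLongLongPairHashMap is 48N bytes in the best case and 213N bytes in the worst case, where N is the number of entries.
With the new data structure, when M>50000: 0.163N bytes.
With the new data structure, when 4096<M<50000: 8192/M * N bytes.
With the new data structure, when M<4096: 4N bytes.
Test data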
Some experiment data to verify the analysis above.
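As a rough cross-check for the settings used in this section, the formula D = ceil(M/50000)*size(bitmap) can be evaluated with the ceiling kept instead of approximated by M/50000. The sketch below is only a model of the analysis, not the Test code of this PIP: the function names are hypothetical, and it assumes the bitmap sizes stated above (4 bytes per element below 4096 elements, a fixed 8KB otherwise).

```python
import math


def predicted_bucket_bytes(m: int) -> int:
    """Space of one bucket holding m messages, per the model in this document."""
    if m < 4096:
        return 4 * m                            # small bitmap: 4 bytes per element
    return math.ceil(m / 50_000) * 8 * 1024     # one 8KB bitmap per ledger


def predicted_bytes_per_entry(rate_msgs_per_ms: int, trim_bits: int) -> float:
    """Average bytes per entry for message rate x (msg/ms) and y trimmed bits."""
    m = rate_msgs_per_ms * 2 ** trim_bits
    return predicted_bucket_bytes(m) / m


for x in (1, 50, 500):
    print(f"x={x}, y=10: {predicted_bytes_per_entry(x, 10):.3f} bytes per entry")
```

Under these assumptions the model gives about 4, 0.32 and 0.176 bytes per entry for x=1, 50 and 500 with y=10. The 0.163 figure is the limit for large M, where the partially filled last bitmap of a bucket no longer matters.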
Test code:
x=1, y=10
Let x=1, that is 1 msg/ms, and y=10, so we trim 10 bits of the timestamp. Then M=1*2**10=1024<4096. According to the result above, we predict that the space consumed by 1,000,000 entries is 4*1000000/1024/1024=3.81MB.
The actual space consumed is 3.35MB, which is quite near to the theoretical value.
x=50, y=10
We try to reach the best space complexity. M=50*2**10=51200>50000, so we predict that the average space consumed by one entry is 0.163 bytes.
But the experiment result is 0.33*1024*1024/1000000~=0.35 bytes, about twice the theoretical value of 0.163.
We can print the size of the bitmaps to know why.
There are still many bitmaps whose size is far smaller than 50,000, which results in a lower space utilization rate.
x=500, y=10
All bitmaps contain almost 50,000 entries.
Each entry takes 0.18*1024*1024/1000000~=0.19 bytes, which is quite near to the theoretical value.
Documentation
doc
doc-required
doc-not-needed
doc-complete