equals method in SegmentId looks buggy #8882
Comments
If you think this issue is down specifically to …
I'm also not certain that the equals method is correct as written. The hashcode is calculated via: (comments removed for readability)

and the comment from equals:

That logic … Take as a simple example, partitions x & y, where …

While a full-size counterexample is being constructed, I want to point out that this discussion could be obviated by adding an additional comparison to the equals method (checking the partition number too). The comparison of the hashcode can still be kept for a quicker exit (since if the hashcodes are different, the objects are certainly different), but in the case of a hashcode collision all fields need to be checked.
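For reference, here is a minimal sketch of the hashing and equality pattern being discussed, including the extra partition-number comparison proposed above. This is an illustration only, not the actual SegmentId source: the field set, the chaining order, and the 1000003 multiplier are assumptions.

```java
// Minimal sketch of the pattern under discussion (not the real SegmentId class).
// The hash chains the partition number with the other fields using an odd
// multiplier; equals() short-circuits on the cached hash and, as proposed
// above, also compares the partition number explicitly.
public final class SegmentIdSketch
{
  private static final int MULTIPLIER = 1000003; // assumed odd multiplier

  private final String dataSource;
  private final String version;
  private final int partitionNum;
  private final int hashCode;

  public SegmentIdSketch(String dataSource, String version, int partitionNum)
  {
    this.dataSource = dataSource;
    this.version = version;
    this.partitionNum = partitionNum;
    int h = partitionNum;
    h = h * MULTIPLIER + version.hashCode();
    h = h * MULTIPLIER + dataSource.hashCode();
    this.hashCode = h;
  }

  @Override
  public int hashCode()
  {
    return hashCode;
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SegmentIdSketch)) {
      return false;
    }
    SegmentIdSketch that = (SegmentIdSketch) o;
    // Cheap exit on the cached hash first; then compare every field,
    // including the partition number, as suggested above.
    return hashCode == that.hashCode
           && partitionNum == that.partitionNum
           && dataSource.equals(that.dataSource)
           && version.equals(that.version);
  }
}
```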
I'm no longer convinced that the logic is incorrect.

Taking a simple example with only a single other field, whose hashcode is k, the hash reduces to a function of the partition number and k, and, using the above, equality of two such hashes forces the partition numbers themselves to be equal.

All else equal, I still think it makes sense to include the partition number comparison: thinking through this level of modular arithmetic to prove an equals method correct is harder than simply comparing the field.
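A sketch of that argument, assuming the hash with a single other field of hashcode $k$ is computed as $h(p) = pM + k$ in 32-bit (i.e. modulo $2^{32}$) arithmetic with an odd multiplier $M$:

$$h(p_1) \equiv h(p_2) \pmod{2^{32}} \;\Rightarrow\; (p_1 - p_2)\,M \equiv 0 \pmod{2^{32}} \;\Rightarrow\; p_1 \equiv p_2 \pmod{2^{32}},$$

because an odd $M$ is invertible modulo $2^{32}$. Since partition numbers are 32-bit ints, $p_1 = p_2$; so with all other fields equal, equal hashcodes already imply equal partition numbers.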
@michaelschiff thanks for looking into this. I did some simple math and also confirmed this should not be an issue. @leventov Code block for reference:
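This is not the referenced code block, but a small hypothetical spot check in the same spirit: with the non-partition fields fixed, the sample partition numbers from this issue produce distinct chained hashes. The multiplier and chaining order are assumptions.

```java
// Hypothetical spot check: with all non-partition fields fixed, distinct
// partition numbers should never produce the same chained 32-bit hash.
public class HashCollisionSpotCheck
{
  private static final int MULTIPLIER = 1000003; // assumed odd multiplier

  private static int chainedHash(int partitionNum, int otherFieldsHash)
  {
    return partitionNum * MULTIPLIER + otherFieldsHash;
  }

  public static void main(String[] args)
  {
    int otherFieldsHash = "someDataSource/2019-10-01T00:00:00Z/v1".hashCode();
    int[] partitionNums = {10112217, 100412240, 110112589, 120212564, 20110920, 30212707};
    for (int i = 0; i < partitionNums.length; i++) {
      for (int j = i + 1; j < partitionNums.length; j++) {
        int h1 = chainedHash(partitionNums[i], otherFieldsHash);
        int h2 = chainedHash(partitionNums[j], otherFieldsHash);
        if (h1 == h2) {
          System.out.printf("collision: %d vs %d%n", partitionNums[i], partitionNums[j]);
        }
      }
    }
    System.out.println("done (no output above means no collisions among the samples)");
  }
}
```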
Hi @ArvinZheng, thank you for the report. In #7491, a new assumption has been added that partition numbers within an interval stay small and roughly contiguous.
Hi @jihoonson, thank you for looking into this. Yeah, I just realized this limitation was introduced by the minor compaction. No, the assumption doesn't hold for our use case; as I described, we created our own indexer based on the core Druid libraries.
Hence we have large numeric values for our partition numbers even though we may not have that many segments per interval. I understand it's not trivial to modify the current minor compaction, but we should state this situation clearly in the release notes - https://github.com/apache/incubator-druid/releases/tag/druid-0.16.0-incubating - so that people can do some pre-checks on their systems. Happy to discuss anything. :)
Thanks for the suggestion. It sounds like a good idea. I will update the release doc sometime soon.
The Coordinator fails to distribute segments; we suspect the equals method in SegmentId caused this issue.
Affected Version
0.16.0
Description
We got the following error from the Coordinator when we were trying to upgrade our cluster from 0.13.0 to 0.16.0 (refer to the stack trace for details):

exceptionMessage=Can't add chunk[org.apache.druid.timeline.partition.LinearPartitionChunk@1cb644f] to a full atomicUpdateGroup[AtomicUpdateGroup{chunks=[org.apache.druid.timeline.partition.LinearPartitionChunk@9a644f]}]}

To give more context here: we are using our own indexer, which reads data from Kafka and builds segments. We use LinearShardSpec for creating our segments and maintain internal logic to generate the partition number based on our data distribution strategy. The generated partition numbers per interval are not consecutive, e.g. the following partition numbers were generated for the same interval, and we may produce 2000 to 4000 segments in production.
10112217, 100412240, 110112589, 120212564, 20110920, 30212707, 40112686, 50312650, 60212319, 70412079, 80212020, 90312664
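As an illustration of the setup described above, this sketch shows how such non-consecutive partition numbers end up in LinearShardSpec; the surrounding segment-building details are omitted, and the exact constructor may vary by Druid version.

```java
import org.apache.druid.timeline.partition.LinearShardSpec;

// Sketch: the custom indexer described above assigns each segment a
// LinearShardSpec whose partition number comes from its own distribution
// logic, so the numbers within one interval are large and non-consecutive.
public class ShardSpecExample
{
  public static void main(String[] args)
  {
    int[] partitionNums = {10112217, 100412240, 110112589, 20110920};
    for (int partitionNum : partitionNums) {
      LinearShardSpec shardSpec = new LinearShardSpec(partitionNum);
      System.out.println(shardSpec.getPartitionNum());
    }
  }
}
```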
Cluster size
We found this on our dev cluster which has only 3 historical nodes, 1 query broker and 1 coordinator.
Steps to reproduce the problem
Not able to reproduce the problem, since we found this issue when we were trying to upgrade our DEV cluster to 0.16.0 and the segments may have been cleaned up.

Any debugging that you have already done
We checked the VersionedIntervalTimeline, OvershadowableManager, AtomicUpdateGroup and RootPartitionRange classes. For the same interval and version, we should not have multiple segments with LinearShardSpec belonging to the same AtomicUpdateGroup, since the RootPartitionRange for LinearShardSpec is always (partitionNum, partitionNum+1) and we should never have the same partition number for the same interval (guaranteed by a DB table constraint).

We also checked the doPoll() method of SQLMetadataSegmentManager: every time we query the DB, we try to replace a segment with an existing one from the previous snapshot if we can find it by SegmentId. If a collision can occur between 2 different partition numbers (with the rest of the factors the same) in the computeHashCode() method of the SegmentId class, our segment poll is broken.

cc: @michaelschiff
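To make the second concern concrete, here is a hypothetical model, not the actual SQLMetadataSegmentManager code, of that replace-by-SegmentId step: if equals() could return true for two ids differing only in partition number, the map lookup would hand back the wrong cached segment and one of them would silently drop out of the poll snapshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the replace-by-id step described above (not the real
// SQLMetadataSegmentManager code). The key's equals() relies on the cached
// hash plus the non-partition fields, mirroring the concern raised in this
// issue: a hash collision between two partition numbers would make two
// distinct segments look identical to the map.
public class PollReplacementModel
{
  static final class IdKey
  {
    final String dataSourceIntervalAndVersion;
    final int partitionNum;
    final int hash;

    IdKey(String rest, int partitionNum)
    {
      this.dataSourceIntervalAndVersion = rest;
      this.partitionNum = partitionNum;
      this.hash = partitionNum * 1000003 + rest.hashCode(); // assumed chaining
    }

    @Override
    public int hashCode()
    {
      return hash;
    }

    @Override
    public boolean equals(Object o)
    {
      if (!(o instanceof IdKey)) {
        return false;
      }
      IdKey that = (IdKey) o;
      // The partition number is only compared indirectly, through the hash.
      return hash == that.hash
             && dataSourceIntervalAndVersion.equals(that.dataSourceIntervalAndVersion);
    }
  }

  public static void main(String[] args)
  {
    Map<IdKey, String> previousSnapshot = new HashMap<>();
    previousSnapshot.put(new IdKey("ds/2019-10-01/v1", 10112217), "cached segment 10112217");

    // The poll would reuse the cached instance when the id matches; a false
    // match here would silently replace a different partition's segment.
    String reused = previousSnapshot.get(new IdKey("ds/2019-10-01/v1", 100412240));
    System.out.println(reused == null ? "distinct ids, as expected" : "false match: " + reused);
  }
}
```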