HyperLogLogPlusPlus fails on serializeToByteArray #12

Open
shnapz opened this issue Jun 27, 2023 · 3 comments

Comments


shnapz commented Jun 27, 2023

In Scio we get the following error on version 0.1.0. Unfortunately, it is unclear how to reproduce it, and I can't provide the input data. But the stack trace suggests that something is wrong with how array slices are managed in com.google.zetasketch.internal.GrowingByteSlice:

16:04:53.990  [info]   java.lang.IllegalArgumentException: 19 > 5
16:04:53.990  [info]   at Due to Exception while trying to `encode` an instance of com.spotify.data.dm.sources.internal.AudienceEngagementMetrics: Can't encode field listeners value com.spotify.scio.extra.hll.zetasketch.ZetaSketchHll@151901cb.(:0)
16:04:53.990  [info]   at java.base/java.util.Arrays.copyOfRange(Arrays.java:4029)
16:04:53.990  [info]   at com.google.zetasketch.internal.GrowingByteSlice.maybeExtendLimit(GrowingByteSlice.java:219)
16:04:53.990  [info]   at com.google.zetasketch.internal.GrowingByteSlice.putNextVarInt(GrowingByteSlice.java:191)
16:04:53.990  [info]   at com.google.zetasketch.internal.GrowingByteSlice.putNextVarInt(GrowingByteSlice.java:30)
16:04:53.990  [info]   at com.google.zetasketch.internal.DifferenceEncoder.putInt(DifferenceEncoder.java:54)
16:04:53.990  [info]   at com.google.zetasketch.internal.hllplus.SparseRepresentation.set(SparseRepresentation.java:424)
16:04:53.990  [info]   at com.google.zetasketch.internal.hllplus.SparseRepresentation.flushBuffer(SparseRepresentation.java:348)
16:04:53.990  [info]   at com.google.zetasketch.internal.hllplus.SparseRepresentation.compact(SparseRepresentation.java:243)
16:04:53.990  [info]   at com.google.zetasketch.HyperLogLogPlusPlus.serializeToByteArray(HyperLogLogPlusPlus.java:298)
16:04:53.990  [info]   at com.spotify.scio.extra.hll.zetasketch.ZetaSketchHll$$anonfun$coder$2.apply(ZetaSketchHLL.scala:121)
16:04:53.990  [info]   at com.spotify.scio.extra.hll.zetasketch.ZetaSketchHll$$anonfun$coder$2.apply(ZetaSketchHLL.scala:121)

So it fails on array = Arrays.copyOfRange(array, arrayOffset(), Math.max(growthCapacity, limit)); and there might be something wrong with Math.max(growthCapacity, limit): apparently it is out of bounds.
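
For reference, the message 19 > 5 has the shape "from > to", which is what Arrays.copyOfRange reports when its from argument is larger than its to argument. Read that way, arrayOffset() would have been 19 and Math.max(growthCapacity, limit) would have been 5 when the call failed. The sketch below reproduces only that JDK behaviour on a plain byte array; the class and method names are made up for illustration, and it does not touch ZetaSketch or explain how the slice got into that state.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of ZetaSketch or Scio.
class CopyOfRangeDemo {

    // Returns the exception message from Arrays.copyOfRange, or null if the copy succeeds.
    static String errorMessage(int from, int to) {
        byte[] array = new byte[32];
        try {
            Arrays.copyOfRange(array, from, to);
            return null;
        } catch (IllegalArgumentException e) {
            // Thrown when from > to; the message is "<from> > <to>".
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Same argument order as in the failing call: from = offset, to = new end.
        System.out.println(errorMessage(19, 5)); // prints "19 > 5"
    }
}
```

If this reading is right, the value to inspect in a debugger is the slice's offset relative to the computed end, rather than the end alone.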

zfraa (Member) commented Jun 28, 2023

Thank you for the report, we'll have a look.

@AndersonReyes

To add more context:

It looks like this only happens when running tests via sbt, and the tests only add one to three values to the sketch. Production code runs with no issues (no noticeable failures, anyway).
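
Since the failures are intermittent, a small stress harness may help turn them into something reproducible. The sketch below is only a debugging aid under an unconfirmed assumption: the stack trace shows serializeToByteArray going through compact() and flushBuffer(), so serialization appears to modify the sketch, and calling it from several threads on one shared instance is one thing worth ruling out. Nothing in this thread establishes that as the cause. StressHarness and countFailures are hypothetical names, and the ZetaSketch usage appears only in a comment so the block runs with the JDK alone.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of ZetaSketch or Scio.
class StressHarness {

    // Runs `task` `iterations` times on each of `threads` threads, all released
    // at the same moment, and returns how many calls threw a RuntimeException.
    static int countFailures(int threads, int iterations, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // wait so that all workers begin together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("stress run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Placeholder task. With ZetaSketch on the classpath, one could instead
        // share a single sketch across the workers (assumed usage):
        //   HyperLogLogPlusPlus<Long> sketch = new HyperLogLogPlusPlus.Builder().buildForLongs();
        //   sketch.add(1L);
        //   int failed = countFailures(8, 10_000, sketch::serializeToByteArray);
        int failed = countFailures(4, 1000, () -> { });
        System.out.println("failed calls: " + failed);
    }
}
```

A non-zero count with a shared sketch, and zero when each worker gets its own sketch, would point towards concurrent access; zero in both cases would suggest looking elsewhere.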

@AndersonReyes

Digging some more, and getting random errors, I see this as well:

[info]   java.util.NoSuchElementException:
[info]   at Due to Exception while trying to `encode` an instance of com.spotify.data.dm.sources.internal.AudienceEngagementMetrics: Can't encode field listeners value com.spotify.scio.extra.hll.zetasketch.ZetaSketchHll@3ae0e410.(:0)
[info]   at com.google.zetasketch.internal.DifferenceDecoder.nextInt(DifferenceDecoder.java:49)
[info]   at com.google.zetasketch.internal.MergedIntIterator.advanceA(MergedIntIterator.java:51)
[info]   at com.google.zetasketch.internal.MergedIntIterator.<init>(MergedIntIterator.java:42)
[info]   at com.google.zetasketch.internal.hllplus.SparseRepresentation.dedupedIterator(SparseRepresentation.java:362)
[info]   at com.google.zetasketch.internal.hllplus.SparseRepresentation.flushBuffer(SparseRepresentation.java:348)
[info]   at com.google.zetasketch.internal.hllplus.SparseRepresentation.compact(SparseRepresentation.java:243)
[info]   at com.google.zetasketch.HyperLogLogPlusPlus.serializeToByteArray(HyperLogLogPlusPlus.java:298)
[info]   at com.spotify.scio.extra.hll.zetasketch.ZetaSketchHll$$anonfun$coder$2.apply(ZetaSketchHLL.scala:121)
[info]   at com.spotify.scio.extra.hll.zetasketch.ZetaSketchHll$$anonfun$coder$2.apply(ZetaSketchHLL.scala:121)
[info]   at com.spotify.scio.coders.TransformCoder.encode(Coder.scala:374)
[info]   at com.spotify.scio.coders.RecordCoder$$anonfun$encode$2.apply$mcV$sp(Coder.scala:263)
[info]   at com.spotify.scio.coders.RecordCoder$$anonfun$encode$2.apply(Coder.scala:263)
[info]   at com.spotify.scio.coders.RecordCoder$$anonfun$encode$2.apply(Coder.scala:263)
[info]   at com.spotify.scio.coders.RecordCoder.onErrorMsg(Coder.scala:249)
[info]   at com.spotify.scio.coders.RecordCoder.encode(Coder.scala:263)
[info]   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)

which means that, for some reason, some runs show up with
