This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Add CI job to roundtrip avro over pyspark #733

Open
jorgecarleitao opened this issue Jan 5, 2022 · 3 comments
Labels
testing PRs that only increase coverage

Comments

@jorgecarleitao
Owner

So that we can increase our confidence that our implementation is correct.

@jorgecarleitao jorgecarleitao added the testing PRs that only increase coverage label Jan 5, 2022
@potter420
Contributor

potter420 commented Feb 13, 2022

It seems we're missing the Decimal type. Can I add it?
I will follow the spec as stated here:
https://avro.apache.org/docs/current/spec.html#Decimal

The byte array must contain the two's-complement representation of the unscaled integer value in big-endian byte order
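
To make the quoted rule concrete, here is a minimal sketch of decoding such a byte array in Python. The function name `decode_avro_decimal` is hypothetical and not part of any library; the scale is assumed to come from the Avro schema.

```python
from decimal import Decimal


def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's-complement unscaled integer into a Decimal.

    Hypothetical helper for illustration; `scale` is assumed to be read
    from the schema of the decimal logical type.
    """
    # Interpret the bytes as a signed, big-endian integer (the unscaled value).
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Build the Decimal from (sign, digits, exponent) so that no rounding
    # from the decimal context is applied.
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


positive = decode_avro_decimal(b"\x30\x39", scale=2)  # 0x3039 == 12345
negative = decode_avro_decimal(b"\xcf\xc7", scale=2)  # two's complement of 12345
```

With a scale of 2, the two byte arrays decode to 123.45 and -123.45 respectively.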

@jorgecarleitao
Owner Author

Thank you! Note that we have something similar in the parquet reading implementation that we may be able to reuse; I think it uses an equivalent encoding.

@potter420
Contributor

potter420 commented Feb 13, 2022

The deserialization part is complete; however, the write part seems tricky. Which would you like to write: fixed or bytes?
fixed is easy since the length is fixed.
bytes is what Spark defaults to when writing, since it uses Java's BigDecimal. I would take the sample implementation from here:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/math/BigInteger.java

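public byte[] toByteArray() {
    // Minimal length: the magnitude bits plus at least one sign bit.
    int byteLen = bitLength()/8 + 1;
    byte[] byteArray = new byte[byteLen];

    // Fill from the least significant byte (last index) towards the first,
    // fetching the next 32-bit word via getInt after every four bytes.
    for (int i=byteLen-1, bytesCopied=4, nextInt=0, intIndex=0; i>=0; i--) {
        if (bytesCopied == 4) {
            nextInt = getInt(intIndex++);
            bytesCopied = 1;
        } else {
            nextInt >>>= 8;
            bytesCopied++;
        }
        byteArray[i] = (byte)nextInt;
    }
    return byteArray;
}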
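
For comparison, the two write options discussed above can be sketched in Python. This is only an illustration under stated assumptions, not the implementation proposed for this repository: `to_minimal_bytes` and `to_fixed_bytes` are hypothetical names, and the first mirrors the length rule of the Java method (`bitLength()/8 + 1`).

```python
def to_minimal_bytes(unscaled: int) -> bytes:
    """Shortest big-endian two's-complement encoding (the `bytes` option).

    Hypothetical helper mirroring the length rule of BigInteger.toByteArray.
    """
    # Java's bitLength() excludes the sign bit; for negative values it is the
    # bit length of the bitwise complement, which differs from Python's
    # int.bit_length() (that one measures the absolute value).
    magnitude_bits = (unscaled if unscaled >= 0 else ~unscaled).bit_length()
    byte_len = magnitude_bits // 8 + 1
    return unscaled.to_bytes(byte_len, byteorder="big", signed=True)


def to_fixed_bytes(unscaled: int, size: int) -> bytes:
    """Sign-extended encoding into exactly `size` bytes (the `fixed` option).

    Raises OverflowError if the value does not fit in `size` bytes.
    """
    return unscaled.to_bytes(size, byteorder="big", signed=True)


minimal = to_minimal_bytes(-12345)  # two bytes are enough for this value
fixed = to_fixed_bytes(-12345, 4)   # padded with 0xff sign bytes
```

Reading either result back with `int.from_bytes(..., byteorder="big", signed=True)` returns the original unscaled integer, which is the property a roundtrip test would check.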
