Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/adbc/driver/snowflake: improved support for decimal128 types #1242

Closed
davidhcoe opened this issue Oct 31, 2023 · 8 comments · Fixed by #1267
Closed

go/adbc/driver/snowflake: improved support for decimal128 types #1242

davidhcoe opened this issue Oct 31, 2023 · 8 comments · Fixed by #1267

Comments

@davidhcoe
Copy link
Contributor

We are still having issues with decimals even with high precision enabled. Consider the following scenarios:

CREATE OR REPLACE TABLE PUBLIC.NUMBERTEST_ALLPRECISION1 
( 
COL18 NUMBER(18,0),COL19 NUMBER(19,0),COL20 NUMBER(20,0),COL21 NUMBER(21,0),COL22 NUMBER(22,0),COL23 NUMBER(23,0),COL24 NUMBER(24,0),COL25 NUMBER(25,0),COL26 NUMBER(26,0),COL27 NUMBER(27,0),COL28 NUMBER(28,0),COL29 NUMBER(29,0),COL30 NUMBER(30,0),COL31 NUMBER(31,0),COL32 NUMBER(32,0),COL33 NUMBER(33,0),COL34 NUMBER(34,0),COL35 NUMBER(35,0),COL36 NUMBER(36,0),COL37 NUMBER(37,0),COL38 NUMBER(38,0) 
); 

INSERT INTO NUMBERTEST_ALLPRECISION1 
( 
  COL18,COL19, COL20, COL21, COL22, COL23, COL24, COL25, COL26, COL27, COL28, COL29, COL30, COL31, COL32, COL33, COL34, COL35, COL36, COL37, COL38 
) 
VALUES
( 
999999999999999999, 9999999999999999999, 99999999999999999999, 999999999999999999999, 9999999999999999999999, 99999999999999999999999, 999999999999999999999999, 9999999999999999999999999, 99999999999999999999999999, 999999999999999999999999999, 9999999999999999999999999999, 99999999999999999999999999999, 999999999999999999999999999999, 9999999999999999999999999999999, 99999999999999999999999999999999, 999999999999999999999999999999999, 9999999999999999999999999999999999, 99999999999999999999999999999999999, 999999999999999999999999999999999999, 9999999999999999999999999999999999999,99999999999999999999999999999999999999 
); 

When this table is queried, the error from https://github.com/apache/arrow/blob/2628d495ca99a892dca50019f4e72f087dc5aac7/go/arrow/compute/internal/kernels/numeric_cast.go#L230 is hit, because COL18 has a precision+scale value of less than 19 (per https://github.com/apache/arrow/blob/c49e24273160ac1ce195f02dbd14acd7d0f6945e/go/arrow/compute/internal/kernels/helpers.go#L701).

The other scenario is:

CREATE OR REPLACE TABLE PUBLIC.NUMBERTEST_ALLPRECISION1 
( 
COL18 NUMBER(18,2),COL19 NUMBER(19,2),COL20 NUMBER(20,2),COL21 NUMBER(21,2),COL22 NUMBER(22,02),COL23 NUMBER(23,02),COL24 NUMBER(24,02),COL25 NUMBER(25,02),COL26 NUMBER(26,02),COL27 NUMBER(27,02),COL28 NUMBER(28,02),COL29 NUMBER(29,02),COL30 NUMBER(30,02),COL31 NUMBER(31,02),COL32 NUMBER(32,02),COL33 NUMBER(33,02),COL34 NUMBER(34,02),COL35 NUMBER(35,02),COL36 NUMBER(36,02),COL37 NUMBER(37,02),COL38 NUMBER(38,02) 
); 

INSERT INTO NUMBERTEST_ALLPRECISION1 
( 
COL18,COL19, COL20, COL21, COL22, COL23, COL24, COL25, COL26, COL27, COL28, COL29, COL30, COL31, COL32, COL33, COL34, COL35, COL36, COL37, COL38 
)
VALUES
( 
9999999999999999.99, 99999999999999999.99, 999999999999999999.99, 9999999999999999999.99, 99999999999999999999.99, 999999999999999999999.99, 9999999999999999999999.99, 99999999999999999999999.99, 999999999999999999999999.99, 9999999999999999999999999.99, 99999999999999999999999999.99, 999999999999999999999999999.99, 9999999999999999999999999999.99, 99999999999999999999999999999.99, 999999999999999999999999999999.99, 9999999999999999999999999999999.99, 99999999999999999999999999999999.99, 999999999999999999999999999999999.99, 9999999999999999999999999999999999.99, 99999999999999999999999999999999999.99,999999999999999999999999999999999999.99 
); 

When this table is queried, https://github.com/apache/arrow/blob/c49e24273160ac1ce195f02dbd14acd7d0f6945e/go/arrow/compute/internal/kernels/helpers.go#L592 is hit and cannot return values.

@lidavidm
Copy link
Member

lidavidm commented Nov 1, 2023

@zeroshade would these be primarily upstream issues?

@CurtHagenlocher
Copy link
Contributor

In at least one case, the column is typed as NUMBER(18, 2) and the array returned from Snowflake is typed as INT64. Obviously, INT64 values aren't guaranteed to fit into a NUMBER(18, 2) so the conversion code fails early. But it happens that all the values returned from Snowflake in that array do actually fit.

@zeroshade
Copy link
Member

In the case of hitting https://github.com/apache/arrow/blob/c49e24273160ac1ce195f02dbd14acd7d0f6945e/go/arrow/compute/internal/kernels/helpers.go#L592 it should only hit that if we're actually hitting the case where the value itself falls outside the range (since it should only error by actually checking that the value didn't fit), but for the early failure due to the scale and precision I guess we can add an option upstream to allow a consumer to request it to bypass that validation and let it just assume that everything will fit?

I'm not sure how I feel about that and I wish that snowflake would just return the correct type in the first place, but I doubt we'd be able to convince them to change how they return the data....

@davidhcoe
Copy link
Contributor Author

davidhcoe commented Nov 2, 2023

In the case of hitting https://github.com/apache/arrow/blob/c49e24273160ac1ce195f02dbd14acd7d0f6945e/go/arrow/compute/internal/kernels/helpers.go#L592 it should only hit that if we're actually hitting the case where the value itself falls outside the range (since it should only error by actually checking that the value didn't fit), but for the early failure due to the scale and precision I guess we can add an option upstream to allow a consumer to request it to bypass that validation and let it just assume that everything will fit?

I'm not sure how I feel about that and I wish that snowflake would just return the correct type in the first place, but I doubt we'd be able to convince them to change how they return the data....

Right. I tried adding the logic below to check whether useHighPrecision could be used given the precision and scale, and the current answer is false, so it falls to the logic of trying to do the math on the larger values that don't fit. So that didn't turn out to be a good idea.

func canUseHighPrecision(useHighPrecision bool, precision, scale int32) bool {
	if !useHighPrecision {
		return false
	}
	//precision <(19+scale) throws an error for Decimal128Type
	minPrecision := int32(19) // arrow.INT64
	minPrecision += scale
	if precision < minPrecision {
		return false
	} else {
		return true
	}
}

@CurtHagenlocher
Copy link
Contributor

There's a fair amount of logic in https://github.com/snowflakedb/gosnowflake/blob/master/converter.go which simply isn't reflected in the ADBC driver. Specifically, Snowflake will use lower-precision integer types to return lower-precision decimal values e.g. a decimal(9, 2) could be returned as an Arrow int32 and would need to be widened and not just converted as-is. This affects both the high-precision and low-precision code paths.

I can take a look at addressing this, but it will be the first time I've written Go ;).

@zeroshade
Copy link
Member

I'm currently travelling for a conference and unable to dedicate time to attempting to fix this directly myself. But if you want to take a stab at addressing it I'll happily review the code next week!

@CurtHagenlocher
Copy link
Contributor

I've been looking at this intermittently over the weekend and trying to pick up some Go by looking at a simpler issue first. I know what needs to be done mechanically, but I'm still trying to figure out how the compute kernels work and whether I can pass a custom function instead of relying on a predefined kernel as I think that the functionality this needs is a bit too specialized to justify putting into Arrow itself (and don't want to wait for a new Arrow release).

@zeroshade
Copy link
Member

In the Arrow Documentation https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/arrow/compute#example-package-CustomFunction provides an example on constructing, registering, and calling a custom function through the compute package. That should help, feel free to reach out to me if you need any more assistance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants