Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the nvarchar-varbinary casting #3072

Merged

Conversation

pranavJ23
Copy link
Contributor

@pranavJ23 pranavJ23 commented Nov 4, 2024

Description

PROBLEM: while casting nvarchar to varbinary we were considering the encoding as input encoding in babelfish where as in tsql we use UTF16 encoding fir nvarchar irrespective of input encoding.

RCA: we were considering varchar and nvarchar as same, whereas we should use input encoding for varchar and UTF16 encoding for nvarchar.

FIX: So we need to identify that if the input is nvarchar then we will do the UTF16 encoding.

For a casting like nvarchar->varbinary->nvarchar, now since for the casting we are encoding the input string into UTF16 encoding via function nvarcharvarbinary, so while converting varbinary-> nvarchar we will use the function varbinarynvarchar where we will convert UTF16 encoding to UTF8 with null padding.
So we created a function nvarcharvarbinary and varbinarynvarchar to handle nvarchar<-> varbinary to and fro casting.
And for this casting we have specifically applied a condition that we will not convert the datatype to basetype before choosing the casting function

Issues Resolved

[BABEL-4891]
Signed-off-by: Pranav Jain [email protected]

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coveralls
Copy link
Collaborator

coveralls commented Nov 7, 2024

Pull Request Test Coverage Report for Build 12328936781

Details

  • 148 of 183 (80.87%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 74.855%

Changes Missing Coverage Covered Lines Changed/Added Lines %
contrib/babelfishpg_common/src/varbinary.c 83 91 91.21%
contrib/babelfishpg_common/src/varchar.c 40 67 59.7%
Totals Coverage Status
Change from base Build 12314534622: 0.02%
Covered Lines: 46661
Relevant Lines: 62335

💛 - Coveralls

@pranavJ23 pranavJ23 requested a review from Deepesh125 November 13, 2024 08:32
@pranavJ23 pranavJ23 requested a review from rohit01010 November 14, 2024 16:42
@pranavJ23 pranavJ23 changed the title differentiating nvarchar and varchar datatype while casting to varbinary Fix the nvarchar-varbinary casting Nov 22, 2024
Signed-off-by: Pranav Jain <[email protected]>
contrib/babelfishpg_common/src/collation.h Outdated Show resolved Hide resolved
contrib/babelfishpg_tsql/sql/datatype_string_operators.sql Outdated Show resolved Hide resolved
contrib/babelfishpg_tds/src/backend/tds/tdsxact.c Outdated Show resolved Hide resolved
contrib/babelfishpg_common/src/babelfishpg_common.c Outdated Show resolved Hide resolved
contrib/babelfishpg_common/src/varbinary.c Outdated Show resolved Hide resolved

if (encodedByteLen > maxlen)
encodedByteLen = maxlen;
result = (bytea *) palloc(encodedByteLen + VARHDRSZ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

palloc0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unresolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the palloc0 in nvarcharvarbinary, varcharvarbinary is a pre-existing function. Not my function

contrib/babelfishpg_common/src/varbinary.c Outdated Show resolved Hide resolved
Comment on lines +961 to +964
if(!(maxlen < 0 || (len >> 1) <= maxlen))
{
len = maxlen << 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is len >> 1 and maxlen << 1? is it due to *2 for UTF16?

Signed-off-by: pranav jain <[email protected]>
Signed-off-by: pranav jain <[email protected]>
Signed-off-by: pranav jain <[email protected]>
Signed-off-by: pranav jain <[email protected]>
Signed-off-by: pranav jain <[email protected]>
Signed-off-by: pranav jain <[email protected]>

if (encodedByteLen > maxlen)
encodedByteLen = maxlen;
result = (bytea *) palloc(encodedByteLen + VARHDRSZ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unresolved

Comment on lines +937 to +938
typmod = PG_GETARG_INT32(1);
maxlen = typmod - VARHDRSZ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will be typmod in case of select cast(<> as nvarchar(max))?

MemoryContext ccxt = CurrentMemoryContext;

if (!isExplicit)
ereport(ERROR,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please point the code line

* If typmod is -1 (or invalid), use the actual length
* Length should be checked after encoding into server encoding
*/
if (typmod < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be if (typmod < (int32) VARHDRSZ)

Comment on lines 941 to 946
* Converts UTF-16 to UTF-8, handling odd-length inputs by padding.
* Respects maxlen if specified, otherwise processes full input.
* Uses TsqlUTF16toUTF8StringInfo for conversion, with error handling via PG_TRY.
*/

/* truncating NULL bytes from end */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation seems off

Comment on lines 951 to 958
if(len % 2 != 0)
{
paddedData = (char*)palloc0(len+1);
memcpy(paddedData, data, len);
len = len + 1;
}
else
memcpy(paddedData, data, len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just do

memcpy(paddedData, data, len);
len = len % 2 ? len : len + 1;

You are already doing palloc at line during declaration

char 		*paddedData = (char*)palloc(len+1);

make it palloc0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this correct?

len = len % 2 ? len : len + 1;

it should be

len = len % 2 ? len + 1 : len;

{
/* Converts UTF-16 to UTF-8 using TsqlUTF16toUTF8StringInfo */
initStringInfo(&buf);
TsqlUTF16toUTF8StringInfo(&buf,paddedData,len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent: spaces between args

len= buf.len;

/* If typmod is -1 (or invalid), use the actual length */
if (typmod < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be < VARHDRSZ

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? as i made comment previously, we simply need typmod == -1

Copy link
Contributor

@tanscorpio7 tanscorpio7 Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typmod < VARHDRSZ means we cannot even accommodate the header so we have to treat it as -1. We follow this in all the existing cast functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even postgres cast functions check typmod against VARHDRSZ

@@ -236,6 +236,7 @@ Function sys.binaryint2(sys.bbf_binary)
Function sys.binaryint4(sys.bbf_binary)
Function sys.binaryint8(sys.bbf_binary)
Function sys.binaryrowversion(sys.bbf_binary,integer,boolean)
Function sys.binarysysnvarchar(sys.bbf_binary,integer,boolean)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add dep tests?

@Deepesh125 Deepesh125 merged commit 3e2c11d into babelfish-for-postgresql:BABEL_4_X_DEV Dec 16, 2024
47 checks passed
pranavJ23 added a commit to pranavJ23/babelfish_extensions that referenced this pull request Dec 16, 2024
PROBLEM: while casting nvarchar to varbinary we were considering the UTF8 encoding as input encoding in Babelfish
where as in TSQL we use UTF16 encoding fir nvarchar irrespective of input encoding.

RCA: we were considering varchar and nvarchar as same, whereas we should use input encoding for varchar and UTF16
encoding for nvarchar.

FIX: So we need to identify that if the input is nvarchar then we will do the UTF16 encoding.

For a casting like nvarchar->varbinary->nvarchar, now since for the casting we are encoding the input string into UTF16
encoding via function nvarcharvarbinary, so while converting varbinary-> nvarchar we will use the function
varbinarynvarchar where we will convert UTF16 encoding to UTF8 with null padding.

So we created a function nvarcharvarbinary and varbinarynvarchar to handle nvarchar<-> varbinary to and fro casting.
And for this casting we have specifically applied a condition that we will not convert the datatype to basetype before choosing the casting function

Task: BABEL-4891
Signed-off-by: Pranav Jain <[email protected]>
timchang514 pushed a commit to amazon-aurora/babelfish_extensions that referenced this pull request Dec 17, 2024
PROBLEM: while casting nvarchar to varbinary we were considering the UTF8 encoding as input encoding in Babelfish
where as in TSQL we use UTF16 encoding fir nvarchar irrespective of input encoding.

RCA: we were considering varchar and nvarchar as same, whereas we should use input encoding for varchar and UTF16
encoding for nvarchar.

FIX: So we need to identify that if the input is nvarchar then we will do the UTF16 encoding.

For a casting like nvarchar->varbinary->nvarchar, now since for the casting we are encoding the input string into UTF16
encoding via function nvarcharvarbinary, so while converting varbinary-> nvarchar we will use the function
varbinarynvarchar where we will convert UTF16 encoding to UTF8 with null padding.

So we created a function nvarcharvarbinary and varbinarynvarchar to handle nvarchar<-> varbinary to and fro casting.
And for this casting we have specifically applied a condition that we will not convert the datatype to basetype before choosing the casting function

Task: BABEL-4891
Signed-off-by: Pranav Jain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants