Fix the nvarchar-varbinary casting #3072
Signed-off-by: Pranav Jain <[email protected]>
Pull Request Test Coverage Report for Build 12328936781 (Coveralls)
… any other type Signed-off-by: Pranav Jain <[email protected]>
…converting UTF16 to UTF8 Signed-off-by: Pranav Jain <[email protected]>
…g it via rendevouz variable Signed-off-by: pranav jain <[email protected]>
if (encodedByteLen > maxlen)
    encodedByteLen = maxlen;
result = (bytea *) palloc(encodedByteLen + VARHDRSZ);
Use palloc0 instead of palloc.
This is unresolved
I added the palloc0 in nvarcharvarbinary; varcharvarbinary is a pre-existing function, not one I wrote.
if(!(maxlen < 0 || (len >> 1) <= maxlen))
{
    len = maxlen << 1;
}
What are len >> 1 and maxlen << 1? Is this because of the *2 factor for UTF-16?
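A minimal sketch of the arithmetic the question refers to, assuming len is the byte length of a UTF-16 payload and maxlen is a limit counted in UTF-16 code units (two bytes each). Under that reading, len >> 1 is len / 2 (the number of code units) and maxlen << 1 is maxlen * 2 (the limit converted back to bytes). The helper name clamp_utf16_byte_len is hypothetical and not part of the PR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the reviewed check (an assumption, not PR code).
 * len:    length of the UTF-16 payload in bytes.
 * maxlen: limit in UTF-16 code units; a negative value means "no limit".
 * Returns the number of bytes to keep.
 */
static int32_t
clamp_utf16_byte_len(int32_t len, int32_t maxlen)
{
    assert(len >= 0);

    /* len >> 1 == len / 2: each UTF-16 code unit occupies two bytes */
    if (!(maxlen < 0 || (len >> 1) <= maxlen))
    {
        /* maxlen << 1 == maxlen * 2: convert code units back to bytes */
        len = maxlen << 1;
    }
    return len;
}
```

With this reading, a 10-byte input with maxlen 3 is cut to 6 bytes (three code units), while a negative maxlen leaves the length unchanged.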
if (encodedByteLen > maxlen)
    encodedByteLen = maxlen;
result = (bytea *) palloc(encodedByteLen + VARHDRSZ);
This is unresolved
typmod = PG_GETARG_INT32(1);
maxlen = typmod - VARHDRSZ;
What will typmod be in the case of select cast(<> as nvarchar(max))?
MemoryContext ccxt = CurrentMemoryContext;

if (!isExplicit)
    ereport(ERROR,
Please point to the code line.
 * If typmod is -1 (or invalid), use the actual length
 * Length should be checked after encoding into server encoding
 */
if (typmod < 0)
This should be if (typmod < (int32) VARHDRSZ)
 * Converts UTF-16 to UTF-8, handling odd-length inputs by padding.
 * Respects maxlen if specified, otherwise processes full input.
 * Uses TsqlUTF16toUTF8StringInfo for conversion, with error handling via PG_TRY.
 */
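The comment above describes converting UTF-16 to UTF-8. TsqlUTF16toUTF8StringInfo is Babelfish-internal and its body is not shown in this PR page, so the following is only a standalone sketch of a generic UTF-16LE to UTF-8 decoder under stated assumptions: the input is little-endian, already padded to an even length, and unpaired surrogates are replaced with U+FFFD (a choice made for this sketch, not necessarily Babelfish's behavior). The names utf16le_to_utf8 and put_utf8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one code point to out as UTF-8; returns the number of bytes written. */
static size_t
put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Hypothetical decoder: convert len bytes of UTF-16LE into UTF-8.
 * The caller provides at least len / 2 * 3 bytes of output space.
 * Returns the number of UTF-8 bytes written.
 */
static size_t
utf16le_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0;
    size_t o = 0;

    assert(len % 2 == 0);   /* odd lengths must be padded first */
    while (i < len)
    {
        uint32_t cu = in[i] | ((uint32_t) in[i + 1] << 8);

        i += 2;
        /* High surrogate followed by a low surrogate: combine the pair */
        if (cu >= 0xD800 && cu <= 0xDBFF && i < len)
        {
            uint32_t lo = in[i] | ((uint32_t) in[i + 1] << 8);

            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                i += 2;
                o += put_utf8(0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00), out + o);
                continue;
            }
        }
        if (cu >= 0xD800 && cu <= 0xDFFF)
            cu = 0xFFFD;    /* unpaired surrogate in this sketch */
        o += put_utf8(cu, out + o);
    }
    return o;
}
```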
/* truncating NULL bytes from end */
Indentation seems off.
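The reviewed line strips NUL bytes from the end of the converted string. Assuming that means dropping every trailing 0x00 byte (the loop that follows the comment is not shown in the review), a sketch looks like this; trim_trailing_nuls is a hypothetical helper, not a function from the PR.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of buf after dropping
 * any run of trailing NUL (0x00) bytes. buf itself is not modified.
 */
static size_t
trim_trailing_nuls(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0 && buf[len - 1] == '\0')
        len--;
    return len;
}
```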
if(len % 2 != 0)
{
    paddedData = (char*)palloc0(len+1);
    memcpy(paddedData, data, len);
    len = len + 1;
}
else
    memcpy(paddedData, data, len);
We can just do:
memcpy(paddedData, data, len);
len = len % 2 ? len : len + 1;
You are already calling palloc in the declaration:
char *paddedData = (char*)palloc(len+1);
Make it palloc0.
How is this correct?
len = len % 2 ? len : len + 1;
It should be:
len = len % 2 ? len + 1 : len;
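Putting the two review suggestions together gives one zero-filled allocation of len + 1 bytes, a single memcpy, and the corrected ternary len % 2 ? len + 1 : len. In this standalone sketch calloc stands in for PostgreSQL's palloc0, and pad_to_even is a hypothetical name rather than code from the PR.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes of data into a zero-filled buffer
 * of len + 1 bytes (calloc plays the role of palloc0 here). If len is odd,
 * the extra zero byte completes the last UTF-16 code unit and *padded_len
 * becomes len + 1; otherwise it stays len. The caller frees the result.
 */
static char *
pad_to_even(const char *data, size_t len, size_t *padded_len)
{
    char *padded;

    assert(data != NULL || len == 0);
    padded = calloc(len + 1, 1);
    if (padded == NULL)
        return NULL;
    memcpy(padded, data, len);
    *padded_len = len % 2 ? len + 1 : len;  /* the corrected ternary */
    return padded;
}
```

For an odd input such as the three bytes "ABC", the result is four bytes ending in a zero byte; an even input keeps its length.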
{
    /* Converts UTF-16 to UTF-8 using TsqlUTF16toUTF8StringInfo */
    initStringInfo(&buf);
    TsqlUTF16toUTF8StringInfo(&buf,paddedData,len);
Formatting: add spaces between the arguments.
len= buf.len;

/* If typmod is -1 (or invalid), use the actual length */
if (typmod < 0)
Should be < VARHDRSZ.
Why? As I commented previously, we simply need typmod == -1.
typmod < VARHDRSZ means the typmod cannot even accommodate the header, so we have to treat it as -1. All the existing cast functions follow this convention.
Even the PostgreSQL cast functions check typmod against VARHDRSZ.
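A sketch of the guard being debated, assuming the usual PostgreSQL definition of VARHDRSZ as the 4-byte varlena header (redefined locally so the code compiles standalone). It illustrates why typmod == -1 alone is not enough: typmod values from 0 to 3 would pass that test yet yield a negative maxlen = typmod - VARHDRSZ. PostgreSQL's own varchar() cast in varchar.c similarly computes maxlen = typmod - VARHDRSZ and returns its input unchanged when maxlen < 0. Both function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Varlena header size, as PostgreSQL defines it; local copy for this sketch. */
#define VARHDRSZ ((int32_t) sizeof(int32_t))

/*
 * Hypothetical check: true when typmod carries no usable length limit.
 * Anything below VARHDRSZ cannot even cover the header, so it is treated
 * the same as -1 ("no limit").
 */
static bool
typmod_is_unlimited(int32_t typmod)
{
    return typmod < VARHDRSZ;
}

/* maxlen as computed in the reviewed code; only meaningful when limited. */
static int32_t
typmod_to_maxlen(int32_t typmod)
{
    return typmod - VARHDRSZ;
}
```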
@@ -236,6 +236,7 @@ Function sys.binaryint2(sys.bbf_binary)
Function sys.binaryint4(sys.bbf_binary)
Function sys.binaryint8(sys.bbf_binary)
Function sys.binaryrowversion(sys.bbf_binary,integer,boolean)
Function sys.binarysysnvarchar(sys.bbf_binary,integer,boolean)
Can we add dependency tests?
Merged commit 3e2c11d into babelfish-for-postgresql:BABEL_4_X_DEV
PROBLEM: While casting nvarchar to varbinary, Babelfish treated UTF-8 as the input encoding, whereas T-SQL uses UTF-16 for nvarchar irrespective of the input encoding. RCA: We were treating varchar and nvarchar the same, whereas we should use the input encoding for varchar and UTF-16 for nvarchar. FIX: Detect when the input is nvarchar and encode it as UTF-16. For a cast chain such as nvarchar->varbinary->nvarchar, the nvarchar->varbinary step now encodes the input string as UTF-16 via the function nvarcharvarbinary, so the varbinary->nvarchar step uses the function varbinarynvarchar, which converts UTF-16 to UTF-8 with null padding. So we created the functions nvarcharvarbinary and varbinarynvarchar to handle casting between nvarchar and varbinary in both directions. For this cast specifically, we do not convert the data type to its base type before choosing the cast function. Task: BABEL-4891 Signed-off-by: Pranav Jain <[email protected]>
Description
PROBLEM: While casting nvarchar to varbinary, Babelfish used the input encoding, whereas T-SQL uses UTF-16 for nvarchar irrespective of the input encoding.
RCA: We were treating varchar and nvarchar the same, whereas we should use the input encoding for varchar and UTF-16 for nvarchar.
FIX: We now detect when the input is nvarchar and encode it as UTF-16.
For a cast chain such as nvarchar->varbinary->nvarchar: because the nvarchar->varbinary step now encodes the input string as UTF-16 via the function nvarcharvarbinary, the varbinary->nvarchar step uses the function varbinarynvarchar, which converts UTF-16 back to UTF-8 with null padding.
So we created the functions nvarcharvarbinary and varbinarynvarchar to handle casting between nvarchar and varbinary in both directions.
For this cast specifically, we do not convert the data type to its base type before choosing the cast function.
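For reference, T-SQL stores nvarchar as UTF-16 little-endian, so in SQL Server CAST(N'A' AS varbinary) returns 0x4100. The following standalone sketch shows the UTF-8 to UTF-16LE step that an nvarchar-to-varbinary cast needs. It assumes valid UTF-8 input and is not the PR's nvarcharvarbinary, whose body is not shown here; utf8_to_utf16le and put_u16le are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one UTF-16 code unit as two little-endian bytes. */
static size_t
put_u16le(uint32_t cu, unsigned char *out)
{
    out[0] = (unsigned char) (cu & 0xFF);
    out[1] = (unsigned char) (cu >> 8);
    return 2;
}

/*
 * Hypothetical encoder: convert len bytes of valid UTF-8 into UTF-16LE.
 * The caller provides at least 2 * len bytes of output space.
 * Returns the number of bytes written.
 */
static size_t
utf8_to_utf16le(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0;
    size_t o = 0;

    while (i < len)
    {
        uint32_t cp;
        unsigned char b = in[i];

        if (b < 0x80)                   /* 1-byte sequence (ASCII) */
        {
            cp = b;
            i += 1;
        }
        else if (b < 0xE0)              /* 2-byte sequence */
        {
            assert(i + 1 < len);
            cp = ((uint32_t) (b & 0x1F) << 6) | (in[i + 1] & 0x3F);
            i += 2;
        }
        else if (b < 0xF0)              /* 3-byte sequence */
        {
            assert(i + 2 < len);
            cp = ((uint32_t) (b & 0x0F) << 12) | ((uint32_t) (in[i + 1] & 0x3F) << 6)
                | (in[i + 2] & 0x3F);
            i += 3;
        }
        else                            /* 4-byte sequence */
        {
            assert(i + 3 < len);
            cp = ((uint32_t) (b & 0x07) << 18) | ((uint32_t) (in[i + 1] & 0x3F) << 12)
                | ((uint32_t) (in[i + 2] & 0x3F) << 6) | (in[i + 3] & 0x3F);
            i += 4;
        }

        if (cp >= 0x10000)
        {
            /* Outside the Basic Multilingual Plane: emit a surrogate pair */
            cp -= 0x10000;
            o += put_u16le(0xD800 + (cp >> 10), out + o);
            o += put_u16le(0xDC00 + (cp & 0x3FF), out + o);
        }
        else
            o += put_u16le(cp, out + o);
    }
    return o;
}
```

Characters outside the Basic Multilingual Plane, such as U+1F600, become a surrogate pair and therefore take four bytes instead of two.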
Issues Resolved
[BABEL-4891]
Signed-off-by: Pranav Jain [email protected]
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.