Fix the nvarchar-varbinary casting #3072

Merged
78 commits
7c12ec2
fix
Nov 4, 2024
095673f
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 6, 2024
a2775e1
changes for varcharvarbinary function
Nov 7, 2024
c06c940
fix for hashbytes
Nov 11, 2024
6a2f3cc
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 11, 2024
bff7ec1
changes to support varchar and nvarchar difference in hashbytes
Nov 12, 2024
39791ff
adding to expected_dependency.out file
Nov 12, 2024
3a895f0
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 13, 2024
36e700d
adding test cases
Nov 13, 2024
8dc4e26
adding test cases and handle for reverse conversion from varbinary to…
Nov 16, 2024
ce7b7a8
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 19, 2024
75f9354
mfixing varbinary to nvarchar conversion
Nov 19, 2024
fcc59f4
minor refactoring
Nov 19, 2024
2b0a688
minor changes in varbinarynvarchar
Nov 19, 2024
30b176b
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 20, 2024
48d81fe
correcting the varbinarynvarchar conversion by adding padding before …
Nov 20, 2024
f044012
minor change
Nov 20, 2024
aafaf50
fixing utf16 to utf8 conversion for all lengths
Nov 20, 2024
01dc97f
simpler condition for padding
Nov 21, 2024
3cfc40d
simplifying condition
Nov 21, 2024
74aac96
fix test cases
Nov 21, 2024
e877e20
correcting test cases and upgrade script
Nov 21, 2024
2b2c59b
adding upgrade scripts for hashbytes and changing tests
Nov 21, 2024
44b5a11
minor changes
Nov 21, 2024
96b5b4a
naming fix
Nov 22, 2024
611dbca
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 22, 2024
5aab87a
minor fixes
Nov 26, 2024
c2bafef
Merge branch 'BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 26, 2024
0af5770
fix expected.out changes
Nov 26, 2024
635787c
Merge branch 'BABEL_4_X_DEV' into babel_4891
pranavJ23 Nov 26, 2024
1de8a82
refactoring
Nov 26, 2024
cd93a0e
refactoring
Nov 26, 2024
8df749e
minor changes
Nov 26, 2024
2ea5b2f
removing extra changes
Nov 26, 2024
68c1d26
minor change
Nov 26, 2024
b5a379c
addressing comment for UDT handling
Nov 28, 2024
c73a319
minor changes
Nov 28, 2024
5e69cfd
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 1, 2024
7cf58f7
addressing comments
Dec 1, 2024
794fe9e
addressing comments
Dec 1, 2024
4e21ac9
minor changes
Dec 1, 2024
e3d2f9c
fixing binary test cases
Dec 1, 2024
5c808a8
increasing test coverage
Dec 2, 2024
24be76d
adding tests to schedule tests
Dec 2, 2024
2313a4f
Merge branch 'BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 2, 2024
e995f9d
added hashbytes upgrade tests hence removing it from expected_depend…
Dec 2, 2024
09ff98a
simplifying the functions
Dec 2, 2024
af5eed7
minor changes
Dec 2, 2024
b2bc1e7
changing the binary tests and schedule files and introducing before f…
Dec 2, 2024
9615a99
fixing test cases
Dec 2, 2024
a9cf0aa
minor changes
Dec 3, 2024
c1250a0
addressing comments
Dec 3, 2024
ac06787
addressing comments
Dec 3, 2024
eeaa9ed
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 4, 2024
6483a51
addressing comments
Dec 4, 2024
04a8658
comments
Dec 4, 2024
d75f99c
minor changes
Dec 4, 2024
cc47ef7
restricting the get_UDT_immediate_basetype function call
pranavJ23 Dec 7, 2024
4c58e97
Merge branch 'BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 7, 2024
5e3e0c7
removing hashbytes implementation
pranavJ23 Dec 7, 2024
a8ef9ff
fixing hashbytes test cases
pranavJ23 Dec 7, 2024
9797e29
removce the UTf8<->UTF16 string info functtion to varchar.cand callin…
pranavJ23 Dec 9, 2024
0a09fa9
Merge branch 'BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 9, 2024
b4ad46f
optimising the collation_ptr call in tds
pranavJ23 Dec 9, 2024
e807c20
reverting the rendevous pointer changes
pranavJ23 Dec 9, 2024
ae35365
minor changes
pranavJ23 Dec 10, 2024
0935d58
addressing comments
pranavJ23 Dec 11, 2024
6ba1e89
minor changes
pranavJ23 Dec 11, 2024
330a50d
addressing comments
pranavJ23 Dec 11, 2024
8cfa08f
adding test cases
pranavJ23 Dec 11, 2024
abb498f
addressing comments
pranavJ23 Dec 12, 2024
e442c38
addressing comments
pranavJ23 Dec 12, 2024
aefe67e
Merge branch 'BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 12, 2024
4f367c4
addressing comments
pranavJ23 Dec 14, 2024
2eaede5
addressing comments
pranavJ23 Dec 14, 2024
fa79cf2
minor changes
pranavJ23 Dec 14, 2024
82709e1
adding dependency test cases
pranavJ23 Dec 14, 2024
6383080
Merge branch 'babelfish-for-postgresql:BABEL_4_X_DEV' into babel_4891
pranavJ23 Dec 14, 2024
10 changes: 10 additions & 0 deletions contrib/babelfishpg_common/sql/binary.sql
@@ -66,6 +66,11 @@ LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
CREATE CAST (sys.VARCHAR AS sys.BBF_BINARY)
WITH FUNCTION sys.varcharbinary (sys.VARCHAR, integer, boolean) AS ASSIGNMENT;

CREATE OR REPLACE FUNCTION sys.nvarcharbinary(sys.NVARCHAR, integer, boolean)
RETURNS sys.BBF_BINARY
AS 'babelfishpg_common', 'nvarcharbinary'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.varcharbinary(pg_catalog.VARCHAR, integer, boolean)
RETURNS sys.BBF_BINARY
AS 'babelfishpg_common', 'varcharbinary'
@@ -99,6 +104,11 @@ LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
CREATE CAST (sys.BBF_BINARY AS sys.VARCHAR)
WITH FUNCTION sys.binarysysvarchar (sys.BBF_BINARY, integer, boolean) AS IMPLICIT;

CREATE OR REPLACE FUNCTION sys.binarysysnvarchar(sys.BBF_BINARY, integer, boolean)
RETURNS sys.NVARCHAR
AS 'babelfishpg_common', 'varbinarynvarchar'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.binaryvarchar(sys.BBF_BINARY, integer, boolean)
RETURNS pg_catalog.VARCHAR
AS 'babelfishpg_common', 'varbinaryvarchar'
@@ -7,6 +7,26 @@

SELECT set_config('search_path', 'sys, '||current_setting('search_path'), false);

CREATE OR REPLACE FUNCTION sys.nvarcharvarbinary(sys.NVARCHAR, integer, boolean)
RETURNS sys.BBF_VARBINARY
AS 'babelfishpg_common', 'nvarcharvarbinary'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.varbinarysysnvarchar(sys.BBF_VARBINARY, integer, boolean)
Contributor: Let's use the existing naming convention for this function. Also, please add a line at the end of each file.

Contributor Author: The varbinaryvarchar function also had a counterpart named varbinarysysvarchar, so I named this one the same way.

RETURNS sys.NVARCHAR
AS 'babelfishpg_common', 'varbinarynvarchar'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.binarysysnvarchar(sys.BBF_BINARY, integer, boolean)
RETURNS sys.NVARCHAR
AS 'babelfishpg_common', 'varbinarynvarchar'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.nvarcharbinary(sys.NVARCHAR, integer, boolean)
RETURNS sys.BBF_BINARY
AS 'babelfishpg_common', 'nvarcharbinary'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.smalldatetime_date_cmp(sys.SMALLDATETIME, date)
RETURNS INT4
AS 'timestamp_cmp_date'
10 changes: 10 additions & 0 deletions contrib/babelfishpg_common/sql/varbinary.sql
@@ -71,6 +71,11 @@ LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
CREATE CAST (sys.BBF_VARBINARY AS pg_catalog.BYTEA)
WITH FUNCTION sys.varbinarybytea(sys.BBF_VARBINARY, integer, boolean) AS ASSIGNMENT;

CREATE OR REPLACE FUNCTION sys.nvarcharvarbinary(sys.NVARCHAR, integer, boolean)
RETURNS sys.BBF_VARBINARY
AS 'babelfishpg_common', 'nvarcharvarbinary'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.varcharvarbinary(sys.VARCHAR, integer, boolean)
RETURNS sys.BBF_VARBINARY
AS 'babelfishpg_common', 'varcharvarbinary'
@@ -111,6 +116,11 @@ LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
CREATE CAST (sys.BBF_VARBINARY AS sys.VARCHAR)
WITH FUNCTION sys.varbinarysysvarchar (sys.BBF_VARBINARY, integer, boolean) AS IMPLICIT;

CREATE OR REPLACE FUNCTION sys.varbinarysysnvarchar(sys.BBF_VARBINARY, integer, boolean)
RETURNS sys.NVARCHAR
AS 'babelfishpg_common', 'varbinarynvarchar'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION sys.varbinaryvarchar(sys.BBF_VARBINARY, integer, boolean)
RETURNS pg_catalog.VARCHAR
AS 'babelfishpg_common', 'varbinaryvarchar'
201 changes: 201 additions & 0 deletions contrib/babelfishpg_common/src/varbinary.c
@@ -38,9 +38,13 @@
#include "utils/pg_locale.h"
#include "utils/sortsupport.h"
#include "utils/varlena.h"
#include "lib/stringinfo.h"

#include "instr.h"
#include "logical.h"
#include "varchar.h"
#include "babelfishpg_common.h"
#include "typecode.h"

PG_FUNCTION_INFO_V1(varbinaryin);
PG_FUNCTION_INFO_V1(varbinaryout);
@@ -56,8 +60,11 @@ PG_FUNCTION_INFO_V1(varbinaryrowversion);
PG_FUNCTION_INFO_V1(rowversionbinary);
PG_FUNCTION_INFO_V1(rowversionvarbinary);
PG_FUNCTION_INFO_V1(varcharvarbinary);
PG_FUNCTION_INFO_V1(nvarcharvarbinary);
PG_FUNCTION_INFO_V1(bpcharvarbinary);
PG_FUNCTION_INFO_V1(nvarcharbinary);
PG_FUNCTION_INFO_V1(varbinaryvarchar);
PG_FUNCTION_INFO_V1(varbinarynvarchar);
PG_FUNCTION_INFO_V1(varcharbinary);
PG_FUNCTION_INFO_V1(bpcharbinary);
PG_FUNCTION_INFO_V1(varcharrowversion);
@@ -721,6 +728,84 @@ varcharvarbinary(PG_FUNCTION_ARGS)
PG_RETURN_BYTEA_P(result);
}

/*
* For nvarchar, the input string must be converted to UTF-16 irrespective of the input encoding.
* The source string arrives in UTF-8, so we convert it to UTF-16 here.
*/
Datum
nvarcharvarbinary(PG_FUNCTION_ARGS)
{
VarChar *source = PG_GETARG_VARCHAR_PP(0);
char *data = VARDATA_ANY(source); /* Source string is UTF-8 */
char *encoded_data;
char *rp;
size_t len = VARSIZE_ANY_EXHDR(source);
int32 typmod = PG_GETARG_INT32(1);
bool isExplicit = PG_GETARG_BOOL(2);
int32 maxlen;
bytea *result;
int encodedByteLen;
StringInfoData buf;
MemoryContext ccxt = CurrentMemoryContext;

if (!isExplicit)
ereport(ERROR,
Contributor: Do we have test cases for this scenario?

Contributor Author: Adding one for this.

Contributor: Please point to the code line.

(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("Implicit conversion from data type nvarchar to "
"varbinary is not allowed. Use the CONVERT function "
"to run this query.")));

initStringInfo(&buf);
PG_TRY();
{
/*
* For nvarchar convert the string to UTF16 from UTF8 irrespective of input encoding via TsqlUTF8toUTF16StringInfo()
* For this we need to prepare a StringInfoData() and assign the encoded_data,
* encodedByteLen from the string info data we prepared
*/
TsqlUTF8toUTF16StringInfo(&buf, data, len);
encoded_data = buf.data;
encodedByteLen= buf.len;
}
PG_CATCH();
{
MemoryContext ectx;
ErrorData *errorData;

ectx = MemoryContextSwitchTo(ccxt);
errorData = CopyErrorData();
FlushErrorState();
MemoryContextSwitchTo(ectx);
Contributor: Nit: the error message here needs to be updated.


ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("Failed to convert from data type nvarchar to varbinary, %s",
errorData->message)));
}
PG_END_TRY();

/*
* If typmod is -1 (or invalid), use the actual length
* Length should be checked after encoding into server encoding
*/
if (typmod < (int32) VARHDRSZ)
maxlen = encodedByteLen;
else
maxlen = typmod - VARHDRSZ;

if (encodedByteLen > maxlen)
encodedByteLen = maxlen;

result = (bytea *) palloc0(encodedByteLen + VARHDRSZ);
SET_VARSIZE(result, encodedByteLen + VARHDRSZ);

rp = VARDATA(result);
memcpy(rp, encoded_data, encodedByteLen);
pfree(buf.data);

PG_RETURN_BYTEA_P(result);
}
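
To make the byte layout concrete, here is a minimal standalone sketch of what the nvarcharvarbinary cast above does with its input, written without any PostgreSQL headers. It assumes the target is little-endian UTF-16 (the layout SQL Server uses for nvarchar). The names `utf8_to_utf16le`, `clamp_to_typmod`, and `SKETCH_VARHDRSZ` are hypothetical and exist only for illustration; this is not the Babelfish `TsqlUTF8toUTF16StringInfo` implementation, and it does not reject overlong UTF-8 forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PostgreSQL's VARHDRSZ (4 bytes); an assumption of this sketch. */
#define SKETCH_VARHDRSZ 4

/*
 * Hypothetical helper: encode UTF-8 input as UTF-16LE.
 * Returns the number of bytes written, or (size_t) -1 if the input is
 * malformed or the output buffer is too small.
 */
static size_t
utf8_to_utf16le(const unsigned char *in, size_t in_len,
                unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < in_len)
    {
        unsigned char b = in[i++];
        uint32_t cp;
        size_t extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t) -1;

        if (extra > in_len - i)
            return (size_t) -1;
        for (size_t k = 0; k < extra; k++)
        {
            unsigned char c = in[i++];

            if ((c & 0xC0) != 0x80)
                return (size_t) -1;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            return (size_t) -1;

        if (cp >= 0x10000)
        {
            /* Code points outside the BMP need a surrogate pair (4 bytes). */
            uint32_t v = cp - 0x10000;
            uint32_t hi = 0xD800 | (v >> 10);
            uint32_t lo = 0xDC00 | (v & 0x3FF);

            if (out_cap - o < 4)
                return (size_t) -1;
            out[o++] = (unsigned char) (hi & 0xFF);
            out[o++] = (unsigned char) (hi >> 8);
            out[o++] = (unsigned char) (lo & 0xFF);
            out[o++] = (unsigned char) (lo >> 8);
        }
        else
        {
            if (out_cap - o < 2)
                return (size_t) -1;
            out[o++] = (unsigned char) (cp & 0xFF);
            out[o++] = (unsigned char) (cp >> 8);
        }
    }
    return o;
}

/* Same clamp as in the diff: a typmod below the header size means "no limit". */
static size_t
clamp_to_typmod(size_t encoded_len, int typmod)
{
    size_t maxlen;

    if (typmod < SKETCH_VARHDRSZ)
        return encoded_len;
    maxlen = (size_t) (typmod - SKETCH_VARHDRSZ);
    return encoded_len > maxlen ? maxlen : encoded_len;
}
```

With this sketch, the UTF-8 input `ab` becomes the four bytes `61 00 62 00`, and a typmod of `SKETCH_VARHDRSZ + 2` keeps only the first two of them. Note that the clamp works on bytes, so it can cut a surrogate pair in half.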

Datum
bpcharvarbinary(PG_FUNCTION_ARGS)
{
@@ -835,6 +920,79 @@ varbinaryvarchar(PG_FUNCTION_ARGS)
PG_RETURN_VARCHAR_P(result);
}

Datum
varbinarynvarchar(PG_FUNCTION_ARGS)
{
bytea *source = PG_GETARG_BYTEA_PP(0);
char *data = VARDATA_ANY(source);
VarChar *result;
char *encoded_result;
size_t len = VARSIZE_ANY_EXHDR(source);
int32 typmod = -1;
int maxlen = -1;
int encodedByteLen;
StringInfoData buf;
char *paddedData = (char*)palloc0(len+1);
MemoryContext ccxt = CurrentMemoryContext;

typmod = PG_GETARG_INT32(1);
maxlen = typmod - VARHDRSZ;
Contributor (on lines +938 to +939): What will typmod be in the case of select cast(<> as nvarchar(max))?


/*
* Converts UTF-16 to UTF-8, handling odd-length inputs by padding.
* Respects maxlen if specified, otherwise processes full input.
* Uses TsqlUTF16toUTF8StringInfo for conversion, with error handling via PG_TRY.
*/

/* truncating NULL bytes from end */
while(len>0 && data[len-1] == '\0')
len -= 1;

/* Pad with a zero byte if the length is odd */
memcpy(paddedData, data, len);
if(len % 2 != 0)
len = len + 1;

if(!(maxlen < 0 || (len >> 1) <= maxlen))
{
len = maxlen << 1;
}
Contributor (on lines +956 to +959): Please add a comment explaining this logic.

Contributor: What are len >> 1 and maxlen << 1? Is it due to the *2 for UTF-16?

Contributor Author: Yes.


PG_TRY();
{
/* Converts UTF-16 to UTF-8 using TsqlUTF16toUTF8StringInfo */
initStringInfo(&buf);
TsqlUTF16toUTF8StringInfo(&buf, paddedData, len);
encoded_result = buf.data;
encodedByteLen= buf.len;
}


PG_CATCH();
{
MemoryContext ectx;
ErrorData *errorData;

ectx = MemoryContextSwitchTo(ccxt);
errorData = CopyErrorData();
FlushErrorState();
MemoryContextSwitchTo(ectx);

ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("Failed to convert from data type varbinary to nvarchar, %s",
errorData->message)));
Contributor (on lines +981 to +984): Please add a test case for this.

}
PG_END_TRY();

result = (VarChar *) cstring_to_text_with_len(encoded_result, encodedByteLen);
pfree(buf.data);
pfree(paddedData);

PG_RETURN_VARCHAR_P(result);
}
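
The length handling in varbinarynvarchar (trim trailing zero bytes, pad to an even byte count, then clamp with `len >> 1` and `maxlen << 1`) can be followed more easily in isolation. The sketch below is a standalone illustration under the assumption that `maxlen` counts UTF-16 code units of two bytes each, as the review thread confirms; `utf16_decode_len` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the length handling in varbinarynvarchar.
 * 'len' is the byte length of the input; 'maxlen' is the limit in UTF-16
 * code units (2 bytes each), or negative for "no limit".
 * Returns how many bytes of a zero-padded copy of 'data' would be decoded.
 */
static size_t
utf16_decode_len(const unsigned char *data, size_t len, int maxlen)
{
    assert(data != NULL || len == 0);

    /* Drop trailing zero bytes. */
    while (len > 0 && data[len - 1] == 0)
        len--;

    /* A code unit is 2 bytes, so an odd length is rounded up by one. */
    if (len % 2 != 0)
        len++;

    /* len >> 1 is the code-unit count; maxlen << 1 is the byte limit. */
    if (maxlen >= 0 && (len >> 1) > (size_t) maxlen)
        len = (size_t) maxlen << 1;

    return len;
}
```

For the input bytes `41 00` (the letter A in little-endian UTF-16), the trailing zero is trimmed and then restored by the padding step, so the result is 2. For `41 00 42 00 43 00` with a limit of 2 code units, the result is 4 bytes. The caller is expected to read from a zero-filled buffer that is at least one byte longer than the trimmed input, which is the role `paddedData` plays in the diff.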


Datum
varcharbinary(PG_FUNCTION_ARGS)
{
@@ -874,6 +1032,49 @@ varcharbinary(PG_FUNCTION_ARGS)
PG_RETURN_BYTEA_P(result);
}

Datum
nvarcharbinary(PG_FUNCTION_ARGS)
{
VarChar *source = PG_GETARG_VARCHAR_PP(0);
char *data = VARDATA_ANY(source);
char *rp;
size_t len = VARSIZE_ANY_EXHDR(source);
int32 typmod = PG_GETARG_INT32(1);
bool isExplicit = PG_GETARG_BOOL(2);
int32 maxlen;
bytea *result;
StringInfoData buf;

if (!isExplicit)
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
errmsg("Implicit conversion from data type nvarchar to "
"binary is not allowed. Use the CONVERT function "
"to run this query.")));

initStringInfo(&buf);
TsqlUTF8toUTF16StringInfo(&buf, data, len);
data = buf.data;
Contributor: Do we need to free data first?

len= buf.len;

/* If typmod is -1 (or invalid), use the actual length */
if (typmod < (int32) VARHDRSZ)
maxlen = len;
else
maxlen = typmod - VARHDRSZ;

if (len > maxlen)
len = maxlen;

result = (bytea *) palloc0(maxlen + VARHDRSZ);
SET_VARSIZE(result, maxlen + VARHDRSZ);

rp = VARDATA(result);
memcpy(rp, data, len);
pfree(buf.data);
PG_RETURN_BYTEA_P(result);
}
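
Unlike nvarcharvarbinary, nvarcharbinary allocates `maxlen` bytes rather than the encoded length, so a shorter value is right-padded with zero bytes when a length is given. The following standalone sketch shows that fill step only; `fill_fixed_binary` is a hypothetical name, and `memset` stands in for the zeroing that `palloc0` performs in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most 'maxlen' bytes of 'src' into a
 * fixed-size destination of 'maxlen' bytes, zero-filling the remainder.
 * Returns the size of the result, which is always 'maxlen'.
 */
static size_t
fill_fixed_binary(unsigned char *dst, size_t maxlen,
                  const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    if (len > maxlen)
        len = maxlen;           /* truncate input that is too long */

    memset(dst, 0, maxlen);     /* stands in for palloc0 */
    memcpy(dst, src, len);
    return maxlen;
}
```

Filling a 6-byte destination from the UTF-16 bytes `61 00 62 00` gives `61 00 62 00 00 00`, while a 2-byte destination keeps only `61 00`. In the diff, when typmod is below `VARHDRSZ`, `maxlen` is set to the encoded length, so no padding occurs in that case.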

Datum
bpcharbinary(PG_FUNCTION_ARGS)
{