Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can I specify CURRENT_TIMESTAMP for rows by Storage Write API? #531

Closed
6 tasks done
mikan3rd opened this issue Dec 12, 2024 · 7 comments · Fixed by #532
Closed
6 tasks done

Can I specify CURRENT_TIMESTAMP for rows by Storage Write API? #531

mikan3rd opened this issue Dec 12, 2024 · 7 comments · Fixed by #532
Assignees
Labels
api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@mikan3rd
Copy link

Please make sure you have searched for information in the following guides.

Documentation Request

For example, for a query, we can insert CURRENT_TIMESTAMP as follows

INSERT INTO your_table_name (column1, column2, timestamp_column)
VALUES ('value1', 'value2', CURRENT_TIMESTAMP);

However, in the case of Storage Write API, passing the string “CURRENT_TIMESTAMP” to appendRows did not result in correct data.

Can I specify CURRENT_TIMESTAMP for rows by Storage Write API?

@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. label Dec 12, 2024
@mikan3rd
Copy link
Author

mikan3rd commented Dec 12, 2024

I also noticed a new problem.

If I set the default value of a column of type timestamp to CURRENT_TIMESTAMP, and then add data with that column unspecified using the Storage Write API, the value is always 1970-01-01 00:00:00 UTC.

If I do the same with an INSERT query, it works fine, and if I specify new Date() for the value, it works correctly.

Is this a bug?

@alvarowolfx alvarowolfx self-assigned this Dec 12, 2024
@alvarowolfx
Copy link
Contributor

@mikan3rd thanks for the report. The BigQuery Storage Write API doesn't support using SQL expressions for row values. An alternative for that is to rely on the column default value, which can be set to CURRENT_TIMESTAMP . Then you can configure defaultMissingValueInterpretation to DEFAULT_VALUE on the JSONWriter, so missing values will use the default value expression from the column. Based on your last comment, that might solve the issue since you already configured a default value for the column.

There is an integration test here with examples on how to use that:

it('should fill default values when MissingValuesInterpretation is set', async () => {

References

@alvarowolfx alvarowolfx added type: question Request for information or clarification. Not an issue. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Dec 12, 2024
@mikan3rd
Copy link
Author

Thanks for the reply.

I tried specifying defaultMissingValueInterpretation: 'DEFAULT_VALUE' to JSONWriter, but it does not work with the default value of CURRENT_TIMESTAMP(), which is specified as 1970-01-01 00:00:00 UTC, instead of the default value of CURRENT_TIMESTAMP().

Can you please check if there is a bug in the operation?

For reference, here is the code I am using.

[
  {
    "name": "memo",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "created_at",
    "type": "TIMESTAMP",
    "mode": "REQUIRED",
    "defaultValueExpression": "CURRENT_TIMESTAMP()"
  }
]
import { adapt, managedwriter, protos } from '@google-cloud/bigquery-storage';

const rowsList = [[
  {
    memo: "test",
  }
]]

const destinationTable = `projects/${projectId}/datasets/${datasetId}/tables/${tableId}`;
const writeClient = new WriterClient({ projectId });

try {
  const writeStream = await writeClient.getWriteStream({
    streamId: `${destinationTable}/streams/_default`,
    view: protos.google.cloud.bigquery.storage.v1.WriteStreamView.FULL,
  });

  const protoDescriptor = adapt.convertStorageSchemaToProto2Descriptor(writeStream.tableSchema!, 'root');

  const connection = await writeClient.createStreamConnection({
    streamId: managedwriter.DefaultStream,
    destinationTable,
  });

  const writer = new JSONWriter({
    connection,
    protoDescriptor,
    defaultMissingValueInterpretation: 'DEFAULT_VALUE',
  });

  const pendingWrites = rowsList.map((rows) => writer.appendRows(rows));

  await Promise.all(
    pendingWrites.map(async (pw) => {
      const result = await pw.getResult();
      if (result.error || result.rowErrors?.length) {
        throw new AppendRowsError(result);
      }
    }),
  );
} finally {
  if (writeClient.isOpen()) {
    writeClient.close();
  }
}

@alvarowolfx
Copy link
Contributor

The integration test that I pointed out tests a similar scenario. I ran it here and it fills the missing values with the CURRENT_TIMESTAMP and GENERATE_UUID values. Example output when running that integration test:

[{"customer_name":"Ada Lovelace","row_num":1,"id":"ae40d6f3-da43-461c-95b9-311b7834bcc3","created_at":{"value":"2024-12-13T15:53:32.836372000Z"},"updated_at":null},{"customer_name":"Alan Turing","row_num":2,"id":"68029923-f566-4955-a36c-7d569b9d036c","created_at":{"value":"2024-12-13T15:53:32.836416000Z"},"updated_at":null},{"customer_name":"Charles Babbage","row_num":3,"id":null,"created_at":null,"updated_at":{"value":"2024-12-13T15:53:33.042497000Z"}},{"customer_name":"Lord Byron","row_num":4,"id":null,"created_at":null,"updated_at":{"value":"2024-12-13T15:53:33.042535000Z"}}]

Are you using the latest version of the library @mikan3rd ? Can you double check if the table indeed has the defaultValueExpression set on the given column ? In the meantime I'm trying to find other things might cause that issue.

@mikan3rd
Copy link
Author

Thanks for the reply.

I checked the schema again and confirmed that the defaultValueExpression is set correctly and that the default value is set correctly when an INSERT query is executed.

However, the problem still persists when inserting data via the Storage Write API as in the code above, where the default value is not set correctly.

We also tried GENERATE_UUID() as well as CURRENT_TIMESTAMP(), but the correct uuid was not set, resulting in empty characters.

I would appreciate it if you could check the actual behavior as well as the test.

@alvarowolfx
Copy link
Contributor

@mikan3rd sorry the the late reply, I was out for the holidays. After some further investigation, I found that the issue was related to protobufJS setting undefined or null values to an empty string when the field has the REQUIRED label. Since the fields for your use case has the mode REQUIRED it gets converted to a proto descriptor with REQUIRED label, thus setting the value to empty string and the backend thinks there is a value set.

So I've opened PR #532 that sets the proto descriptor field to OPTIONAL when the column is REQUIRED and has a defaultValueExpression

@mikan3rd
Copy link
Author

mikan3rd commented Jan 9, 2025

@alvarowolfx
Thanks for the fix.
When and to which version will the fix be applied?
#534

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.
Projects
None yet
2 participants