Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating clustering fields is not possible #3655

Closed
paslandau opened this issue Jan 8, 2021 · 6 comments
Closed

Updating clustering fields is not possible #3655

paslandau opened this issue Jan 8, 2021 · 6 comments
Labels
api: bigquery Issues related to the BigQuery API. status: investigating The issue is under investigation, which is determined to be non-trivial.

Comments

@paslandau
Copy link

  • OS: alpine
  • PHP version: 7.3
  • Package name and version: google/cloud-bigquery v1.20.1

Hey,

According to the docs, updating the clustering definition should be possible:

Modifying clustering specification
By calling the tables.update or tables.patch methods, table clustering specifications can be changed or removed. The set of clustered columns in a clustered table can also be changed to a different set of columns. This method of updating the clustering column set is most useful for tables with continuous streaming insertions because those tables cannot easily be swapped otherwise.

[...]

Yet when I try to update a clustered table with new clustering fields, the operations keeps failing with the seemingly incorrect error message:

"Cannot add, update, or remove clustering without partitioning field."

Afaik it doesn't state anywhere that partitioning is required to use clustering (the docs even mention arguments when one should be used instead of the other).

Also, the table can be created with clustering but without partitioning.

Before I spent more time trying possible combinations of what might work, I'd like to verify that the feature is actually already supported and working as expected.

Code to reproduce

$bigQueryClient = new \Google\Cloud\BigQuery\BigQueryClient();

$dataset = $bigQueryClient->dataset("example_dataset");

if(!$dataset->exists()){
    $dataset = $bigQueryClient->createDataset("example_dataset");
}

$tableName = "example_table";

try {
    $table = $dataset->table($tableName);
    if($table->exists()){
        $table->delete();
    }
    // create without clustering
    $options["schema"] = [
        "fields" => [
            ["name" => 'foo', "type" => "STRING"],
        ],
    ];

    $table = $dataset->createTable($tableName, $options);

    // add clustering
    $options["clustering"] = [
        "fields" => [
            "foo",
        ],
    ];
    $table->update($options);
}catch(Throwable $t){
    echo $t->getMessage()."\n";
}

$options = [];

try {
    $table = $dataset->table($tableName);
    if($table->exists()){
        $table->delete();
    }
    // create WITH clustering
    $options["schema"] = [
        "fields" => [
            ["name" => 'foo', "type" => "STRING"],
            ["name" => 'bar', "type" => "STRING"],
        ],
    ];
    $options["clustering"] = [
        "fields" => [
            "foo",
        ],
    ];

    $table = $dataset->createTable($tableName, $options);

    // change clustering
    $options["clustering"] = [
        "fields" => [
            "bar",
        ],
    ];
    $table->update($options);
}catch(Throwable $t){
    echo $t->getMessage()."\n";
}

Output:

{
  "error": {
    "code": 400,
    "message": "Cannot add, update, or remove clustering without partitioning field.",
    "errors": [
      {
        "message": "Cannot add, update, or remove clustering without partitioning field.",
        "domain": "global",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
@paslandau
Copy link
Author

FYI:
When using a field partitioned table, updating the clustering definition seems to be possible:

$bigQueryClient = $client->getBigQueryClient();

$dataset = $bigQueryClient->dataset("example_dataset");

if (!$dataset->exists()) {
    $dataset = $bigQueryClient->createDataset("example_dataset");
}

$tableName = "example_table_with_partitioning";
$options   = [];

try {
    $table = $dataset->table($tableName);
    if ($table->exists()) {
        $table->delete();
    }
    $options["schema"] = [
        "fields" => [
            ["name" => 'foo', "type" => "STRING"],
            ["name" => 'date', "type" => "DATE"],
        ],
    ];

    $options["timePartitioning"] = [
        "type"  => "DAY",
        "field" => "date",
    ];

    $table = $dataset->createTable($tableName, $options);
    echo "Created table WITH partitioning but WITHOUT clustering\n";
    var_dump($table->info()["clustering"] ?? []);

    $options["clustering"] = [
        "fields" => [
            "foo",
        ],
    ];
    echo "Updating the clustering fields\n";
    $table->update($options);

    var_dump($table->info()["clustering"] ?? []);
} catch (Throwable $t) {
    echo $t->getMessage()."\n";
}

Output

Created table WITH partitioning but WITHOUT clustering
array(0) {
}
Updating the clustering fields
array(1) {
  ["fields"]=>
  array(1) {
    [0]=>
    string(3) "foo"
  }
}

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jan 9, 2021
@dwsupplee dwsupplee added api: bigquery Issues related to the BigQuery API. status: investigating The issue is under investigation, which is determined to be non-trivial. and removed triage me I really want to be triaged. labels Jan 12, 2021
@alex-roboto
Copy link

alex-roboto commented Jan 30, 2021

There are two cases:

  1. You create a table with clustering, without partitions
  2. You create a table without partitions. You then attempt to add clustering to the table.

#1 works, because you are creating a table without partitions, but with clustering.

#2 does not work because as the error says, "Cannot add, update, or remove clustering without partitioning field." Note the error does NOT say "Cannot create, add, update, or remove clustering without a partitioning field."

You can create a table with clustering, no partitioning. But you can't add clustering to an unpartitioned table. (You can add clustering to a partitioned table.)

@paslandau
Copy link
Author

I undestand what the error says - I just want to make sure that it is the intended behavior:

Before I spent more time trying possible combinations of what might work, I'd like to verify that the feature is actually already supported and working as expected.

As I mentioned before, the docs don't state anywhere that "partitioning is required in order to update clustering".

If that is required for some reason- okay. But then it would be helpful if it says so in the documentation as the reason is (at least to me) not obvious.

@jdpedrie
Copy link
Contributor

jdpedrie commented Feb 3, 2021

Apologies for the delay. As you've encountered, you cannot modify clustering on tables without partitioning enabled. I've mentioned the documentation gap to the BigQuery team, so they might update the docs to reflect this.

@jdpedrie jdpedrie closed this as completed Feb 3, 2021
@paslandau
Copy link
Author

Short update: As of tonight the statement

"partitioning is required in order to update clustering"

does not seem to be true any longer. Our corresponding tests are not failing any longer.

@jdpedrie It would be great to get an official confirmation from your side regarding this.


The example code now yields

$bigQueryClient = $client->getBigQueryClient();

$dataset = $bigQueryClient->dataset("example_dataset");

if(!$dataset->exists()){
    $dataset = $bigQueryClient->createDataset("example_dataset");
}

$tableName = "example_table_new";

try {
    $table = $dataset->table($tableName);
    if($table->exists()){
        $table->delete();
    }
    // create without clustering
    $options["schema"] = [
        "fields" => [
            ["name" => 'foo', "type" => "STRING"],
        ],
    ];

    $table = $dataset->createTable($tableName, $options);
    echo "Created table WITHOUT partitioning AND  WITHOUT clustering\n";
    var_dump($table->info()["clustering"] ?? []);
    
    // add clustering
    $options["clustering"] = [
        "fields" => [
            "foo",
        ],
    ];
    $table->update($options);

    var_dump($table->info()["clustering"] ?? []);
}catch(Throwable $t){
    echo $t->getMessage()."\n";
}

$options = [];

try {
    $table = $dataset->table($tableName);
    if($table->exists()){
        $table->delete();
    }
    // create WITH clustering
    $options["schema"] = [
        "fields" => [
            ["name" => 'foo', "type" => "STRING"],
            ["name" => 'bar', "type" => "STRING"],
        ],
    ];
    $options["clustering"] = [
        "fields" => [
            "foo",
        ],
    ];

    $table = $dataset->createTable($tableName, $options);
    echo "Created table WITHOUT partitioning but WITH clustering\n";
    var_dump($table->info()["clustering"] ?? []);

    // change clustering
    $options["clustering"] = [
        "fields" => [
            "bar",
        ],
    ];
    $table->update($options);
    
    var_dump($table->info()["clustering"] ?? []);
}catch(Throwable $t){
    echo $t->getMessage()."\n";
}
Created table WITHOUT partitioning AND  WITHOUT clustering
array(0) {
}
array(1) {
  ["fields"]=>
  array(1) {
    [0]=>
    string(3) "foo"
  }
}
Created table WITHOUT partitioning but WITH clustering
array(1) {
  ["fields"]=>
  array(1) {
    [0]=>
    string(3) "foo"
  }
}
array(1) {
  ["fields"]=>
  array(1) {
    [0]=>
    string(3) "bar"
  }
}

@mattwelke
Copy link

Short update: As of tonight the statement

"partitioning is required in order to update clustering"

Same for me. I was able to add clustering to a table today that was created without clustering, and the table also did not have field partitioning enabled when it was created. It was created years ago before clustering or even field partitioning were supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. status: investigating The issue is under investigation, which is determined to be non-trivial.
Projects
None yet
Development

No branches or pull requests

6 participants