400 Error: "bigquery" output format does not support key_field in aiplatform_v1.BatchPredictionJob.InstanceConfig #4514

Open
tetsu-i opened this issue Oct 7, 2024 · 1 comment
tetsu-i commented Oct 7, 2024

Summary

I encountered the following error when setting key_field in aiplatform_v1.BatchPredictionJob.InstanceConfig for a batch prediction job that uses BigQuery for both input and output:

google.api_core.exceptions.InvalidArgument: 400 "bigquery" output format does not support key_field.

Environment details

  • OS type and version: macOS Sonoma 14.5
  • Python version: 3.11.10
  • google-cloud-aiplatform version: 1.69.0

Code example

from google.cloud import aiplatform, aiplatform_v1

LOCATION = "asia-northeast1"
MY_PROJECT = "my-project"

def batch_predict_with_bq(
    model: aiplatform.Model,
    job_display_name: str,
    bq_source_uri: str,
    bq_output_uri: str,
    machine_type: str,
) -> aiplatform_v1.BatchPredictionJob:
    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.InputConfig
    input_config = aiplatform_v1.BatchPredictionJob.InputConfig(
        instances_format="bigquery",
        bigquery_source=aiplatform_v1.BigQuerySource(
            input_uri=bq_source_uri,
        ),
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.InstanceConfig
    instance_config = aiplatform_v1.BatchPredictionJob.InstanceConfig(
        excluded_fields=["user_id"],
        key_field="key",
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.OutputConfig
    output_config = aiplatform_v1.BatchPredictionJob.OutputConfig(
        predictions_format="bigquery",
        bigquery_destination=aiplatform_v1.BigQueryDestination(
            output_uri=bq_output_uri,
        ),
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchDedicatedResources
    batch_dedicated_resources = aiplatform_v1.BatchDedicatedResources(
        machine_spec=aiplatform_v1.MachineSpec(machine_type=machine_type),
        starting_replica_count=1,
        max_replica_count=1,
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob

    job = aiplatform_v1.BatchPredictionJob(
        name="test",
        display_name=job_display_name,
        model=model.resource_name,
        input_config=input_config,
        output_config=output_config,
        instance_config=instance_config,
        dedicated_resources=batch_dedicated_resources,
    )

    # https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.job_service.JobServiceClient#google_cloud_aiplatform_v1_services_job_service_JobServiceClient_create_batch_prediction_job
    client = aiplatform_v1.JobServiceClient(
        client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
    )

    request = aiplatform_v1.CreateBatchPredictionJobRequest(
        parent=f"projects/{MY_PROJECT}/locations/{LOCATION}",
        batch_prediction_job=job,
    )

    response = client.create_batch_prediction_job(request=request)

    return response

Stack trace

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

...
    _ = batch_predict_with_bq(
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/main.py", line 305, in batch_predict_with_bq
    response = client.create_batch_prediction_job(request=request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/.venv/lib/python3.11/site-packages/google/cloud/aiplatform_v1/services/job_service/client.py", line 3739, in create_batch_prediction_job
    response = rpc(
               ^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/.venv/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/workspace/prog/python/vertexai/batch/.venv/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 "bigquery" output format does not support key_field.

Expected Behavior

According to the documentation, it seems that specifying key_field with a bigquery input should be allowed, but the error indicates otherwise.

Actual Behavior

The job fails with a 400 error stating that the BigQuery output format does not support key_field, which contradicts the information in the documentation.
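Until the behavior is clarified, a client-side pre-flight check can surface the problem before the request is sent. The following is only a minimal sketch based on the single combination reported here; `check_key_field` and `REJECTED_OUTPUT_FORMATS` are hypothetical names and are not part of google-cloud-aiplatform. It says nothing about whether other output formats accept key_field.

```python
from typing import Optional

# Assumption: only the combination observed in this report is flagged.
# Other output formats are not checked and may or may not accept key_field.
REJECTED_OUTPUT_FORMATS = frozenset({"bigquery"})


def check_key_field(predictions_format: str, key_field: Optional[str]) -> None:
    """Raise ValueError for the key_field/output-format pair rejected in this report."""
    if key_field and predictions_format in REJECTED_OUTPUT_FORMATS:
        raise ValueError(
            f"key_field={key_field!r} was rejected by the service for "
            f"predictions_format={predictions_format!r}"
        )
```

Calling `check_key_field("bigquery", "key")` before building the job would raise locally instead of waiting for the 400 response.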

Additional Information

If key_field is not supported for the bigquery format, it would be helpful to update the documentation to reflect this limitation. Otherwise, any guidance on resolving this issue would be greatly appreciated.
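One possible stopgap, sketched below, is to drop key_field when the output format is BigQuery and add the same column to excluded_fields so it is still not sent to the model. This rests on my reading of the reference documentation that key_field behaves like an excluded field on the input side; it is not a confirmed workaround, and the output will not contain a key column. `build_instance_config` is a hypothetical helper that returns plain keyword arguments.

```python
from typing import Any, Dict, List, Optional


def build_instance_config(
    predictions_format: str,
    excluded_fields: List[str],
    key_field: Optional[str],
) -> Dict[str, Any]:
    """Return InstanceConfig keyword arguments, avoiding key_field for BigQuery output."""
    excluded = list(excluded_fields)  # copy so the caller's list is not mutated
    config: Dict[str, Any] = {"excluded_fields": excluded}
    if key_field is None:
        return config
    if predictions_format == "bigquery":
        # Assumption: excluding the key column approximates key_field on the
        # input side only; the output will have no `key` column.
        if key_field not in excluded:
            excluded.append(key_field)
        return config
    config["key_field"] = key_field
    return config
```

The result could then be passed as `aiplatform_v1.BatchPredictionJob.InstanceConfig(**build_instance_config("bigquery", ["user_id"], "key"))`. Whether the excluded column is still available in the output table for joining was not verified.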

Thanks!

@product-auto-label product-auto-label bot added the api: vertex-ai Issues related to the googleapis/python-aiplatform API. label Oct 7, 2024
jaycee-li (Contributor) commented

Hi @weichungw, could you please take a look at this or assign it to the right person on your team?
