
tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring: test_mdm_two_models_two_valid_configs failed #1673

Closed
flaky-bot bot opened this issue Sep 16, 2022 · 4 comments
Assignees
rosiezou
Labels
api: vertex-ai (Issues related to the googleapis/python-aiplatform API.) · flakybot: flaky (Tells the Flaky Bot not to close or comment on this issue.) · flakybot: issue (An issue filed by the Flaky Bot. Should not be added manually.) · priority: p2 (Moderately-important priority. Fix may not be included in next release.) · type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments


flaky-bot bot commented Sep 16, 2022

Note: #1583 was also for this test, but it was closed more than 10 days ago. So, I didn't mark it flaky.


commit: 95855a2
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
  display_name: "temp_e...     user_emails: ""
    }
    enable_logging: true
  }
  sample_predict_instance {
    null_value: NULL_VALUE
  }
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')], 'timeout': 3600}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
      return callable_(*args, **kwargs)

.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7fa9c5d00550>
request = parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
display_name: "temp_e2...
user_emails: ""
}
enable_logging: true
}
sample_predict_instance {
null_value: NULL_VALUE
}
}

timeout = 3600
metadata = [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
  return _end_unary_response_blocking(state, call, False, None)

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7fa9c5d00670>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7fa9c73dd300>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
      raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.RESOURCE_EXHAUSTED
E details = "received initial metadata size exceeds limit"
E debug_error_string = "{"created":"@1663315347.076605531","description":"Error received from peer ipv4:172.253.117.95:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"
E >

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

self = <tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring object at 0x7fa9cdbbe310>

def test_mdm_two_models_two_valid_configs(self):
    [deployed_model1, deployed_model2] = list(
        map(lambda x: x.id, self.endpoint.list_models())
    )
    all_configs = {
        deployed_model1: objective_config,
        deployed_model2: objective_config2,
    }
    job = None
  job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name=self._make_display_name(key=JOB_NAME),
        logging_sampling_strategy=sampling_strategy,
        schedule_config=schedule_config,
        alert_config=alert_config,
        objective_configs=all_configs,
        create_request_timeout=3600,
        project=e2e_base._PROJECT,
        location=e2e_base._LOCATION,
        endpoint=self.endpoint,
        predict_instance_schema_uri="",
        analysis_instance_schema_uri="",
    )

tests/system/aiplatform/test_model_monitoring.py:185:


google/cloud/aiplatform/jobs.py:2355: in create
self._gca_resource = self.api_client.create_model_deployment_monitoring_job(
google/cloud/aiplatform_v1/services/job_service/client.py:3111: in create_model_deployment_monitoring_job
response = rpc(
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/timeout.py:102: in func_with_timeout
return func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received initial m.../lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"

???
E google.api_core.exceptions.ResourceExhausted: 429 received initial metadata size exceeds limit

:3: ResourceExhausted
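
For context on the final exception: `google-api-core` remaps gRPC status 8 (`RESOURCE_EXHAUSTED`) to `google.api_core.exceptions.ResourceExhausted`, which it reports as HTTP 429. When the exhaustion is transient, one mitigation is to wrap the create call in a client-side retry. A minimal sketch, not the fix adopted for this issue; the backoff values are illustrative:

```python
from google.api_core import exceptions, retry
from google.cloud import aiplatform

# Retry only on the RESOURCE_EXHAUSTED error seen above, with exponential
# backoff and an overall time budget. All numbers are illustrative.
create_with_retry = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ResourceExhausted),
    initial=10.0,     # first delay, in seconds
    multiplier=2.0,   # backoff factor
    maximum=120.0,    # cap on a single delay
    deadline=600.0,   # give up after ten minutes in total
)


def create_mdm_job(**create_kwargs):
    """Create the monitoring job, retrying transient RESOURCE_EXHAUSTED errors."""
    return create_with_retry(aiplatform.ModelDeploymentMonitoringJob.create)(
        **create_kwargs
    )
```

This only helps if quota frees up on its own; it would not have helped here, where leftover test resources had to be removed (see the comment below).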

flaky-bot bot added the flakybot: issue, priority: p1, and type: bug labels on Sep 16, 2022
product-auto-label bot added the api: vertex-ai label on Sep 16, 2022

flaky-bot bot commented Sep 17, 2022

commit: 9a506ee
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
  display_name: "temp_e...     user_emails: ""
    }
    enable_logging: true
  }
  sample_predict_instance {
    null_value: NULL_VALUE
  }
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')], 'timeout': 3600}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
      return callable_(*args, **kwargs)

.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f2fbe0294c0>
request = parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
display_name: "temp_e2...
user_emails: ""
}
enable_logging: true
}
sample_predict_instance {
null_value: NULL_VALUE
}
}

timeout = 3600
metadata = [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
  return _end_unary_response_blocking(state, call, False, None)

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7f2fbe0291f0>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f2fc5b5e140>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
      raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.RESOURCE_EXHAUSTED
E details = "received initial metadata size exceeds limit"
E debug_error_string = "{"created":"@1663370492.925193145","description":"Error received from peer ipv4:74.125.20.95:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"
E >

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

self = <tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring object at 0x7f2fc5d67310>

def test_mdm_two_models_two_valid_configs(self):
    [deployed_model1, deployed_model2] = list(
        map(lambda x: x.id, self.endpoint.list_models())
    )
    all_configs = {
        deployed_model1: objective_config,
        deployed_model2: objective_config2,
    }
    job = None
  job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name=self._make_display_name(key=JOB_NAME),
        logging_sampling_strategy=sampling_strategy,
        schedule_config=schedule_config,
        alert_config=alert_config,
        objective_configs=all_configs,
        create_request_timeout=3600,
        project=e2e_base._PROJECT,
        location=e2e_base._LOCATION,
        endpoint=self.endpoint,
        predict_instance_schema_uri="",
        analysis_instance_schema_uri="",
    )

tests/system/aiplatform/test_model_monitoring.py:185:


google/cloud/aiplatform/jobs.py:2355: in create
self._gca_resource = self.api_client.create_model_deployment_monitoring_job(
google/cloud/aiplatform_v1/services/job_service/client.py:3111: in create_model_deployment_monitoring_job
response = rpc(
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/timeout.py:102: in func_with_timeout
return func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received initial m.../lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"

???
E google.api_core.exceptions.ResourceExhausted: 429 received initial metadata size exceeds limit

:3: ResourceExhausted

rosiezou (Contributor) commented Sep 20, 2022

This failed due to resource exhaustion. I manually removed some batch prediction jobs that hadn't been deleted after the sample code snippets finished testing, and that fixed the issue. I'll keep an eye out for future resource exhaustion and file a bug if necessary.
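
If this kind of cleanup is needed again, it could be scripted rather than done by hand. A rough sketch, assuming the leaked resources are `BatchPredictionJob`s in the `ucaip-sample-tests` project and that deleting jobs in a terminal state is safe; the one-day cutoff is arbitrary:

```python
import datetime

from google.cloud import aiplatform
from google.cloud.aiplatform_v1.types import JobState

aiplatform.init(project="ucaip-sample-tests", location="us-central1")

# Jobs in these states are finished and can be deleted to free quota.
TERMINAL_STATES = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
}
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)

for job in aiplatform.BatchPredictionJob.list():
    if job.state in TERMINAL_STATES and job.create_time < cutoff:
        print(f"Deleting stale batch prediction job {job.resource_name}")
        job.delete()
```

The same sweep should work for other leaked job types (for example `aiplatform.HyperparameterTuningJob`) by swapping the resource class.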

flaky-bot bot reopened this on Sep 20, 2022

flaky-bot bot commented Sep 20, 2022

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


commit: 9a506ee
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
  display_name: "temp_e...     user_emails: ""
    }
    enable_logging: true
  }
  sample_predict_instance {
    null_value: NULL_VALUE
  }
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')], 'timeout': 3600}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
      return callable_(*args, **kwargs)

.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f28488061f0>
request = parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
display_name: "temp_e2...
user_emails: ""
}
enable_logging: true
}
sample_predict_instance {
null_value: NULL_VALUE
}
}

timeout = 3600
metadata = [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
  return _end_unary_response_blocking(state, call, False, None)

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7f28489e5580>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f284a1f9100>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
      raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.RESOURCE_EXHAUSTED
E details = "received initial metadata size exceeds limit"
E debug_error_string = "{"created":"@1663658455.987226763","description":"Error received from peer ipv4:74.125.199.95:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"
E >

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

self = <tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring object at 0x7f2850846be0>

def test_mdm_two_models_two_valid_configs(self):
    [deployed_model1, deployed_model2] = list(
        map(lambda x: x.id, self.endpoint.list_models())
    )
    all_configs = {
        deployed_model1: objective_config,
        deployed_model2: objective_config2,
    }
    job = None
  job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name=self._make_display_name(key=JOB_NAME),
        logging_sampling_strategy=sampling_strategy,
        schedule_config=schedule_config,
        alert_config=alert_config,
        objective_configs=all_configs,
        create_request_timeout=3600,
        project=e2e_base._PROJECT,
        location=e2e_base._LOCATION,
        endpoint=self.endpoint,
        predict_instance_schema_uri="",
        analysis_instance_schema_uri="",
    )

tests/system/aiplatform/test_model_monitoring.py:185:


google/cloud/aiplatform/jobs.py:2355: in create
self._gca_resource = self.api_client.create_model_deployment_monitoring_job(
google/cloud/aiplatform_v1/services/job_service/client.py:3111: in create_model_deployment_monitoring_job
response = rpc(
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/timeout.py:102: in func_with_timeout
return func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received initial m.../lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"

???
E google.api_core.exceptions.ResourceExhausted: 429 received initial metadata size exceeds limit

:3: ResourceExhausted
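
One note on the underlying gRPC detail, "received initial metadata size exceeds limit": this is the client-side channel rejecting response metadata larger than its configured cap, which is controlled by the `grpc.max_metadata_size` channel argument. Purely for illustration (raising the cap is not what resolved this issue, and threading channel options into the generated Vertex AI client would mean constructing the transport by hand), the knob looks like this on a raw channel:

```python
import grpc

# Hypothetical raw channel with a larger metadata cap. Auth is omitted, so this
# is not a drop-in replacement for the SDK's own channel setup.
channel = grpc.secure_channel(
    "aiplatform.googleapis.com:443",
    grpc.ssl_channel_credentials(),
    options=[("grpc.max_metadata_size", 32 * 1024)],  # raise the cap to 32 KiB
)
```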

flaky-bot bot added the flakybot: flaky label on Sep 20, 2022
rosiezou self-assigned this on Sep 22, 2022
meredithslota added the priority: p2 label and removed the priority: p1 label on Sep 29, 2022
rosiezou (Contributor) commented

This is fixed in #1671.
