
tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring: test_mdm_two_models_two_valid_configs failed #1673

Closed
flaky-bot bot opened this issue Sep 16, 2022 · 4 comments
Assignees
rosiezou
Labels
api: vertex-ai (Issues related to the googleapis/python-aiplatform API.) · flakybot: flaky (Tells the Flaky Bot not to close or comment on this issue.) · flakybot: issue (An issue filed by the Flaky Bot. Should not be added manually.) · priority: p2 (Moderately-important priority. Fix may not be included in next release.) · type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments


flaky-bot bot commented Sep 16, 2022

Note: #1583 was also for this test, but it was closed more than 10 days ago. So, I didn't mark it flaky.


commit: 95855a2
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
  display_name: "temp_e...     user_emails: ""
    }
    enable_logging: true
  }
  sample_predict_instance {
    null_value: NULL_VALUE
  }
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')], 'timeout': 3600}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
      return callable_(*args, **kwargs)

.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7fa9c5d00550>
request = parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
display_name: "temp_e2...
user_emails: ""
}
enable_logging: true
}
sample_predict_instance {
null_value: NULL_VALUE
}
}

timeout = 3600
metadata = [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
  return _end_unary_response_blocking(state, call, False, None)

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7fa9c5d00670>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7fa9c73dd300>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
      raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.RESOURCE_EXHAUSTED
E details = "received initial metadata size exceeds limit"
E debug_error_string = "{"created":"@1663315347.076605531","description":"Error received from peer ipv4:172.253.117.95:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"
E >

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

self = <tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring object at 0x7fa9cdbbe310>

def test_mdm_two_models_two_valid_configs(self):
    [deployed_model1, deployed_model2] = list(
        map(lambda x: x.id, self.endpoint.list_models())
    )
    all_configs = {
        deployed_model1: objective_config,
        deployed_model2: objective_config2,
    }
    job = None
  job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name=self._make_display_name(key=JOB_NAME),
        logging_sampling_strategy=sampling_strategy,
        schedule_config=schedule_config,
        alert_config=alert_config,
        objective_configs=all_configs,
        create_request_timeout=3600,
        project=e2e_base._PROJECT,
        location=e2e_base._LOCATION,
        endpoint=self.endpoint,
        predict_instance_schema_uri="",
        analysis_instance_schema_uri="",
    )

tests/system/aiplatform/test_model_monitoring.py:185:


google/cloud/aiplatform/jobs.py:2355: in create
self._gca_resource = self.api_client.create_model_deployment_monitoring_job(
google/cloud/aiplatform_v1/services/job_service/client.py:3111: in create_model_deployment_monitoring_job
response = rpc(
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/timeout.py:102: in func_with_timeout
return func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received initial m.../lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"

???
E google.api_core.exceptions.ResourceExhausted: 429 received initial metadata size exceeds limit

:3: ResourceExhausted
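
For context on the final exception: `google-api-core` remaps gRPC status 8 (`RESOURCE_EXHAUSTED`) to `google.api_core.exceptions.ResourceExhausted`, which it reports as HTTP 429. When the exhaustion is transient, one mitigation is to wrap the create call in a client-side retry. A minimal sketch, not the fix adopted for this issue; the backoff values are illustrative:

```python
from google.api_core import exceptions, retry
from google.cloud import aiplatform

# Retry only on the RESOURCE_EXHAUSTED error seen above, with exponential
# backoff and an overall time budget. All numbers are illustrative.
create_with_retry = retry.Retry(
    predicate=retry.if_exception_type(exceptions.ResourceExhausted),
    initial=10.0,     # first delay, in seconds
    multiplier=2.0,   # backoff factor
    maximum=120.0,    # cap on a single delay
    deadline=600.0,   # give up after ten minutes in total
)


def create_mdm_job(**create_kwargs):
    """Create the monitoring job, retrying transient RESOURCE_EXHAUSTED errors."""
    return create_with_retry(aiplatform.ModelDeploymentMonitoringJob.create)(
        **create_kwargs
    )
```

This only helps if quota frees up on its own; it would not have helped here, where leftover test resources had to be removed (see the comment below).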

flaky-bot bot added the flakybot: issue, priority: p1, and type: bug labels on Sep 16, 2022
product-auto-label bot added the api: vertex-ai label on Sep 16, 2022

flaky-bot bot commented Sep 17, 2022

commit: 9a506ee
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
  display_name: "temp_e...     user_emails: ""
    }
    enable_logging: true
  }
  sample_predict_instance {
    null_value: NULL_VALUE
  }
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')], 'timeout': 3600}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
      return callable_(*args, **kwargs)

.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f2fbe0294c0>
request = parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
display_name: "temp_e2...
user_emails: ""
}
enable_logging: true
}
sample_predict_instance {
null_value: NULL_VALUE
}
}

timeout = 3600
metadata = [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
  return _end_unary_response_blocking(state, call, False, None)

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7f2fbe0291f0>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f2fc5b5e140>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
      raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.RESOURCE_EXHAUSTED
E details = "received initial metadata size exceeds limit"
E debug_error_string = "{"created":"@1663370492.925193145","description":"Error received from peer ipv4:74.125.20.95:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"
E >

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

self = <tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring object at 0x7f2fc5d67310>

def test_mdm_two_models_two_valid_configs(self):
    [deployed_model1, deployed_model2] = list(
        map(lambda x: x.id, self.endpoint.list_models())
    )
    all_configs = {
        deployed_model1: objective_config,
        deployed_model2: objective_config2,
    }
    job = None
  job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name=self._make_display_name(key=JOB_NAME),
        logging_sampling_strategy=sampling_strategy,
        schedule_config=schedule_config,
        alert_config=alert_config,
        objective_configs=all_configs,
        create_request_timeout=3600,
        project=e2e_base._PROJECT,
        location=e2e_base._LOCATION,
        endpoint=self.endpoint,
        predict_instance_schema_uri="",
        analysis_instance_schema_uri="",
    )

tests/system/aiplatform/test_model_monitoring.py:185:


google/cloud/aiplatform/jobs.py:2355: in create
self._gca_resource = self.api_client.create_model_deployment_monitoring_job(
google/cloud/aiplatform_v1/services/job_service/client.py:3111: in create_model_deployment_monitoring_job
response = rpc(
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/timeout.py:102: in func_with_timeout
return func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received initial m.../lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"

???
E google.api_core.exceptions.ResourceExhausted: 429 received initial metadata size exceeds limit

:3: ResourceExhausted

rosiezou (Contributor) commented Sep 20, 2022

This failed due to resource exhaustion. I manually removed some batch prediction jobs that hadn't been deleted after the sample code snippets finished testing, and that fixed the issue. I'll keep an eye out for future resource exhaustion and file a bug if necessary.
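
If this kind of cleanup is needed again, it could be scripted rather than done by hand. A rough sketch, assuming the leaked resources are `BatchPredictionJob`s in the `ucaip-sample-tests` project and that deleting jobs in a terminal state is safe; the one-day cutoff is arbitrary:

```python
import datetime

from google.cloud import aiplatform
from google.cloud.aiplatform_v1.types import JobState

aiplatform.init(project="ucaip-sample-tests", location="us-central1")

# Jobs in these states are finished and can be deleted to free quota.
TERMINAL_STATES = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
}
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)

for job in aiplatform.BatchPredictionJob.list():
    if job.state in TERMINAL_STATES and job.create_time < cutoff:
        print(f"Deleting stale batch prediction job {job.resource_name}")
        job.delete()
```

The same sweep should work for other leaked job types (for example `aiplatform.HyperparameterTuningJob`) by swapping the resource class.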

flaky-bot bot reopened this on Sep 20, 2022

flaky-bot bot commented Sep 20, 2022

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


commit: 9a506ee
buildURL: Build Status, Sponge
status: failed

Test output
args = (parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
  display_name: "temp_e...     user_emails: ""
    }
    enable_logging: true
  }
  sample_predict_instance {
    null_value: NULL_VALUE
  }
}
,)
kwargs = {'metadata': [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')], 'timeout': 3600}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
      return callable_(*args, **kwargs)

.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f28488061f0>
request = parent: "projects/ucaip-sample-tests/locations/us-central1"
model_deployment_monitoring_job {
display_name: "temp_e2...
user_emails: ""
}
enable_logging: true
}
sample_predict_instance {
null_value: NULL_VALUE
}
}

timeout = 3600
metadata = [('x-goog-request-params', 'parent=projects/ucaip-sample-tests/locations/us-central1'), ('x-goog-api-client', 'model-builder/1.17.1 gl-python/3.8.13 grpc/1.47.0 gax/1.32.0 gapic/1.17.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
  return _end_unary_response_blocking(state, call, False, None)

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7f28489e5580>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f284a1f9100>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
      raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.RESOURCE_EXHAUSTED
E details = "received initial metadata size exceeds limit"
E debug_error_string = "{"created":"@1663658455.987226763","description":"Error received from peer ipv4:74.125.199.95:443","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"
E >

.nox/system-3-8/lib/python3.8/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

self = <tests.system.aiplatform.test_model_monitoring.TestModelDeploymentMonitoring object at 0x7f2850846be0>

def test_mdm_two_models_two_valid_configs(self):
    [deployed_model1, deployed_model2] = list(
        map(lambda x: x.id, self.endpoint.list_models())
    )
    all_configs = {
        deployed_model1: objective_config,
        deployed_model2: objective_config2,
    }
    job = None
  job = aiplatform.ModelDeploymentMonitoringJob.create(
        display_name=self._make_display_name(key=JOB_NAME),
        logging_sampling_strategy=sampling_strategy,
        schedule_config=schedule_config,
        alert_config=alert_config,
        objective_configs=all_configs,
        create_request_timeout=3600,
        project=e2e_base._PROJECT,
        location=e2e_base._LOCATION,
        endpoint=self.endpoint,
        predict_instance_schema_uri="",
        analysis_instance_schema_uri="",
    )

tests/system/aiplatform/test_model_monitoring.py:185:


google/cloud/aiplatform/jobs.py:2355: in create
self._gca_resource = self.api_client.create_model_deployment_monitoring_job(
google/cloud/aiplatform_v1/services/job_service/client.py:3111: in create_model_deployment_monitoring_job
response = rpc(
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/timeout.py:102: in func_with_timeout
return func(*args, **kwargs)
.nox/system-3-8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py:69: in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)


value = None
from_value = <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "received initial m.../lib/surface/call.cc","file_line":966,"grpc_message":"received initial metadata size exceeds limit","grpc_status":8}"

???
E google.api_core.exceptions.ResourceExhausted: 429 received initial metadata size exceeds limit

:3: ResourceExhausted
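
One note on the underlying gRPC detail, "received initial metadata size exceeds limit": this is the client-side channel rejecting response metadata larger than its configured cap, which is controlled by the `grpc.max_metadata_size` channel argument. Purely for illustration (raising the cap is not what resolved this issue, and threading channel options into the generated Vertex AI client would mean constructing the transport by hand), the knob looks like this on a raw channel:

```python
import grpc

# Hypothetical raw channel with a larger metadata cap. Auth is omitted, so this
# is not a drop-in replacement for the SDK's own channel setup.
channel = grpc.secure_channel(
    "aiplatform.googleapis.com:443",
    grpc.ssl_channel_credentials(),
    options=[("grpc.max_metadata_size", 32 * 1024)],  # raise the cap to 32 KiB
)
```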

flaky-bot bot added the flakybot: flaky label on Sep 20, 2022
rosiezou self-assigned this on Sep 22, 2022
meredithslota added the priority: p2 label and removed the priority: p1 label on Sep 29, 2022
rosiezou (Contributor) commented

This is fixed in #1671.
