
Sorted metric tags to avoid duplicate prom data with gRPC requests #4006

Merged
merged 14 commits into SeldonIO:master on May 14, 2022

Conversation

@harshita-meena
Contributor

@harshita-meena commented Mar 24, 2022

What this PR does / why we need it:
To ensure that no duplication of metrics occurs in Prometheus due to non-unique temporary keys that depend on the "tags" of custom metrics.
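The idea behind the fix can be sketched in a few lines (a minimal illustration, not the actual seldon-core code; the helper name `stable_tag_key` is hypothetical): sorting the tag items before building the cache key makes the key independent of the tag dict's insertion order.

```python
def stable_tag_key(metric_name, tags):
    """Build a deterministic cache key for a (metric, tags) pair.

    Sorting the tag items ensures that {"a": "1", "b": "2"} and
    {"b": "2", "a": "1"} map to the same key, so Prometheus sees a
    single time series instead of duplicates.
    """
    return (metric_name, tuple(sorted(tags.items())))

# Two logically identical tag dicts with different insertion order:
k1 = stable_tag_key("mycounter", {"a": "1", "b": "2"})
k2 = stable_tag_key("mycounter", {"b": "2", "a": "1"})
assert k1 == k2  # one key, hence one Prometheus series
```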

Which issue(s) this PR fixes:

Fixes #3968

Special notes for your reviewer:

@seldondev
Collaborator

Hi @harshita-meena. Thanks for your PR.

I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@harshita-meena changed the title from "Sorted dictionary tags" to "Sorted dictionary tags to handle raw proto requests" Mar 24, 2022
@ukclivecox
Contributor

@harshita-meena Can you look at the linting and test errors?

@harshita-meena
Contributor Author

@harshita-meena Can you look at the linting and test errors?

Yes, sure. Taking a look now.

@axsaucedo
Contributor

/test integration

@axsaucedo
Contributor

/ok-to-test

@harshita-meena
Contributor Author

/test this

@seldondev
Collaborator

@harshita-meena: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test this


@axsaucedo
Contributor

By the way, just as a heads up @harshita-meena: I had to add a dummy commit, as a GitHub issue seemed to be preventing the GitHub Actions pipelines from running, but that seems to have done the job.

@axsaucedo
Contributor

@harshita-meena it seems there are some failures in the tests; are you able to run these locally? You can do so with the act command.

@harshita-meena
Contributor Author

@harshita-meena it seems there are some failures in the tests; are you able to run these locally? You can do so with the act command.

I was trying to follow the README here but didn't get the tests to complete; I'll try your suggestion. Thanks!

@harshita-meena
Contributor Author

harshita-meena commented Mar 30, 2022

I ran the act command locally, but the failure seems unrelated to my change; I ran it again from the main branch, which fails as well. Do you think it is related to how my environment is set up? Is there a particular Python version I should use?

| ________________ ERROR collecting adserver/tests/test_server.py ________________
| ImportError while importing test module '/home/hmeena/development/test/seldon-core/components/alibi-detect-server/adserver/tests/test_server.py'.
| Hint: make sure your test modules/packages have valid Python names.
| Traceback:
| /opt/conda/lib/python3.7/importlib/__init__.py:127: in import_module
|     return _bootstrap._gcd_import(name[level:], package, level)
| adserver/tests/test_server.py:1: in <module>
|     from adserver.server import Protocol, CEServer, CEModel
| adserver/server.py:11: in <module>
|     from adserver.base import CEModel, ModelResponse
| adserver/base/__init__.py:2: in <module>
|     from .alibi_model import AlibiDetectModel
| adserver/base/alibi_model.py:4: in <module>
|     from alibi_detect.utils.saving import load_detector, Data
| /opt/conda/lib/python3.7/site-packages/alibi_detect/utils/saving.py:24: in <module>
|     from alibi_detect.od import (IForest, LLR, Mahalanobis, OutlierAE, OutlierAEGMM, OutlierProphet,
| /opt/conda/lib/python3.7/site-packages/alibi_detect/__init__.py:1: in <module>
|     from . import ad, cd, models, od, utils
| /opt/conda/lib/python3.7/site-packages/alibi_detect/od/__init__.py:10: in <module>
|     from .llr import LLR
| /opt/conda/lib/python3.7/site-packages/alibi_detect/od/llr.py:13: in <module>
|     from alibi_detect.utils.perturbation import mutate_categorical
| /opt/conda/lib/python3.7/site-packages/alibi_detect/utils/perturbation.py:5: in <module>
|     import cv2
| /opt/conda/lib/python3.7/site-packages/cv2/__init__.py:8: in <module>
|     from .cv2 import *
| E   ImportError: libGL.so.1: cannot open shared object file: No such file or directory
| =========================== short test summary info ============================
| ERROR adserver/tests/test_ad_model.py
| ERROR adserver/tests/test_cd_model.py
| ERROR adserver/tests/test_cm_model.py
| ERROR adserver/tests/test_od_model.py
| ERROR adserver/tests/test_server.py
| !!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!

@axsaucedo
Contributor

@harshita-meena it seems the Docker image that you are using to run these tests doesn't have the system dependencies required by the Python cv2 library. You may need to specify the base image explicitly; alternatively, you will have to extend the image you are using to include the cv2 dependencies (i.e. RUN apt-get install ffmpeg libsm6 libxext6 -y).

What is the command that you are using to run these tests? Are you using the -P parameter to override the default image?
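For example, something along these lines might work (a command sketch; `catthehacker/ubuntu:act-latest` is a commonly used act runner image with more system packages than the default, named here as an assumption about your setup):

```shell
# Override the default runner image that act uses for ubuntu-latest jobs:
act -P ubuntu-latest=catthehacker/ubuntu:act-latest

# Alternatively, extend your current image with the cv2 system deps
# (Dockerfile fragment):
#   RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6
```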

@harshita-meena
Contributor Author

harshita-meena commented Mar 31, 2022

@harshita-meena it seems the Docker image that you are using to run these tests doesn't have the system dependencies required by the Python cv2 library. You may need to specify the base image explicitly; alternatively, you will have to extend the image you are using to include the cv2 dependencies (i.e. RUN apt-get install ffmpeg libsm6 libxext6 -y).

What is the command that you are using to run these tests? Are you using the -P parameter to override the default image?

I am just running plain act from the head of the repo; what you said makes sense. I'll give it a shot.

@harshita-meena
Contributor Author

I am guessing this is the image used by default.

@harshita-meena
Contributor Author

@RafalSkolasinski Added the changes. Is there a way I can trigger the PR tests? I did not quite figure out how to run them locally.

@RafalSkolasinski
Contributor

/test integration

@RafalSkolasinski
Contributor

RafalSkolasinski commented May 12, 2022

Just unit tests should be sufficient here; however, would you be able to add a unit test for it?

A model whose predict returns metrics with tags {"a": "a", "b": "b"} for the first request and {"b": "b", "a": "a"} for the second should be fine. It could be a modification of the UserObject test case here: https://github.com/SeldonIO/seldon-core/blob/master/python/tests/test_runtime_metrics_tags.py
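A sketch of such a test (the class and assertion below are illustrative, loosely modeled on the UserObject cases in test_runtime_metrics_tags.py; they are not the actual test code):

```python
class UserObjectWithSwappedTags:
    """Toy model returning the same tags in a different key order per call."""

    def __init__(self):
        self.calls = 0

    def predict(self, X, features_names=None):
        self.calls += 1
        if self.calls == 1:
            self._tags = {"a": "a", "b": "b"}
        else:
            self._tags = {"b": "b", "a": "a"}
        return X

    def metrics(self):
        return [{"type": "COUNTER", "key": "mycounter", "value": 1,
                 "tags": self._tags}]


def test_swapped_tags_produce_one_series():
    model = UserObjectWithSwappedTags()
    keys = set()
    for _ in range(2):
        model.predict([1])
        for m in model.metrics():
            # Sorting the tag items is the behavior under test: both calls
            # must collapse into a single series key.
            keys.add((m["key"], tuple(sorted(m["tags"].items()))))
    assert len(keys) == 1
```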

@harshita-meena
Contributor Author

Yep, definitely, will add it. Thanks for the example.

@seldondev
Collaborator

@harshita-meena: The following test failed, say /retest to rerun them all:

Test name: integration
Commit: b5b22c5
Rerun command: /test integration


@seldondev added the size/S label and removed the size/XS label May 13, 2022
@harshita-meena
Contributor Author

/test integration

@seldondev
Collaborator

@harshita-meena: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test integration


@RafalSkolasinski
Contributor

/ok-to-test

@RafalSkolasinski
Contributor

The Python lint job seems to have failed; could you run the fmt target from python/Makefile?

@harshita-meena
Contributor Author

/ok-to-test

@harshita-meena changed the title from "Sorted dictionary tags to handle raw proto requests" to "Sorted dictionary tags to avoid duplicate prom data with raw proto requests" May 13, 2022
@harshita-meena changed the title from "Sorted dictionary tags to avoid duplicate prom data with raw proto requests" to "Sorted dictionary tags to avoid duplicate prom data with gRPC requests" May 13, 2022
@harshita-meena changed the title from "Sorted dictionary tags to avoid duplicate prom data with gRPC requests" to "Sorted metric tags to avoid duplicate prom data with gRPC requests" May 13, 2022
@harshita-meena
Contributor Author

harshita-meena commented May 13, 2022

Great! I am pretty sure that, now knowing the root of the issue, one could craft an example that also misbehaves for the predict / REST cases

I have tried these cases before as well but could not recreate it; maybe it is still possible.
For REST, the JSON data always ensured a sorted order, and for predict with gRPC there was never a scenario where metrics were ingested from proto messages (a JSON "meta" object was converted to proto after metrics ingestion, before sending the gRPC response).

Either way, this small change should cover all such cases :)
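The root cause can be illustrated in a few standalone lines (an illustration, not seldon-core code): when tags arrive from a proto message rather than sorted JSON, two logically identical tag sets can yield different unsorted keys.

```python
# Same logical tags, different insertion order, as can happen when tags
# come in via a gRPC/proto message instead of sorted JSON:
t1 = {"a": "a", "b": "b"}
t2 = {"b": "b", "a": "a"}

# Unsorted: the tuple keys differ, so a metrics cache keyed on them would
# register two Prometheus series for the same logical metric.
assert tuple(t1.items()) != tuple(t2.items())

# Sorted: both collapse to one key, which is what the fix relies on.
assert tuple(sorted(t1.items())) == tuple(sorted(t2.items()))
```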

@harshita-meena
Contributor Author

The fmt command got stuck, so I ran the black command first as suggested, then ran isort.

@RafalSkolasinski
Contributor

@RafalSkolasinski left a comment
Nice, thanks for finding the issue and contributing the fix!

@seldondev
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RafalSkolasinski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@seldondev merged commit e130e2c into SeldonIO:master May 14, 2022
@harshita-meena
Contributor Author

@RafalSkolasinski Hi!
Wanted to know: when can I expect a new patch release with this change?

@RafalSkolasinski
Contributor

Hi, it will come with Seldon Core 1.14, but I don't know the exact release schedule for it. @cliveseldon may know more.

Successfully merging this pull request may close these issues.

Duplicate metrics generated for GRPC requests hitting seldon predict_raw