Feature/add a test for metadata #331

@THOR300 THOR300 commented Sep 17, 2024

Description

Adding a test for metadata filtering.


WIP failing test:

We get metadata from Vespa, but we also query the db for metadata. In these tests the metadata in Vespa and the metadata in the db disagree, so the search results when using a metadata filter aren't correct. For example, filtering for sector: Price returns results in which that metadata is not present.

_populate_db_families is also non-deterministic.
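One way to remove that non-determinism is to seed the random source the fixture draws from. The sketch below is illustrative only: `make_family_metadata` and the `SECTORS` list are hypothetical names, not the actual `_populate_db_families` implementation, and the real fixtures may build metadata differently.

```python
import random
from typing import Dict, List, Optional

# Hypothetical taxonomy values; the real fixtures may draw from a different set.
SECTORS = ["Agriculture", "Finance", "Price", "Transport", "Urban"]


def make_family_metadata(
    family_index: int, seed: Optional[int] = 0
) -> Dict[str, List[str]]:
    """Return sector metadata for one family.

    With an integer seed the result depends only on (seed, family_index),
    so repeated test runs see identical data. Pass seed=None to get
    unseeded, random metadata instead.
    """
    if seed is None:
        rng = random.Random()
    else:
        # Derive a distinct, reproducible stream per family.
        rng = random.Random(seed * 1_000_003 + family_index)
    count = rng.randint(1, len(SECTORS))
    return {"sector": sorted(rng.sample(SECTORS, count))}
```

Keeping the seeded and unseeded paths behind one parameter lets deterministic and random fixtures share the same generator code.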

vespa_families[0].hits[0].metadata -> [{'name': 'sector', 'value': 'Price'}...

>>> family_and_family_metadata[0][1].value['sector']
['Agriculture', 'Urban', 'LULUCF', 'Cross Cutting Area', 'Finance', 'Social development'...]

>>> db_family_tuple[1].value['sector']
['Transport', 'Economy-wide', 'Disaster Risk Management (Drm)'....

That is, not sector: ['Price'].
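The two sources use different shapes: the Vespa hit carries a list of name/value pairs, while the db row holds a dict of lists. A small helper can make the disparity explicit in a test. This is a minimal sketch assuming only the shapes shown in the output above; `vespa_metadata_to_dict` and `metadata_mismatches` are hypothetical helpers, not part of the codebase or the cpr_sdk.

```python
from collections import defaultdict


def vespa_metadata_to_dict(vespa_metadata):
    """Collapse [{'name': ..., 'value': ...}] pairs into {name: [values]}."""
    grouped = defaultdict(list)
    for item in vespa_metadata:
        grouped[item["name"]].append(item["value"])
    return dict(grouped)


def metadata_mismatches(vespa_metadata, db_metadata):
    """Return {key: (vespa_values, db_values)} for keys whose values differ."""
    vespa_dict = vespa_metadata_to_dict(vespa_metadata)
    mismatches = {}
    for key in sorted(set(vespa_dict) | set(db_metadata)):
        # Compare as sorted lists so ordering differences are ignored.
        vespa_values = sorted(vespa_dict.get(key, []))
        db_values = sorted(db_metadata.get(key, []))
        if vespa_values != db_values:
            mismatches[key] = (vespa_values, db_values)
    return mismatches


# Illustrative values only, loosely based on the output above.
vespa = [{"name": "sector", "value": "Price"}]
db = {"sector": ["Transport", "Economy-wide"]}
print(metadata_mismatches(vespa, db))
```

An assertion that `metadata_mismatches(...)` is empty would fail loudly whenever the Vespa fixtures and the db fixtures drift apart.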

In theory this shouldn't happen in practice, since the db_state comes from the db. But is there a possibility that we update metadata and, before Vespa is updated, start getting some odd search results?

Validating the theory that there's a disparity:
Ran the following YQL and confirmed that the Vespa search results are correct against the db.
vespa query 'select * from family_document where family_import_id contains "CCLW.family.8633.0"'
Validated Postgres by adding a breakpoint and running the SQLAlchemy query.
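To repeat this check for other families, the YQL string can be built from the import ID. The function below is a hypothetical convenience, not something in the repository; it only reproduces the query shape used above, and the quote escaping is an assumption about how YQL string literals should be handled.

```python
def family_yql(family_import_id: str) -> str:
    """Build the YQL used above to look up a single family document."""
    # Assumption: backslashes and double quotes need escaping inside the literal.
    escaped = family_import_id.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "select * from family_document "
        f'where family_import_id contains "{escaped}"'
    )


print(family_yql("CCLW.family.8633.0"))
```

The printed string can be passed to `vespa query` in single quotes, as in the command above.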


Proposed version

Please select the most relevant option from the list below. This
will be used to generate the next tag version name during auto-tagging.

  • Skip auto-tagging
  • Patch
  • Minor version
  • Major version

Visit the Semver website to understand the
difference between MAJOR, MINOR, and PATCH versions.

Notes:

  • If none of these options are selected, auto-tagging will fail
  • Where multiple options are selected, the most senior option ticked will be
    used -- e.g. Major > Minor > Patch
  • If you are selecting the version in the list above using the textbox, make
    sure your selected option is marked [x] with no spaces in between the
    brackets and the x

Type of change

Please select the option(s) below that are most relevant:

  • Bug fix
  • New feature
  • Breaking change
  • GitHub workflow update
  • Documentation update
  • Refactor legacy code
  • Dependency update

How Has This Been Tested?

Please describe the tests that you added to verify your changes.

Reviewer Checklist

  • DB_CLIENT DEPENDENCY IS ON THE LATEST VERSION
  • The PR represents a single feature (small driveby fixes are also ok)
  • The PR includes tests that are sufficient for the level of risk
  • The code is sufficiently commented, particularly in hard-to-understand areas
  • Any required documentation updates have been made
  • Any TODOs added are captured in future tickets
  • No FIXMEs remain

@THOR300 THOR300 changed the base branch from main to feature/pla-146-surface-sdk-changes-to-backend September 17, 2024 10:20
@THOR300 THOR300 marked this pull request as ready for review September 17, 2024 11:47
@THOR300 THOR300 requested a review from a team as a code owner September 17, 2024 11:47
@THOR300 THOR300 merged commit 6f0ad3e into feature/pla-146-surface-sdk-changes-to-backend Sep 17, 2024
@THOR300 THOR300 deleted the feature/add-a-test-for-metadata branch September 17, 2024 11:47
THOR300 added a commit that referenced this pull request Sep 17, 2024
* Updating the cpr_sdk version.

* Updating the version in the pyproject.toml.

* Adding todo comments.

* Pushing working changes.

* Pushing working changes.

* Updating vespa search params test.

* Correcting the browse test fixtures.

* Updating convert filters test.

* Removing todo.

* Updating browse functionality to return geographies.

* updating the search tests.

* Adding a WIP commit.

* Attempting trunk fix.

* Down to eight failures with this reversion.

* Bugfix for the test_no_doc_if_in_postgres_but_not_vespa test.

* Bumping the cpr_sdk version.

* Cleaning up.

* Removing from test.

* Adding metadata and corpus checks to the tests.

* Feature/add a test for metadata (#331)

* Working commit.

* Clean up.

* Making populate db families deterministic.

* Adding distinct deterministic and random fixtures for family metadata.

---------

Co-authored-by: Mark <[email protected]>

* Trunk fix.

* Feature/resolve merge conflict v2 (#340)

* Move vespa search tests & the search_fixtures they use under dedicated sub-folder  (#334)

* Move vespa search tests under dedicated vespa folder

* Move /search_fixtures under vespa search folder & rename to fixtures

* Bump to 1.14.20

* Move vespa search result order tests into a separate file (#335)

* Move vespa search tests under dedicated vespa folder

* Move /search_fixtures under vespa search folder & rename to fixtures

* Bump to 1.14.20

* Move vespa search result order tests to separate file

* Bump to 1.14.19

* Move continuation token vespa search tests to separate file (#336)

* Move vespa search tests under dedicated vespa folder

* Move /search_fixtures under vespa search folder & rename to fixtures

* Bump to 1.14.20

* Move vespa search result order tests to separate file

* Bump to 1.14.19

* Move vespa search continuation token tests to separate file

* Group pagination and continuation token tests

* Move keyword and range vespa search tests into separate file (#337)

* Move vespa search tests under dedicated vespa folder

* Move /search_fixtures under vespa search folder & rename to fixtures

* Bump to 1.14.20

* Move vespa search result order tests to separate file

* Bump to 1.14.19

* Move vespa search continuation token tests to separate file

* Move keyword and range vespa search tests into separate file

* Delete test_vespa_search_cont_tokens.py

* Move _make_search_request into vespa search setup

* Move vespa search tests for ignoring special chars & case to separate file (#338)

* Move data download tests into parent folder

* Move query insensitivity & special chars ignoring tests out

* Rename from test_vespasearch

* Bump to 1.14.20

* Removing refactored file.

* Adding back in the changes from the test_vespasearch.

---------

Co-authored-by: Katy Baulch <[email protected]>
Co-authored-by: Mark <[email protected]>

---------

Co-authored-by: Mark <[email protected]>
Co-authored-by: Katy Baulch <[email protected]>