Relevancy Experimentation Framework #421
Labels
🌟 goal: addition (Addition of new feature)
🧭 project: thread (An issue used to track a project and its progress)
🧱 stack: analytics (Related to the analytics setup)
🧱 stack: api (Related to the Django API)
🧱 stack: catalog (Related to the catalog and Airflow DAGs)
🧱 stack: frontend (Related to the Nuxt frontend)
🧱 stack: infra (Related to the Terraform config and other infrastructure)
Summary
Develop an experimentation framework for assessing how relevancy is affected, positively or negatively, by changes to our search algorithms and to the underlying data itself.
Description
The search relevancy sandbox project outlines the infrastructure pieces necessary for making rapid changes to our search algorithms and components. The next step in leveraging this toolbox is to develop an experimentation framework for assessing how relevancy shifts when such changes are made. This could be done in two parts:
A hybrid automated and manual process that takes snapshots of specific result sets in staging, then prompts a maintainer to compare the relevancy of the two snapshots. This is intentionally a blunt approach: the only question is whether the results still feel relevant. If the results are wildly irrelevant where they previously were not, we do not roll the proposed change out to production. A minimal sketch of this flow follows the list below.
An automated assessment of relevancy based on A/B experimentation, paired with results from our analytics. This would use metrics like SELECT_SEARCH_RESULT, LOAD_MORE_RESULTS, and any new metrics we require to assess whether users are clicking through results at a higher or lower rate after a change; see the second sketch below.
The former would act as more of a "gut check" with little nuance, while the latter would serve as a longer-running assessment of result relevancy based on actual user behavior.
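The snapshot "gut check" could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not a committed design: the staging URL, the `q`/`page_size` parameters, the canonical query list, and the `results`/`title` response fields are placeholders invented for the example.

```python
"""Sketch of the snapshot comparison flow (assumptions noted above)."""
import json
from pathlib import Path

import requests

STAGING_URL = "https://api.staging.example.org/v1/images/"  # placeholder host
QUERIES = ["dog", "jazz", "sunset over water"]  # hypothetical gut-check queries
SNAPSHOT_DIR = Path("snapshots")


def take_snapshot(label: str) -> None:
    """Save the top results for each canonical query under a named label."""
    out_dir = SNAPSHOT_DIR / label
    out_dir.mkdir(parents=True, exist_ok=True)
    for query in QUERIES:
        resp = requests.get(STAGING_URL, params={"q": query, "page_size": 20})
        resp.raise_for_status()
        results = resp.json().get("results", [])
        (out_dir / f"{query}.json").write_text(json.dumps(results, indent=2))


def prompt_comparison(before: str, after: str) -> None:
    """Show both result lists side by side and ask for a yes/no judgement."""
    for query in QUERIES:
        old = json.loads((SNAPSHOT_DIR / before / f"{query}.json").read_text())
        new = json.loads((SNAPSHOT_DIR / after / f"{query}.json").read_text())
        print(f"\n=== {query} ===")
        for old_r, new_r in zip(old, new):
            print(f"{old_r.get('title', ''):<40} | {new_r.get('title', '')}")
        verdict = input("Do the new results still feel relevant? [y/n] ")
        if verdict.strip().lower() != "y":
            print(f"Flagged: '{query}' looks less relevant; hold the rollout.")


if __name__ == "__main__":
    take_snapshot("before-change")
    # ...deploy the candidate change to staging, then:
    take_snapshot("after-change")
    prompt_comparison("before-change", "after-change")
```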
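For the automated A/B assessment, one possible relevancy signal is the click-through rate per cohort, compared with a two-proportion z-test. The sketch below is an assumption-heavy illustration: the `CohortCounts` shape, how events are aggregated into it, and the example numbers are all hypothetical; the real inputs would come from whatever aggregation our analytics setup provides.

```python
"""Sketch of an automated click-through-rate comparison between A/B cohorts."""
from dataclasses import dataclass
from math import sqrt


@dataclass
class CohortCounts:
    searches: int                # search result pages served to the cohort
    select_search_result: int    # SELECT_SEARCH_RESULT events
    load_more_results: int       # LOAD_MORE_RESULTS events


def click_through_rate(c: CohortCounts) -> float:
    return c.select_search_result / c.searches


def two_proportion_z(control: CohortCounts, variant: CohortCounts) -> float:
    """z statistic for the difference in click-through rate between cohorts."""
    p1, n1 = click_through_rate(control), control.searches
    p2, n2 = click_through_rate(variant), variant.searches
    pooled = (control.select_search_result + variant.select_search_result) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


if __name__ == "__main__":
    # Illustrative numbers only, not real analytics data.
    control = CohortCounts(searches=48_000, select_search_result=21_500, load_more_results=6_200)
    variant = CohortCounts(searches=47_500, select_search_result=23_900, load_more_results=5_900)
    z = two_proportion_z(control, variant)
    print(f"control CTR={click_through_rate(control):.3f}, "
          f"variant CTR={click_through_rate(variant):.3f}, z={z:.2f}")
    if abs(z) < 1.96:  # two-sided test at roughly p < 0.05
        print("No significant relevancy shift detected for this metric.")
    elif z > 0:
        print("Variant shows a significantly higher click-through rate.")
    else:
        print("Variant shows a significantly lower click-through rate; investigate.")
```

The same comparison could be repeated for LOAD_MORE_RESULTS or any new metric, since the test only needs per-cohort event and exposure counts.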
Best guess at a list of implementation plans:
Documents
Issues
Prior Art