Feature/vocab guided query expansion #1544

Open
wants to merge 12 commits into
base: develop

@kwahlin kwahlin commented Dec 19, 2024

Depends on libris/definitions#508 and libris/lxlviewer#1194.

The main objective with this is to get rid of obviously bad hardcodings and make the whole query expansion machinery more consistent. Apart from some general cleanup/refactoring the main improvements are:

  • Finding the correct full path(s) for a property no longer depends on OutsetType or DomainCategory. Instead, we check for properties in the category :integral that are applicable to the queried type(s).
    Example query: type:Work yearPublished:x
    There are three :integral properties (:translationOf, :exactMatch, :hasInstance) applicable to :Work which, if we didn't know the domain of :yearPublished, would give us these alternative paths for :yearPublished:
    • translationOf.yearPublished
    • exactMatch.yearPublished
    • hasInstance.yearPublished
    However, we can ignore both :translationOf and :exactMatch, since we know that :yearPublished :domain :Instance and that :Instance is not in the range of :translationOf or :exactMatch, i.e. we won't find :yearPublished on any resource that these properties point to. So the only path that will be checked is hasInstance.yearPublished (@reverse.instanceOf.publication.year when fully expanded).
  • Both property expansion and field boosting depend on which type of resource is queried, but only the types within the same group are relevant. Previously, we just collected all types that appeared anywhere in the query tree. This change enables grouped queries with different types while still getting the property expansion and free-text boosting right, e.g.
    (type:Work x) OR (type:Instance p1:v1) OR (type:Agent p2:v2)
    where field boosting for x is based on the :Work type and expansion of p1 and p2 is based on :Instance and :Agent respectively.
  • Redundant types within the same group are removed:
    public Node reduceTypes(Disambiguate disambiguate) {

    If, for example, the query is type:(Electronic Instance), then Instance is removed, just like in the old search:
    * This also removes superclasses, since we only care about the most
  • Rely on the marker :category :shorthand in the definitions to know which properties should be expanded via owl:propertyChainAxiom. a16fcb3
  • Don't inherit domain/range from propertyChainAxiom. Instead, add these explicitly to short forms where needed; see libris/definitions#508 (Add explicit domain/range to some short forms).
  • Search all subclasses by default for all types:
    private static Node buildTypeNode(Value type, Disambiguate disambiguate) {

    We only did this for Work and Instance before (hardcoded).
    Eventually we'll need to make this optional.
  • Using an alias for instance/work type was such a bad and confusing solution that I decided to just bin it (d69a3b2). Mapping hasInstanceType to Format (see Feature/update filter label mappings, lxlviewer#1194) is enough to get the filter headers right, at least when searching for works, which is most important at this point. We can figure something out for instance search later.
  • The whole Disambiguate class was also a mess, so I decided to give it a proper makeover. At least it is less messy now.
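
To illustrate the path selection described in the first bullet, here is a minimal, self-contained sketch. It is not the code from this PR: the class IntegralPathSketch, its methods, the toy class hierarchy and the domains/ranges given to the three :integral properties are all assumptions made for the example. It also assumes that a property is used directly (without an :integral prefix) when its domain is compatible with the queried type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: class, method and field names are illustrative only.
final class IntegralPathSketch {
    // Toy class hierarchy (subclass -> direct superclass), assumed for the example.
    static final Map<String, String> SUPERCLASS = Map.of(
            "Electronic", "Instance",
            "Text", "Work");

    // An :integral property with an assumed domain and range.
    static final class Integral {
        final String name, domain, range;
        Integral(String name, String domain, String range) {
            this.name = name; this.domain = domain; this.range = range;
        }
    }

    // The three properties from the example; their domains and ranges are assumptions.
    static final List<Integral> INTEGRALS = List.of(
            new Integral("translationOf", "Work", "Work"),
            new Integral("exactMatch", "Work", "Work"),
            new Integral("hasInstance", "Work", "Instance"));

    // True if type equals ancestor or is a (transitive) subclass of it.
    static boolean isSubclassOrSame(String type, String ancestor) {
        for (String t = type; t != null; t = SUPERCLASS.get(t)) {
            if (t.equals(ancestor)) return true;
        }
        return false;
    }

    // Two types are compatible if one is the same as, or a subclass of, the other.
    static boolean compatible(String a, String b) {
        return isSubclassOrSame(a, b) || isSubclassOrSame(b, a);
    }

    // Candidate paths for a property, given its domain and the queried type.
    static List<String> paths(String property, String propertyDomain, String queriedType) {
        List<String> result = new ArrayList<>();
        // Assumed: the bare property is kept when it applies to the queried type itself.
        if (compatible(queriedType, propertyDomain)) result.add(property);
        for (Integral i : INTEGRALS) {
            boolean applicable = isSubclassOrSame(queriedType, i.domain);
            boolean reachesDomain = compatible(i.range, propertyDomain);
            // Keep only :integral properties that can lead to a resource carrying the property.
            if (applicable && reachesDomain) result.add(i.name + "." + property);
        }
        return result;
    }

    public static void main(String[] args) {
        // type:Work yearPublished:x, with :yearPublished :domain :Instance
        System.out.println(paths("yearPublished", "Instance", "Work")); // prints [hasInstance.yearPublished]
    }
}
```

With these toy definitions, :translationOf and :exactMatch are dropped because their assumed range (Work) cannot carry a property whose domain is Instance, which mirrors the reasoning in the example query.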
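
The removal of redundant types within a group (third bullet) can be sketched in a similar way. Again, this is only an illustration under assumptions: TypeReductionSketch, reduce and the toy hierarchy are hypothetical and are not the reduceTypes implementation referenced above. The idea shown is that a type is dropped when another type in the same group is one of its subclasses.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and hierarchy are illustrative only.
final class TypeReductionSketch {
    // Toy class hierarchy (subclass -> direct superclass), assumed for the example.
    static final Map<String, String> SUPERCLASS = Map.of(
            "Electronic", "Instance",
            "Print", "Instance",
            "Text", "Work");

    // True if candidate is a strict (transitive) superclass of type.
    static boolean isStrictSuperclass(String candidate, String type) {
        for (String t = SUPERCLASS.get(type); t != null; t = SUPERCLASS.get(t)) {
            if (t.equals(candidate)) return true;
        }
        return false;
    }

    // Drops every type that is a superclass of another type in the same group.
    static Set<String> reduce(Set<String> types) {
        Set<String> reduced = new LinkedHashSet<>();
        for (String candidate : types) {
            boolean redundant = false;
            for (String other : types) {
                if (isStrictSuperclass(candidate, other)) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant) reduced.add(candidate);
        }
        return reduced;
    }

    public static void main(String[] args) {
        // type:(Electronic Instance)
        Set<String> group = new LinkedHashSet<>(List.of("Electronic", "Instance"));
        System.out.println(reduce(group)); // prints [Electronic]
    }
}
```

Unrelated types in the same group, such as Electronic and Text in this toy hierarchy, are both kept, since neither is a superclass of the other.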

I can't guarantee that this won't break anything. I planned to add more tests but ran out of time. It seems to work in my own testing.

@kwahlin kwahlin requested a review from olovy December 19, 2024 14:08