Added a types attribute to annotate() and candidates() #9

Open: wants to merge 37 commits into base: master


aolieman

Added a types parameter to annotate() and candidates(), which enables server-side filtering of resources by type. It also complements the existing policy parameter.

I've tested it against both backends, but it only works properly with the Lucene-backed web service. This is not a bug in pyspotlight, however; it appears to be an unnoticed bug in Spotlight's statistical backend, which will be discussed in DBpedia Spotlight issue #251.
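Assuming Spotlight accepts several types as one comma-separated string (the same format the response uses for its `types` field, as in the sample output later in this thread), a caller might want to pass a list instead of hand-joining names. A minimal sketch; `format_types` is a hypothetical helper, not part of pyspotlight:

```python
def format_types(types):
    """Hypothetical helper: normalize a types filter to one comma-separated string.

    Accepts either a ready-made string such as "DBpedia:Person" or an
    iterable of type names such as ["DBpedia:Person", "Schema:Person"].
    """
    if types is None:
        return None
    if isinstance(types, str):
        return types
    # Strip surrounding whitespace and drop empty entries before joining.
    return ",".join(t.strip() for t in types if t.strip())


print(format_types(["DBpedia:Person", "Schema:Person"]))  # DBpedia:Person,Schema:Person
```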

@aolieman
Author

The problem with using a types filter on the statistical backend turned out to be missing documentation. In addition to the types parameter, coreferenceResolution=false must be passed to the API for the filter to take effect. Because this behavior might change in the near future (see #251), I would not suggest changing the signature of annotate() solely to accommodate it.
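The workaround described above amounts to sending one extra parameter alongside `types`. A minimal sketch, where `statistical_types_filter` is a hypothetical helper and not part of pyspotlight or Spotlight. It uses the lowercase string `"false"` to match the `coreferenceResolution=false` spelling; note that standard form encoding turns a Python `False` into `False`, and this thread does not establish which spellings the server accepts.

```python
from urllib.parse import urlencode


def statistical_types_filter(types):
    """Hypothetical helper: filter parameters for a types filter on the
    statistical backend, which (per this discussion) only honoured `types`
    when coreference resolution was switched off.
    """
    return {
        "types": types,
        # Lowercase string matching the documented coreferenceResolution=false;
        # a Python bool would be form-encoded as "False" instead.
        "coreferenceResolution": "false",
    }


print(urlencode(statistical_types_filter("DBpedia:Person")))
# types=DBpedia%3APerson&coreferenceResolution=false
```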

But I would still like this to work ASAP ;-). My suggestion is to collect all filter-related parameters in a single filters argument, which accepts a dictionary of optional filters. I'm not sure it's necessary, but I've set policy=whitelist as a default in the filter_kwargs dictionary that is included in the request payload, so that existing uses of pyspotlight are not affected.
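One way the defaults-plus-overrides idea above could look, as a sketch only: `build_payload`, `DEFAULT_FILTERS`, and the `confidence`/`support` parameters are illustrative assumptions, not pyspotlight's actual code. Any filter the caller passes overrides or extends the `policy=whitelist` default.

```python
DEFAULT_FILTERS = {"policy": "whitelist"}


def build_payload(text, confidence=0.0, support=0, filters=None):
    """Hypothetical sketch: merge an optional filters dict into a request payload."""
    payload = {"text": text, "confidence": confidence, "support": support}
    # Copy the defaults so the module-level dict is never mutated...
    filter_kwargs = dict(DEFAULT_FILTERS)
    # ...then let user-supplied filters override or extend them.
    filter_kwargs.update(filters or {})
    payload.update(filter_kwargs)
    return payload


print(build_payload("some text", filters={"types": "DBpedia:Person"}))
```

Using `filters=None` rather than a `{}` default avoids Python's shared mutable default argument, and callers that never pass filters still send `policy=whitelist`.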

Usage example:

import spotlight

only_person_filter = {
    'policy': "whitelist",
    'types': "DBpedia:Person",
    'coreferenceResolution': False
}

# Dutch sample sentence: "Do Albert Verlinde and Metallica ever run into each other in show business?"
spotlight.annotate("http://localhost:2223/rest/annotate",
                   "Komen Albert Verlinde en Metallica elkaar wel eens tegen in de showbizz?",
                   filters=only_person_filter)
# [{u'similarityScore': 0.9999999700393123, u'surfaceForm': u'Albert Verlinde', u'support': 76, u'offset': 6, u'URI': u'http://nl.dbpedia.org/resource/Albert_Verlinde', u'percentageOfSecondRank': 0.0, u'types': u'DBpedia:Agent,Schema:Person,DBpedia:Http://xmlns.com/foaf/0.1/Person,DBpedia:Person,DBpedia:Presenter'}]
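Because the server returns each annotation's `types` as one comma-separated string, a caller who wants to double-check the server-side filter (or filter client-side on a backend where it misbehaves) has to split that string. A minimal sketch; `has_type` is a hypothetical helper, not part of pyspotlight, and the data is the annotation from the response above rewritten with Python 3 strings:

```python
def has_type(annotation, wanted):
    """Return True if the annotation's comma-separated `types` field lists `wanted`."""
    types = annotation.get("types", "")
    return wanted in (t.strip() for t in types.split(",") if t.strip())


# Annotation copied from the sample response (without the u'' prefixes).
annotation = {
    "surfaceForm": "Albert Verlinde",
    "URI": "http://nl.dbpedia.org/resource/Albert_Verlinde",
    "types": "DBpedia:Agent,Schema:Person,DBpedia:Http://xmlns.com/foaf/0.1/Person,"
             "DBpedia:Person,DBpedia:Presenter",
}

print(has_type(annotation, "DBpedia:Person"))  # True
print(has_type(annotation, "DBpedia:Place"))   # False
```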

@originell
Contributor

That sounds great :D Are you still using it this way? =)

@aolieman
Author

Yes, I am. With a single filters argument, the signatures of annotate() and candidates() only need to change once, even if the set of filter parameters changes later. I think there are still plans to change the filter parameters in DBpedia Spotlight, but I'm not sure of their implementation status.
Would you like to incorporate my changes into pyspotlight?

@aolieman
Author

aolieman commented May 1, 2014

Hi @originell,
Adding the filters argument is still relevant. Would you mind merging this pull request and publishing an updated release on PyPI?
Or, if you are not interested in maintaining pyspotlight on PyPI, would you consider letting me submit new releases there for the time being? This is a nice wrapper and I would like to use it in many projects. In some cases, however, it is essential to use the version from PyPI.

Thanks!


3 participants