Harvest: Provide more documentation and examples on how to define common harvest sets #3262

Closed
kcondon opened this issue Aug 11, 2016 · 10 comments
Labels
Feature: Harvesting Type: Suggestion an idea UX & UI: Design This issue needs input on the design of the UI and from the product owner

Comments

@kcondon
Contributor

kcondon commented Aug 11, 2016

Based on prior experience and recent UX comments from Odum, there should be both general and specific information on the syntax used to define harvest sets. General: we support the same query syntax as our local search, which is powered by Solr. That syntax can be discovered by searching with Advanced Search and viewing the resulting query displayed in the search box on the results page. Specific: sets commonly used in practice, i.e. by persistent identifier namespace, by dataverse, by a field of interest such as keyword, and by specific datasets. This is related to recreating our pre-4.0 harvest sets.

@kcondon kcondon added UX & UI: Design This issue needs input on the design of the UI and from the product owner Type: Suggestion an idea Feature: Harvesting Priority 2: Moderate labels Aug 11, 2016
@djbrooke
Contributor

Hey Kevin - something like this (stolen from 3.6 documentation 😄)?

Generally speaking, basic queries take the form field:value, where field is a study metadata field. Examples include:

- globalId:"hdl 1902 1 10684" OR globalId:"hdl 1902 1 11155": Include studies with global ids hdl:1902.1/10684 and hdl:1902.1/11155.
- authority:1902.2: Include studies whose authority is 1902.2. Different authorities usually represent different sources, such as IQSS, ICPSR, etc.
- dvOwnerId:184: Include all studies belonging to the dataverse with database id 184.
- studyNoteType:"DATAPASS": Include all studies that were tagged with, or include the text, DATAPASS in their study note field.
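A minimal sketch (Python, not part of Dataverse) of composing a "match any of these datasets" set query like the first example. It assumes that quoting each value as a Solr phrase is enough to stop the colon and slash inside a persistent identifier from being parsed as query syntax. It uses persistentId, the field name floated later in this thread, which has not been checked against the current schema; quote_value and any_of are hypothetical helper names.

```python
def quote_value(value: str) -> str:
    """Wrap a value in double quotes for a Solr phrase query.

    Inside a quoted phrase only backslashes and double quotes need escaping,
    so characters like ':' and '/' in an identifier can stay as-is.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def any_of(field: str, values: list[str]) -> str:
    """Build 'field:"v1" OR field:"v2" ...' so the set matches any listed value."""
    if not values:
        raise ValueError("at least one value is required")
    return " OR ".join(f"{field}:{quote_value(v)}" for v in values)


# Hypothetical field name (see the note above); verify it before defining a set.
query = any_of("persistentId", ["hdl:1902.1/10684", "hdl:1902.1/11155"])
print(query)
```

The same helper should work for other "one of these values" sets, e.g. a list of keywords, as long as the field name is valid for the installation's search schema.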

Let me know if this is reasonable and I'll roll it in with the other documentation updates.

@kcondon
Contributor Author

kcondon commented Aug 12, 2016

The basic content and concepts are good, but the actual query syntax has changed.

The queries in your examples seem to cover:

  1. specific datasets (we no longer say study) by global id (we may now say persistent identifier, and the field name may be persistentId)
  2. an entire global id authority
  3. datasets by dataverse
  4. datasets that share a common field/keyword

A further note might say that the syntax can be discovered, in part, by crafting an advanced search for what you want to include in your set and then using the syntax shown in the results.

A limitation of the Advanced Search approach is that it does not list all field names. We do have that information somewhere, but I do not think it is in the help.


From: Danny Brooke [email protected]
Sent: Friday, August 12, 2016 9:53:34 AM
To: IQSS/dataverse
Cc: Condon, Kevin M; Author
Subject: Re: [IQSS/dataverse] Harvest: Provide more documentation and examples on how to define common harvest sets (#3262)

Hey Kevin - something like this (stolen from 3.6 documentation ?)?

Generally speaking, basic queries take the form of study metadata field:value. Examples include:
globalId:"hdl 1902 1 10684" OR globalId:"hdl 1902 1 11155": Include studies with global idshdl:1902.1/10684 and hdl:1902.1/11155
authority:1902.2: Include studies whose authority is 1902.2. Different authorities usually represent different sources such as IQSS, ICPSR, etc.
dvOwnerId:184: Include all studies belonging to dataverse with database id 184
studyNoteType:"DATAPASS": Include all studies that were tagged with or include the text DATAPASS in their study note field.

Let me know if this is reasonable and I'll roll it in with the other documentation updates.

You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_IQSS_dataverse_issues_3262-23issuecomment-2D239452043&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=TUpjWt9sVfaAC8ETCY_cDPtqJKl7s242PLg6-Wx6UpM&m=QDKgt7GRfK1-UWNBbIzSErHt2OT7R-95aP2GHAo6dbw&s=NBGi5UYtnrZ8KBEDZLIop7P1qkuNI14a7g44DNEZhoo&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AEwMCExaM0cdiqv3ur-2DqnPHi-2DA6VnJ94ks5qfHregaJpZM4JiRuZ&d=CwMFaQ&c=WO-RGvefibhHBZq3fL85hQ&r=TUpjWt9sVfaAC8ETCY_cDPtqJKl7s242PLg6-Wx6UpM&m=QDKgt7GRfK1-UWNBbIzSErHt2OT7R-95aP2GHAo6dbw&s=fwKUlGoXuwm6nKJxbcuES1S9ohEUoPjtU49h4hIpoiE&e=.

@djbrooke djbrooke assigned djbrooke and unassigned kcondon Aug 12, 2016
@djbrooke
Contributor

Got it - thanks!

@landreev
Contributor

I have a (short-ish) section on this in the guide ("Managing Harvesting Server and Sets", guides/admin/harvestserver.html).
I have a few simple examples there, with more detailed explanations of how they work than the inline help text on the page. I also give practical advice on how to discover fields and use them to build working queries by experimenting with the Advanced Search page, and I mention the need to use the 4.5 or later Solr search schema in order to be able to define sets.

Please review.


@djbrooke djbrooke assigned pdurbin and unassigned kcondon and djbrooke Aug 15, 2016
@djbrooke
Contributor

Hey @pdurbin - I think @landreev was referencing this issue in the meeting this morning. Is it OK if we work on this together so I can learn more about the query format? Thanks!

@djbrooke djbrooke self-assigned this Aug 15, 2016
@pdurbin
Member

pdurbin commented Aug 15, 2016

@djbrooke yep! @landreev mentioned that we don't give any guidance on how to use the "q" parameter at http://guides.dataverse.org/en/4.4/api/search.html

I guess we could link to https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser since that's what http://lucene.apache.org/solr/quickstart.html links to. That page explains how to use the "q" parameter.

In addition to this link, we could give some specific examples that make more sense for someone searching Dataverse, such as searches on title, author, etc.
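A rough sketch of passing such a query through the "q" parameter of the Search API (/api/search). The host is a placeholder, title and authorName are assumed field names, and search_url is a hypothetical helper. urlencode from the Python standard library percent-encodes the query so quotes, spaces, and colons survive in the URL.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str) -> str:
    """Return a Search API URL carrying a Solr-style query in the 'q' parameter."""
    # urlencode uses quote_plus: spaces become '+', '"' becomes %22, ':' becomes %3A.
    return f"{base_url.rstrip('/')}/api/search?{urlencode({'q': query})}"


# Placeholder host and assumed field names; adjust for a real installation.
url = search_url("https://demo.dataverse.org/", 'title:"climate change" AND authorName:smith')
print(url)
```

Trying a query this way against a real installation is a quick check that it returns the expected datasets before reusing it, since harvest sets are described in this thread as using the same search syntax.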

This issue opened by @leeper is highly related: What are the allowed search fields for the Search API q parameter? #2558

@djbrooke
Contributor

@pdurbin Cool - let's take 15 after standup tomorrow to finish this up and so that I can learn something! 👍

@djbrooke
Contributor

OK - I left a few in there, modeled after what was there in 3.x.

@kcondon if these strings are correct, feel free to close this out. If they're not, let me know where I went astray and I'll get them fixed up.

@djbrooke djbrooke assigned kcondon and unassigned djbrooke Aug 17, 2016
@kcondon
Contributor Author

kcondon commented Aug 17, 2016

Looks good, closing.

@kcondon kcondon closed this as completed Aug 17, 2016

4 participants