Solr: make some internal fields indexed=true (searchable) for troubleshooting #2038
Comments
Requires schema change and re-indexing #2038 Also show output for orphaned files in index/status API call.
As I mentioned in #2086 I think the only way to solve #2086 is to make this field searchable:
While we're making more fields searchable, I would suggest also making these searchable because the values are short and shouldn't contain very many special characters (or at least the special characters will be predictable):
I'd suggest taking a "wait and see" approach on making the following searchable because the values are long and potentially tricky to search on given special characters and such:
@scolapasta I'm moving this to the current milestone since it's required for #2086. If you want to simply give me this ticket to make the change above, that's fine.
@scolapasta in 96e411c I started making Feel free to move this ticket out of 4.0.1. There's no rush for the other fields. |
@scolapasta, this will be needed for the Data Related to Me page.
entityId is now searchable
Right. This was merged from the "mydata" branch into the "4.0.2" branch the other day.
@scolapasta should we use this issue for the idea of a new Solr field containing "4.2" or whatever Dataverse version was used to index the Solr document? I guess we could call it "indexedByDataverseVersion" or something (suggestions welcome). It sounds like we want it to be searchable for troubleshooting purposes, which is what made me think of this issue. /cc @kcondon @landreev
@scolapasta and I discussed my recommendations at #2038 (comment) and decided to go with them. That is to say that as of f14ab64 the following fields are searchable after reindexing:
Here's an example of searching by the MD5 of a file:
Passing to QA. I'd suggest testing #2530 at the same time.
@pdurbin
Passing back for comment. Otherwise, if the purpose is only for troubleshooting, then probably ok.
1: md5
I suspect I'm doing something wrong but I can't reproduce this. When I search for "fileMd5:28bea8a0f1d3ceb96a1f2fe1f33c4bd2" it seems to find just the file I want:
2: size in bytes
Yeah, specifying range queries in Solr is unfriendly, as far as I know. Some day I'd like to work on #370 and make this easier from the GUI. For now you can find the number of files over 123 MB in size with a search for "fileSizeInBytes:[123456789 TO *]" like this:
Look for "range" at https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser for more on this topic.
3: UNF
When I query Solr directly it seems to work ( This makes me think, however, that we should probably be indexing UNF at the dataset level too, since I believe datasets get UNFs in their citation when they contain at least one file that has a UNF. (I'm not exactly sure how this works but @landreev probably knows.) Back to QA. I hope this helps.
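For anyone who would rather not type the query syntax by hand, here is a minimal Python sketch of the two query shapes mentioned in this comment: an exact match on a field and an open-ended range. The helper names `exact_query` and `range_query` are made up for illustration and are not part of Dataverse or Solr; only the field names and the query strings come from the comment above.

```python
def exact_query(field, value):
    """Build an exact-match query such as fileMd5:<hash>."""
    return f"{field}:{value}"


def range_query(field, lower=None, upper=None):
    """Build an inclusive Solr range query; a missing bound becomes '*'."""
    lo = "*" if lower is None else str(lower)
    hi = "*" if upper is None else str(upper)
    return f"{field}:[{lo} TO {hi}]"


# Same strings as quoted in the comment above.
print(exact_query("fileMd5", "28bea8a0f1d3ceb96a1f2fe1f33c4bd2"))
print(range_query("fileSizeInBytes", lower=123456789))
```

Square brackets make both bounds inclusive in the standard query parser, which is why the sketch always emits them.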
Thanks for following up Phil.
Currently many "internal" fields are set to indexed=false in our Solr schema.xml because we weren't sure if we really want them to be searchable.
@scolapasta and I have been talking about how it might be useful to make more of them searchable for troubleshooting purposes.
Unfortunately, it's not enough to simply change the schema.xml file to indexed=true. You also have to reindex everything. So, let's use this ticket to define which internal fields we want to be searchable. Here is a list of candidates:
For info on what each of these does, see: https://github.com/IQSS/dataverse/blob/master/src/main/java/edu/harvard/iq/dataverse/search/SearchFields.java
Note that we do not plan to copy any of these to the "catchall" field ("text") used by Basic Search. Nor do we intend to show these on the Advanced Search page. The idea is that if you know to type "entityId:123" you'll get a match in both the GUI and the Search API (all users, not just superusers).
Also note that when I say searchable I don't mean in a "friendly" way. These are "string" and "long" Solr field types so you have to provide an exact string match when searching.
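To make the "exact match" point concrete, here is a rough Python sketch of how an "entityId:123" search could be sent to the Search API. It assumes the Search API is reachable at `/api/search` with the query in a `q` parameter; the host name `demo.example.edu` and the helper `search_url` are placeholders invented for this example.

```python
from urllib.parse import urlencode


def search_url(base_url, field, value):
    """Return a Search API URL for an exact field:value query.

    Assumes the endpoint is <base_url>/api/search?q=... (an assumption
    for this sketch, adjust for your installation).
    """
    query = f"{field}:{value}"
    # urlencode percent-encodes the colon, which is safe in a query string.
    return f"{base_url.rstrip('/')}/api/search?{urlencode({'q': query})}"


# Hypothetical host; only the query "entityId:123" comes from the text above.
print(search_url("https://demo.example.edu", "entityId", 123))
```

Because these are "string" and "long" fields, the value has to be the full stored value; a partial string would not be expected to match.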
I'm giving this to @scolapasta to indicate which fields should be searchable.
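As an aid for picking candidates, here is a small Python sketch that lists the fields a schema.xml marks as indexed="false". The sample schema below is invented for illustration and is not the real Dataverse schema.xml; the function name `unindexed_fields` is also hypothetical. Fields without an explicit `indexed` attribute are skipped, since their setting would come from the field type.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; field types and attributes are assumptions.
SAMPLE_SCHEMA = """
<schema name="sample" version="1.5">
  <fields>
    <field name="entityId" type="long" indexed="false" stored="true"/>
    <field name="fileMd5" type="string" indexed="false" stored="true"/>
    <field name="name" type="text_en" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def unindexed_fields(schema_xml):
    """Return names of <field> elements explicitly set to indexed="false"."""
    root = ET.fromstring(schema_xml.strip())
    return [
        field.get("name")
        for field in root.iter("field")
        if field.get("indexed", "").lower() == "false"
    ]


print(unindexed_fields(SAMPLE_SCHEMA))
```

Flipping any of the listed fields to indexed="true" still requires a full reindex before searches on them return results, as noted above.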