Added anykeyword and anyfield pseudo search fields #1876

Merged
merged 6 commits into from
Sep 23, 2016

Contributor

@oscargus oscargus commented Aug 28, 2016

See #1633 (comment)

Will add tests etc if it is a good idea and works.

  • Change in CHANGELOG.md described
  • Tests created for changes
  • Screenshots added (for bigger UI changes)
  • Manually tested changed features in running JabRef
  • Check documentation status (Issue created for outdated help page at help.jabref.org?)

@oscargus oscargus added the status: waiting-for-feedback and search labels Aug 28, 2016
// special case for searching a single keyword
if (fieldPattern.matcher("keyword").matches()) {
if (entry.hasField(FieldName.KEYWORDS)) {
List<String> keywords = new ArrayList<>(entry.getKeywords());
Member

Why not a simple stream on the list with anyMatch (returns a boolean)?

Member

And you could even use that together with the prev line if you use your getFieldOptional method

Contributor Author

The getFieldOptional is no more. ;-)

You mean:
return entry.getKeywords().stream().anyMatch(this::matchFieldValue);
?

Could work. In the most recent version I swapped the order of hasField and matcher, as I expected the first one to be faster. Still, I think getKeywords() returns an empty list if the field is not set, so the hasField check isn't really needed.
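
For illustration, the anyMatch suggestion can be sketched as below. This is a minimal sketch, not the PR's code: the class name AnyKeywordSketch and the helper anyKeywordMatches are hypothetical, and the predicate stands in for the matchFieldValue method mentioned above. It relies only on the fact that anyMatch on an empty stream returns false, which is why an empty keyword list needs no separate guard.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class AnyKeywordSketch {

    // Returns true as soon as one keyword satisfies the predicate.
    // An empty list yields false, so no separate "has keywords" check is needed.
    // (matchFieldValue is a hypothetical stand-in for the PR's matching method.)
    static boolean anyKeywordMatches(List<String> keywords, Predicate<String> matchFieldValue) {
        return keywords.stream().anyMatch(matchFieldValue);
    }

    public static void main(String[] args) {
        Predicate<String> isApple = keyword -> keyword.equalsIgnoreCase("apple");
        System.out.println(anyKeywordMatches(Arrays.asList("fruit", "Apple"), isApple)); // true
        System.out.println(anyKeywordMatches(Collections.<String>emptyList(), isApple)); // false
    }
}
```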

@oscargus
Contributor Author

I'm thinking that the distinction between keyword and keywords is not enough. What do you suggest? anykeyword? singlekeyword?

@oscargus
Contributor Author

Also, when writing a bit more search queries for the tests: is allfields a good name? Seems like anyfield would be better. (And then anykeyword)

anyfield contains fruit and anykeyword matches apple

@Siedlerchr
Member

Difficult question... To me anykeyword would be okay in this context and it's shorter than singlekeyword.
E.g., thinking of anykeyword == "test", it has clear semantics.

@Siedlerchr
Member

Be aware of backwards compatibility. People may already be used to "allFields".

@oscargus
Contributor Author

allfields has been around for 8 hours so I doubt it has been that well established. ;-)

@oscargus oscargus added the status: ready-for-review label Aug 28, 2016

for (String field : matchedFieldKeys) {
for (String field : fieldsKeys) {
Member

You could here do the same as above, just with return fieldsKeys.stream().filter(this::matchFieldValue).isPresent()
The nice thing is that all operations like map, filter etc. are only performed if the value inside the optional is present, so absolutely type safe

Member

Not quite. This has to be wrapped in an if which executes return true;
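
A hedged sketch of how the two comments above might combine. Note that Stream has no isPresent() method (that belongs to Optional), so the sketch uses anyMatch, which returns a boolean directly, and wraps it in an if so the surrounding method can continue with further checks. The class name, the map-based entry, and the parameter names are assumptions for illustration only, not JabRef's API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

final class FieldLoopSketch {

    // entryFields is a hypothetical field-name -> value map standing in for an entry.
    static boolean matchesAnyField(Map<String, String> entryFields,
                                   Collection<String> fieldsKeys,
                                   Predicate<String> matchFieldValue) {
        boolean found = fieldsKeys.stream()
                .map(entryFields::get)      // look up each field's value
                .filter(Objects::nonNull)   // skip fields the entry does not have
                .anyMatch(matchFieldValue); // stops at the first matching value
        if (found) {
            return true;
        }
        // Any fallback checks of the surrounding method would go here.
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("title", "Fruit salad");
        entry.put("keywords", "apple, pear");
        System.out.println(matchesAnyField(entry, Arrays.asList("title", "author"),
                value -> value.toLowerCase().contains("fruit")));
    }
}
```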

@Siedlerchr
Member

Overall lgtm, you could replace that one case with a stream too and maybe add some test regarding umlauts...(not sure if they are actually handled correctly)

@tobiasdiez
Member

I find the anykeywords field a bit counter-intuitive. I would change the search syntax to search for the exact expression vs "contains":

  • keywords == test: matches only when the content is equal to test.
  • keywords <> test: matches if it contains test as a keyword. (or maybe a different symbol in place of <>)

@oscargus
Contributor Author

@JabRef/developers Opinions?

Personally, I think that adding a search operator which only can be used for a specific field is a bit counter-intuitive.

@Siedlerchr
Member

I like anykeywords better than a new operator, especially as the search query can be read as a whole sentence, is directly recognizable for the user, and has clear semantics, e.g. anykeyword contains test

For an operator I would expect it to work everywhere, not containing any special semantics for a specific field.

@oscargus oscargus changed the title Added keyword and allfields pseudo search fields Added anykeyword and anyfield pseudo search fields Aug 29, 2016
@tobiasdiez
Member

But one could also have different fields with values separated by some character ("keyword fields"). For example, it would make sense to search for entries in specific groups or some user defined fields. But one probably doesn't want to define any*-pseudofields for all fields.

@oscargus
Contributor Author

You do have a point, but:

  • there are quite few such fields at the moment
  • the separator discussion is ongoing and far from concluded, let alone
    implemented
  • group hierarchies/restoring need to be improved
  • and, also important in this context, I have no idea how to introduce a
    new operator

Hence, this has to be postponed and done by someone else. If it is likely
to happen we can just put this on hold and see. If it doesn't happen before
3.7 is released, I suggest we merge this in.

Clearly, it would also be much better if anyfield didn't have to exist,
but any term without a field specified does search in every field, even if
fields are specified elsewhere. Again, a hypothetical better solution
compared to an OK existing one.

(Btw, I think that one can already search for groups, although it will be
in the "contains" way to make sense.)

@oscargus oscargus added this to the v3.7 milestone Aug 29, 2016
@tobiasdiez tobiasdiez added status: devcall and removed status: ready-for-review status: waiting-for-feedback labels Sep 11, 2016
@Braunch Braunch removed the stupro label Sep 11, 2016
@koppor
Member

koppor commented Sep 13, 2016

Alternative: Use Apache Lucene's Syntax - see #1975

@lenhard
Member

lenhard commented Sep 13, 2016

We discussed the following at the devcall: The search parser should be extended to support an any* syntax, e.g. it checks if a search term starts with any; if so, it cuts off the any part and tries to use the remaining part as a field name. In this fashion, we would have support for the any-functionality for arbitrary fields, which sounds quite cool. Effortwise, it should just require some String manipulation in the query parser. @oscargus Would you be willing to implement this here?

The long-term goal would be to move towards a sophisticated query parser, such as Lucene. However, the effort and time required for this are quite uncertain.
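
The prefix handling proposed at the devcall might look roughly like the following. It is a sketch under stated assumptions only: the class and method names are hypothetical, anyfield is assumed to be excluded and handled elsewhere, and nothing here reflects the actual query parser.

```java
import java.util.Locale;
import java.util.Optional;

final class AnyPrefixSketch {

    private static final String PREFIX = "any";

    // If the search field starts with "any", return the remainder as the target
    // field name; otherwise return empty. "anyfield" is assumed to be a separate
    // pseudo field and is deliberately not treated as "any" + "field".
    static Optional<String> targetField(String searchField) {
        String lower = searchField.toLowerCase(Locale.ROOT);
        if (lower.startsWith(PREFIX)
                && lower.length() > PREFIX.length()
                && !lower.equals("anyfield")) {
            return Optional.of(lower.substring(PREFIX.length()));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(targetField("anykeyword")); // Optional[keyword]
        System.out.println(targetField("author"));     // Optional.empty
    }
}
```

One consequence of pure string manipulation is visible here: anykeyword yields keyword rather than keywords, so either the user writes anykeywords or the parser needs a mapping from the remainder to the real field name.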

@oscargus
Contributor Author

I do not get it. First, anyfield does not apply to the logic described and still needs to be treated separately (and this is really the main thing of this PR). Second, the whole point of anykeyword is that one can split the keywords field into smaller parts (and anykeywords sounds, ehh, strange...) and match one of these parts (exactly, as contains is more or less the same thing as before). Yes, it might work for e.g. authors but I cannot really see the point or how. In that case it may make more sense to be able to match the last name or first name or so, rather than the whole name. The key thing here is how to split the fields and that is field dependent (I can see a few fields where it might work: related, authors, file and there are probably some more. Most it won't make sense for.)

So, no, as I do not understand how it should be done (in a way that makes sense).

@koppor
Member

koppor commented Sep 14, 2016 via email

@oscargus
Contributor Author

Say you have two keywords aa and aaa. Currently you cannot search for aa without also finding aaa (assuming there is more than one keyword for each entry, so you need to search for keywords=aa). With this PR you can do anykeyword==aa and only get entries where one keyword matches aa, or anykeyword!=aa to get entries without that keyword. (anykeyword=aa will get both aa and aaa, so in that case there is no difference to before.)
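
The aa/aaa example can be made concrete with a small sketch. It assumes a comma as keyword separator and case-sensitive comparison, and all names (KeywordOperatorSketch, containsOp, exactOp, notExactOp) are hypothetical; it only mirrors the semantics described above for anykeyword=, anykeyword== and anykeyword!=.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class KeywordOperatorSketch {

    // Split the keywords field into single keywords (comma separator assumed).
    static List<String> split(String keywordsField) {
        return Arrays.stream(keywordsField.split(","))
                .map(String::trim)
                .filter(keyword -> !keyword.isEmpty())
                .collect(Collectors.toList());
    }

    // anykeyword=term: some keyword contains the term.
    static boolean containsOp(String keywordsField, String term) {
        return split(keywordsField).stream().anyMatch(keyword -> keyword.contains(term));
    }

    // anykeyword==term: some keyword equals the term exactly.
    static boolean exactOp(String keywordsField, String term) {
        return split(keywordsField).stream().anyMatch(keyword -> keyword.equals(term));
    }

    // anykeyword!=term: no keyword equals the term.
    static boolean notExactOp(String keywordsField, String term) {
        return !exactOp(keywordsField, term);
    }

    public static void main(String[] args) {
        System.out.println(containsOp("aaa, bb", "aa")); // true: aaa contains aa
        System.out.println(exactOp("aaa, bb", "aa"));    // false: no keyword is exactly aa
        System.out.println(exactOp("aa, bb", "aa"));     // true
        System.out.println(notExactOp("aaa, bb", "aa")); // true
    }
}
```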

@koppor
Member

koppor commented Sep 14, 2016

OK for me. Just resolve https://github.com/JabRef/jabref/pull/1876/files#r78694966 and then we can merge this in.

@tobiasdiez
Member

@oscargus based on your description, the problem is not specific to the keywords fields. For example, I can't find the author aa without also finding aaa and the same for aa in the abstract field. This is why I suggested to use a new comparison operator which checks that the search string is exactly a "word" and not just a substring (and for the keywords field we maybe can be so intelligent and only take the keyword separator as word separator).

I think the standard solution is: only match whole words (i.e. aa only matches aa but not aaa) and use wildcard queries like aa* to match aa as well as aaa.

@Siedlerchr
Member

👍 for the wildcard query option. e.g. * and ?
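
To illustrate the whole-word-plus-wildcard idea, here is a minimal sketch that translates a query into a regular expression. It is not how JabRef or Lucene implement this; the class WildcardSketch and its methods are hypothetical, * is assumed to mean "any number of word characters" and ? "exactly one word character", and word boundaries are taken from the regex \b rather than from a keyword separator.

```java
import java.util.regex.Pattern;

final class WildcardSketch {

    // Translate a query such as "aa" or "aa*" into a whole-word regex.
    static Pattern toPattern(String query) {
        StringBuilder regex = new StringBuilder("\\b");
        StringBuilder literal = new StringBuilder();
        for (char c : query.toCharArray()) {
            if (c == '*' || c == '?') {
                flushLiteral(literal, regex);
                regex.append(c == '*' ? "\\w*" : "\\w");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        regex.append("\\b");
        // UNICODE_CHARACTER_CLASS makes \w cover non-ASCII letters such as umlauts.
        return Pattern.compile(regex.toString(), Pattern.UNICODE_CHARACTER_CLASS);
    }

    // Quote the collected literal text so regex metacharacters are matched verbatim.
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    static boolean matches(String text, String query) {
        return toPattern(query).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("aaa bb", "aa"));  // false: aa is not a whole word here
        System.out.println(matches("aa bb", "aa"));   // true
        System.out.println(matches("aaa bb", "aa*")); // true: wildcard extends to aaa
    }
}
```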

@oscargus
Contributor Author

As I wrote there are a few more fields, yes, but providing a generic split
is more or less impossible (and, ,, ,). In addition there are more
fields that can't be split than the opposite. anyyear? anyissue?
anyabstract? For author, you more likely want to search for
lastname==aa than anyauthors=="a. aa" or anyauthors=="aa, a." (not to
mention you actually search for a specific author, not a specific
lastname), so adding something similar for names would make sense, but I do
not see that as anyauthors.

My suggestion is that we add any-support for the fields where it makes
sense and as this needs to be done specifically for each (class of) field
it can be done incrementally.

@koppor
Member

koppor commented Sep 15, 2016

👍 - @oscargus please solve https://github.com/JabRef/jabref/pull/1876/files#r78694966, then we can merge and then we can work on other fields in another PR. I'm trying to keep the PRs small and want to merge them fast.

Member

@tobiasdiez tobiasdiez left a comment

Ok, I can live with anykeyword but find it still a suboptimal/incomplete solution.

The anyfield construction is something I don't understand. I thought JabRef searches all fields by default. Does fruity and keywords=apple work? If not then this is probably a bug and should be fixed. Could you please explain the purpose of anyfield?

Member

@tobiasdiez tobiasdiez left a comment

Ok, I can live with anykeyword but find it still a suboptimal/incomplete solution.

The anyfield construction is something I don't understand. I thought JabRef searches all fields by default. Does fruity and keywords=apple work? If not then this is probably a bug and should be fixed. Could you please explain the purpose of anyfield?

@tobiasdiez
Member

The new github review is a bit strange... no idea why there are now two comments and one approval. It should be actually "one comment" 😄

@oscargus
Contributor Author

Correct, fruity and keywords=apple does not work; that was more or less the issue to start with. Feel free to propose a PR solving it in the proper way (I quote myself: "The allfields approach should be quite feasible to implement, although somehow it would be even better if not specifying a field leads to that behaviour (that I do not know how to do though).").

Incomplete, yes. But not really possible to generalize. Hence the need for more PRs improving the search, which, I guess, can always be improved.

# Conflicts:
#	CHANGELOG.md
#	src/test/java/net/sf/jabref/logic/search/SearchQueryTest.java
@koppor koppor merged commit 0b27086 into master Sep 23, 2016
@koppor koppor deleted the bettersearch branch September 23, 2016 07:35
zesaro pushed a commit to zesaro/jabref that referenced this pull request Nov 22, 2016

6 participants