Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Readability for citation key patterns #6706

Merged
merged 45 commits into from
Sep 21, 2020

Conversation

k3KAW8Pnf7mkmdSMPHz27
Copy link
Member

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 commented Jul 23, 2020

Fixes #3920.

The issue is that AuthorList#fixAuthorForAlphabetization is used to create a String representation of authors' names. This can leave unbalanced brackets if a name piece starts with a latex command, as it is not the intended use of the method.

The suggested fix is to resolve latex in the name of an author and then remove any illegal characters in the post-processing (which is dependent on if it is going to be used as a file path or citation key). As Unicode characters are already allowed in the name of an author, this change should not affect users too much. It will, however,

  • have slower performance, as all authors are being processed as if they contain latex
  • in almost all cases that contain latex, it will produce different output than the previous solution

As always, these are only suggestions. Any comments and criticism are encouraged!
The "intuition" behind the suggestion is that latex in Strings will always be hard to deal with, and I think most users expect a pattern to resolve to something as similar as possible to what is shown in the main table.

What remains

  • verify that the CitationKeyGenerator does not output Unicode keys
  • remove the latex_to_unicode modifier as it should not be used
  • change access modifiers for methods in BracketedPattern, I'd suggest that most should be private or protected
  • change how arguments are parsed for auth.../ed... patterns (readability)
  • add test cases for this issue
  • test for character classes in regexp
  • Fix parsing of bracketed patterns
  • Fix the conversion of unicode to characters allowed in bibtex keys in CitationKeyGenerator
  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title Fix for issue 3920 [WIP] Fix for issue 3920 Jul 23, 2020
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 marked this pull request as draft July 23, 2020 16:29
String[] changed to List<String> and some refactoring
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on it! To ease reviewing could you add @Test cases for the issue you fixed? By quickly looking at it, I see many lines changed, but no test cases re-activated or added.

For your discussion of the consequences, maybe an ADR could be the proper format.

@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

A test case for this particular issue (where starting brackets can cause a problem) will be added! It is just taking a bit of time since I haven't been able to spend as much time on this as I would have wanted 🙁
I'd argue that it is motivated to make several methods in BracketedPattern private, but then test cases in CitationKeyGeneratorTest depends on them being public -> a lot of refactoring.

For your discussion of the consequences, maybe an ADR could be the proper format.

I should have written most of the pull request in ADR format?

The tests check if authors with names containing Unicode returns the
correct number of characters in the citation key.
The result of expanding a bracket must be cleaned before applying
modifiers that depends on the number of characters in the result.
(truncate)
This fix extends a previous workaround in BracketedPattern.authN. It
does not solve the underlying issue.
# Conflicts:
#	src/test/java/org/jabref/logic/citationkeypattern/BracketedPatternTest.java
#	src/test/java/org/jabref/logic/citationkeypattern/CitationKeyGeneratorTest.java
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the long list of changes, I would suggest to merge this in - and then work on the two remaining open points. - Otherwise, it will get harder and harder to review.

Nevertheless, a short nitpick on the logger.

(Sorry for review delay)

@koppor koppor added the status: changes required Pull requests that are not yet complete label Sep 20, 2020
@koppor
Copy link
Member

koppor commented Sep 20, 2020

Is it possible to fix checkstyle? This does not look good:

/home/runner/work/jabref/jabref/src/test/java/org/jabref/logic/citationkeypattern/CitationKeyGeneratorTest.java:9: error: package org.jabref.model.bibtexkeypattern does not exist

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [WIP] Fix for issue 3920 [WIP] Fix parsing and unicode in citation keys Sep 21, 2020
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [WIP] Fix parsing and unicode in citation keys [WIP] Readability for citation key patterns Sep 21, 2020
Co-authored-by: Oliver Kopp <[email protected]>
Co-authored-by: Oliver Kopp <[email protected]>
@k3KAW8Pnf7mkmdSMPHz27
Copy link
Member Author

k3KAW8Pnf7mkmdSMPHz27 commented Sep 21, 2020

Seeing the long list of changes, I would suggest to merge this in - and then work on the two remaining open points. - Otherwise, it will get harder and harder to review.

Sorry about that. I didn't really expect to go down this route.

After the merge I'll open up a PR for issue #6892 dealing with parsing.
<edit>
I realized this PR still closes issue #3920. The problem with converting unicode characters is not in the issue tracking system. I can open up a PR for it after #6892 is fixed, if that sounds reasonable?
</edit>

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 changed the title [WIP] Readability for citation key patterns Readability for citation key patterns Sep 21, 2020
@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 marked this pull request as ready for review September 21, 2020 17:26
@koppor koppor merged commit efdd24c into JabRef:master Sep 21, 2020
@koppor
Copy link
Member

koppor commented Sep 21, 2020

Seeing the long list of changes, I would suggest to merge this in - and then work on the two remaining open points. - Otherwise, it will get harder and harder to review.
Sorry about that. I didn't really expect to go down this route.

No need to be sorry. I have to thank you that you keep working on the PR even if much time is spend on the code quality.

After the merge I'll open up a PR for issue #6892 dealing with parsing.

Year, thank you for that!

I realized this PR still closes issue #3920. The problem with converting unicode characters is not in the issue tracking system. I can open up a PR for it after #6892 is fixed, if that sounds reasonable?

Sure, step by step.

Siedlerchr added a commit that referenced this pull request Sep 26, 2020
…-issue-6707

* upstream/master: (35 commits)
  Fix a fetcher test for the ShortDOIService (#6945)
  Fixes Shared Database: Changes filtering in CoarseChangeFilter to attribute property (#6868)
  Changed default value of "search and store files relative to bibtex file" to true (#6928)
  Replace comment by just a failure (#6943)
  Fix: in entry types editor selected field is not removed after first click  (#6941)
  Fix remove actions for entry types in the editor (#6933)
  Always use Java 15 (#6929)
  Update DevDocs: workaround for issues with local openjfx maven libraries (#6931)
  Fixes bugs in the `regex` cite key pattern modifier (#6893)
  Add missing author
  Readability for citation key patterns (#6706)
  Add new author
  Reset to master and add default case to switch (#6847)
  Bump mockito-core from 3.5.10 to 3.5.11 (#6924)
  Bump byte-buddy-parent from 1.10.14 to 1.10.15 (#6923)
  Bump org.beryx.jlink from 2.21.4 to 2.22.0 (#6925)
  Bump xmpbox from 2.0.20 to 2.0.21 (#6926)
  Bump pascalgn/automerge-action from v0.9.0 to v0.10.0 (#6927)
  Improve parsing of short DOIs (#6920)
  Bump junit-vintage-engine from 5.6.2 to 5.7.0 (#6910)
  ...
Siedlerchr added a commit that referenced this pull request Sep 26, 2020
* upstream/master: (55 commits)
  Rename menus citation style in preview style (#6899)
  Fix for some Unicode characters in citation keys (#6938)
  Add missing authors
  Fix a fetcher test for the ShortDOIService (#6945)
  Fixes Shared Database: Changes filtering in CoarseChangeFilter to attribute property (#6868)
  Changed default value of "search and store files relative to bibtex file" to true (#6928)
  Replace comment by just a failure (#6943)
  Fix: in entry types editor selected field is not removed after first click  (#6941)
  Fix remove actions for entry types in the editor (#6933)
  Always use Java 15 (#6929)
  Update DevDocs: workaround for issues with local openjfx maven libraries (#6931)
  Fixes bugs in the `regex` cite key pattern modifier (#6893)
  Add missing author
  Readability for citation key patterns (#6706)
  Add new author
  Reset to master and add default case to switch (#6847)
  Bump mockito-core from 3.5.10 to 3.5.11 (#6924)
  Bump byte-buddy-parent from 1.10.14 to 1.10.15 (#6923)
  Bump org.beryx.jlink from 2.21.4 to 2.22.0 (#6925)
  Bump xmpbox from 2.0.20 to 2.0.21 (#6926)
  ...

# Conflicts:
#	src/main/java/org/jabref/logic/util/DelayTaskThrottler.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: changes required Pull requests that are not yet complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

latex_to_unicode produces problematic filename
3 participants