Regex of University Institution too broad for citekey generation #6942

Closed
1 task done
TheDom42 opened this issue Sep 24, 2020 · 4 comments · Fixed by #7210

Comments

@TheDom42

TheDom42 commented Sep 24, 2020

JabRef 5.2--2020-09-22--129c36e
Windows 10 10.0 amd64 
Java 15
  • Mandatory: I have tested the latest development version from http://builds.jabref.org/master/ and the problem persists

  • Steps to reproduce the behavior:

    • Add an entry where the author is an institution (other than a university) whose name contains a term beginning with uni (see below for an example)
    • Enclose the institution in {}
    • Let JabRef generate the citekey as an institution with default settings
  • Expected Behavior
    Generate the citekey as a regular institution abbreviation

  • Observed Behavior
    The citekey is generated as a "University" citekey, beginning with Uni followed by a short form of the abbreviation

  • Alternative Behavior
    Make use of the shortauthor field if present for that entry. biblatex-apa, for example, uses this field as the institution abbreviation, so if desired, one can add the shortauthor field to the respective entry types.

Longer description:
I tried to add the following two entries and let JabRef generate the citekey automatically.

% Encoding: UTF-8

@Report{ICAO2013,
  author      = {{International Civil Aviation Organization}},
  date        = {2013},
  institution = {{International Civil Aviation Organization}},
  location    = {Montréal, Quebec},
  publisher   = {International Civil Aviation Organization},
  shortauthor = {ICAO},
  title       = {Foo},
  type        = {resreport},
}

@Report{UniEuropeanAviationSafetyAgency2019,
  author       = {{European Union Aviation Safety Agency}},
  date         = {2019-12-18},
  institution  = {{European Union Aviation Safety Agency}},
  title        = {Bar},
  type         = {resreport},
  shortauthor  = {EASA},
  organization = {{European Union Aviation Safety Agency}},
}
@Comment{jabref-meta: databaseType:biblatex;}

I did not understand why the automatic generation worked for the ICAO entry but not for the EASA entry (I would have been okay with EUASA as the citekey, as I was aware that the automatic abbreviation would include the U from the initials of the name). Instead, a completely different key was generated.
I tried to pinpoint the issue and stumbled upon this regex in the key generation for institutions in brackets.

private enum Institution {
    SCHOOL,
    DEPARTMENT,
    UNIVERSITY,
    TECHNOLOGY;

    /**
     * Matches "uni" at the start of a string or after a space, case insensitive
     */
    private static final Pattern UNIVERSITIES = Pattern.compile("^uni.*", Pattern.CASE_INSENSITIVE);

To me, the regex seems a bit broad, but maybe this was on purpose. If so, I would be happy if there were a setting to use the optional shortauthor field if present.
Unfortunately, I'm not skilled enough to implement a fix in a PR, but I wanted to point out that this might be an issue.

In case someone asks: I do not use the suggested institution abbreviation mentioned here, as this interferes with the institution abbreviation in the biblatex-apa package, which uses shortauthor. And even with an added abbreviation behind the full name, the citekey is still wrong.
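
A minimal sketch of why the EASA entry ends up with a "Uni" key. It reuses the pattern from the excerpt above, but the class name, the helper method, and the word-by-word matching on whitespace are assumptions made for illustration, not JabRef's actual code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class BroadUniPatternDemo {
    // Same pattern as in the quoted enum.
    static final Pattern UNIVERSITIES = Pattern.compile("^uni.*", Pattern.CASE_INSENSITIVE);

    // Assumption: the institution name is split on whitespace and every word
    // is tested on its own. This helper is hypothetical.
    static List<String> universityLikeWords(String institution) {
        List<String> hits = new ArrayList<>();
        for (String word : institution.trim().split("\\s+")) {
            if (UNIVERSITIES.matcher(word).matches()) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "Union" starts with "uni", so it is treated like a university term.
        System.out.println(universityLikeWords("European Union Aviation Safety Agency")); // [Union]
        System.out.println(universityLikeWords("International Civil Aviation Organization")); // []
    }
}
```

Under these assumptions, any word such as "Union", "United", or "Unit" would be classified the same way as "University".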

@tmrd993
Contributor

tmrd993 commented Oct 9, 2020

The regex matches anything that starts with uni. This regex would fix your example: (?<![\w\d])(University|Uni)(?![\w\d]). However, it only matches the two terms in the group (University, Uni), so it would only work for institutions written in English. A hardcoded list with translations of "university" would work better in this case: we could parse the institution and check whether the term is present in any language. Maybe the devs have a better method of solving this?
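
A small sketch of how the proposed whole-word regex behaves, assuming it is applied to the full institution name with find(). The class and method names are hypothetical, and the CASE_INSENSITIVE flag is carried over from the existing pattern as an assumption.

```java
import java.util.regex.Pattern;

final class WholeWordUniDemo {
    // The regex from the comment above; the flag is an assumption.
    static final Pattern WHOLE_WORD = Pattern.compile(
            "(?<![\\w\\d])(University|Uni)(?![\\w\\d])", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the name contains "University" or "Uni"
    // as a standalone word.
    static boolean mentionsUniversity(String institution) {
        return WHOLE_WORD.matcher(institution).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsUniversity("European Union Aviation Safety Agency")); // false
        System.out.println(mentionsUniversity("University of Oxford"));                  // true
        System.out.println(mentionsUniversity("Uni Stuttgart"));                         // true
        // Non-English spellings are missed, which is the limitation noted above.
        System.out.println(mentionsUniversity("Universität Stuttgart"));                 // false
    }
}
```

Note that \w already includes digits in Java, so [\w\d] is equivalent to \w here.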

@k3KAW8Pnf7mkmdSMPHz27
Member

k3KAW8Pnf7mkmdSMPHz27 commented Oct 9, 2020

I am by no means an expert on these issues, but my thoughts are,

  1. Regarding shortauthor, I don't have much of an opinion. Since the institution key, to the best of my understanding, is a heuristic/convenience feature, I don't see the drawback of using it.
  2. I'd agree that the regex is a bit on the broad side.

> And even with an added abbreviation behind the full name, the citekey is still wrong.

Oupsie. That is my fault,

String lastName = author.getLast()
                        .map(LatexToUnicodeAdapter::format)
                        .map(isInstitution(author) ?
                                BracketedPattern::generateInstitutionKey : Function.identity())
                        .orElse(null);

The LatexToUnicodeAdapter will interpret the {(...)} as a code block, which gets resolved to (...). That is likely what is going wrong, as it won't be detected as an abbreviation anymore.

> We could parse the institution

I believe the name is being parsed around "," and later through:

List<String> tokenParts = getValidInstitutionNameParts(institutionNameTokens[index]);

> (?<![\w\d])(University|Uni)(?![\w\d])

I'd be hesitant about making it too specific as well. I don't really know what a good solution to this is. I'd consider ^univ.* a possible "solution" as well (unless there is an argument for only matching "Uni"?). Based on https://www.indifferentlanguages.com/words/university, it matches in almost the same number of languages. I don't really have an opinion regarding having a hardcoded list.
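
To make the trade-off concrete, here is a rough comparison of ^uni.* and ^univ.* on single words. The word list is my own illustration (not taken from the linked page), and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class UnivPrefixDemo {
    static final Pattern UNI = Pattern.compile("^uni.*", Pattern.CASE_INSENSITIVE);
    static final Pattern UNIV = Pattern.compile("^univ.*", Pattern.CASE_INSENSITIVE);

    static boolean matchesUni(String word) {
        return UNI.matcher(word).matches();
    }

    static boolean matchesUniv(String word) {
        return UNIV.matcher(word).matches();
    }

    public static void main(String[] args) {
        // Illustrative words only: a few spellings of "university" plus
        // unrelated words that happen to start with "uni".
        List<String> words = List.of(
                "University", "Universität", "Université", "Universidad",
                "Uniwersytet", "Union", "United", "Unit");
        for (String word : words) {
            System.out.printf("%-12s ^uni.*=%-5b ^univ.*=%b%n",
                    word, matchesUni(word), matchesUniv(word));
        }
    }
}
```

In this sample, ^univ.* rejects "Union", "United", and "Unit" while still accepting the spellings that begin with "Univ". The Polish "Uniwersytet" is the kind of case that ^uni.* catches and ^univ.* loses.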

@koppor changed the title from "Regex of University Institution to broad for citekey generation" to "Regex of University Institution too broad for citekey generation" on Oct 14, 2020
@k3KAW8Pnf7mkmdSMPHz27
Member

@tmrd993 are you looking into this issue?

k3KAW8Pnf7mkmdSMPHz27 added a commit to k3KAW8Pnf7mkmdSMPHz27/user-documentation that referenced this issue Dec 21, 2020
Adds a note on generating institution keys. Uses the example given in JabRef/jabref#6942
tobiasdiez pushed a commit to JabRef/user-documentation that referenced this issue Dec 21, 2020
Adds a note on generating institution keys. Uses the example given in JabRef/jabref#6942
@tobiasdiez
Member

Thanks to @k3KAW8Pnf7mkmdSMPHz27, this should be fixed in the latest development version. Could you please check the build from http://builds.jabref.org/master/? Thanks! Please remember to make a backup of your library before trying out this version.
