Error when adding file containing # to entry with character #7012

Closed
1 task done
judober opened this issue Oct 13, 2020 · 20 comments · Fixed by #8382
Labels
good first issue An issue intended for project-newcomers. Varies in difficulty. type: enhancement

Comments

@judober

judober commented Oct 13, 2020

JabRef version 5.2--2020-10-13--fdaf25a on Windows 10

Steps to reproduce the behavior:

  1. Add a BibTeX entry for https://link.springer.com/book/10.1007%2F978-3-662-53281-2
  2. Add a file
  3. Apply the "Filename format pattern": [bibtexkey] - [title:latex_to_unicode:remove_braces_formatter:regex("/|:|,","#")]
  4. Restart JabRef
  5. When opening the file from JabRef, it is not found.
  6. The biblatex source file line looks weird: "file = {:Hue17 - Leiter} # Halbleiter #{ Supraleiter.pdf:PDF},"

My setup: biblatex, keypattern: "[auth3][shortyear]"
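
To see where the # characters come from, the regex modifier in step 3 can be approximated in plain Java. This is only a sketch: the class and method names are hypothetical, it does not call JabRef's pattern engine, and it assumes the modifier behaves like `String.replaceAll`.

```java
// Hypothetical stand-in for the regex("/|:|,","#") modifier; not JabRef code.
class FilenamePatternSketch {

    // Replaces every '/', ':' or ',' in the title with '#'.
    static String applyRegexModifier(String title) {
        return title.replaceAll("/|:|,", "#");
    }

    public static void main(String[] args) {
        String title = "Leiter, Halbleiter, Supraleiter";
        // The citation key "Hue17" is taken from the entry in this report.
        System.out.println("Hue17 - " + applyRegexModifier(title) + ".pdf");
    }
}
```

Under that assumption the two commas in the title turn into two # characters, which is the situation the rest of the thread discusses.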

@Siedlerchr
Member

@k3KAW8Pnf7mkmdSMPHz27 I just merged your other PR. Does that maybe fix this as well?

@k3KAW8Pnf7mkmdSMPHz27
Member

@Siedlerchr I don't think so. I'll take a look at it. I don't know how the source file line should look...

@k3KAW8Pnf7mkmdSMPHz27
Member

I can't reproduce. I'd guess that a workaround is to use :regex("[{}]","") rather than remove_braces_formatter (the brackets in regex were only an issue for citation keys, not filenames).

@judober could you paste the BibTeX entry or link to it? I am not familiar with Springer's webpage, and the entry I got from Google Scholar does not match yours.
Also, I'd appreciate it if you could check whether the workaround works for you. It might make it easier for someone to debug this particular issue.

I'd guess this is one or two bugs. I don't believe curly brackets are removed for filenames, in which case they might cause an issue when they are stored in the bib file.
There might also be an issue when the title special field marker attempts to remove the LaTeX commands.

@k3KAW8Pnf7mkmdSMPHz27
Member

@Siedlerchr I don't know what is going on in this issue, but I don't think my PR should impact this.

@k3KAW8Pnf7mkmdSMPHz27
Member

@judober on another note, if you want to use latex_to_unicode for titles, you might consider using [TITLE:...] (i.e., all-caps). This will not perform the same changes on the title that [title:...] does. Beware that I'd consider this an undocumented feature and it might be subject to change 😬

@Siedlerchr
Member

Okay. I think the regex does not make sense because the # is interpreted as a BibTeX string marker.
@judober It would be really nice if you could show what the expected outcome should look like.

@judober
Author

judober commented Oct 13, 2020

I tested again, and I think I have to close JabRef and open it again in order to reproduce (I will update the original post).

I just added the entry through its DOI, downloaded the PDF (probably any PDF will do), and dragged it onto the entry. When I click the PDF symbol after a restart, my documents folder is opened, but neither the PDF folder nor the PDF. I also get the weird file entry in the biblatex source, which causes problems when using the library (undefined label Halbleiter).

The entry:

@Book{Hue17,
  author    = {Rudolf Huebener},
  date      = {2017},
  title     = {Leiter, Halbleiter, Supraleiter},
  doi       = {10.1007/978-3-662-53281-2},
  publisher = {Springer Berlin Heidelberg},
  file      = {:Hue17 - Leiter} #  Halbleiter #{ Supraleiter.pdf:PDF},
  timestamp = {2020.10.13},
}

I would expect the file-line to be
file = {:Hue17 - Leiter # Halbleiter # Supraleiter.pdf:PDF},
And, of course, I would expect JabRef to open the PDF.

My manual workaround:
Remove all # from the filename and add the file manually. Then I get
file = {:Hue17 - Leiter Halbleiter Supraleiter.pdf:PDF},
and everything works.

I tried [bibtexkey] - [title:latex_to_unicode:regex("[{}]",""):regex("/|:|,","#")] but this didn't change anything.
The same for [bibtexkey] - [TITLE:latex_to_unicode:remove_braces_formatter:regex("/|:|,","#")]

@Siedlerchr @k3KAW8Pnf7mkmdSMPHz27 does this help?

Also, I think this used to work with older JabRef versions (5.0). At least, I have used this pattern for a while and did not notice this issue before.

@Siedlerchr Siedlerchr changed the title Error when adding file to entry Error when adding file to entry with # character Oct 13, 2020
@Siedlerchr Siedlerchr changed the title Error when adding file to entry with # character Error when adding file containing # to entry with character Oct 13, 2020
@koppor
Member

koppor commented Oct 13, 2020

# is a special character in BibTeX and in JabRef.

See https://docs.jabref.org/advanced/strings for general information

I can't reproduce step 1 here. What is the BibTeX? Does it contain #?

I assume that step 3 introduced the #. Would it be possible to use another character, for instance _?
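
To illustrate why # is special, here is a minimal sketch of BibTeX-style string resolution, assuming that text between a pair of # is looked up as a string name. The class, the method, and the choice to leave unknown names unchanged are all assumptions for illustration; this is not the code of JabRef's BibDatabase.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of "#name#" resolution; not JabRef's implementation.
class StringResolveSketch {

    // A reference is any non-empty run of characters enclosed in a pair of '#'.
    private static final Pattern REFERENCE = Pattern.compile("#([^#]+)#");

    static String resolve(String content, Map<String, String> strings) {
        Matcher matcher = REFERENCE.matcher(content);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            out.append(content, last, matcher.start());
            // Known names are replaced by their value; unknown ones are kept verbatim.
            out.append(strings.getOrDefault(matcher.group(1), matcher.group()));
            last = matcher.end();
        }
        out.append(content, last, content.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("#jan# 2020", Map.of("jan", "January")));
    }
}
```

In this model a single # never forms a reference, while two of them in one value do, even when the text in between was never meant to be a string name.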

@judober
Author

judober commented Oct 13, 2020

Regarding step 1: I just added the entry using the DOI 10.1007/978-3-662-53281-2.
It does not inherently contain a #.

I tried [bibtexkey] - [title:latex_to_unicode:remove_braces_formatter:regex("/|:|,","_")] and it works. This seems like a good solution for me; I was not aware that # is special.
However, I wonder why I haven't had this problem before, since I have been using # for quite some time.

Also, if # in the filename is problematic, shouldn't JabRef check for this somehow and warn the user?

@k3KAW8Pnf7mkmdSMPHz27
Member

I can reproduce it from the "new entry" button in the GUI.
[Screenshot, 2020-10-13 11:54:33]

On a side note, I also get
11:54:05.303 [JavaFX Application Thread] WARN org.jabref.logic.citationkeypattern.BracketedPattern - Key generator warning: unknown modifier 'remove_braces_formatter'.

@k3KAW8Pnf7mkmdSMPHz27
Member

It doesn't seem like there should be an issue if there is only one # (based on BibDatabase#resolveContent). The split ({:Ham81 - Organische Leiter, Halbleiter }{und Photoleiter.pdf:PDF},) happens as soon as you have two (e.g., you can add it to the title Organische leiter, halbleiter ##und photoleiter). Perhaps that's why it hasn't been an issue before?
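
The split described above can be modelled with a small sketch. It assumes that a writer treats every complete pair of # as a string reference and wraps the text around it in braces. The class and method are hypothetical, this is not the code of FieldWriter, and the exact spacing of JabRef's real output differs.

```java
// Simplified, hypothetical model of a field writer that honours '#' pairs.
class HashPairSplitSketch {

    static String writeField(String content) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < content.length()) {
            int open = content.indexOf('#', pos);
            int close = open < 0 ? -1 : content.indexOf('#', open + 1);
            if (close < 0) {
                // No complete pair left: the rest is one braced literal.
                out.append('{').append(content, pos, content.length()).append('}');
                break;
            }
            if (open > pos) {
                // Literal text before the pair gets its own braces.
                out.append('{').append(content, pos, open).append("} # ");
            }
            // Text inside the pair is written bare, like a string name.
            out.append(content.substring(open + 1, close).trim());
            pos = close + 1;
            if (pos < content.length()) {
                out.append(" # ");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeField(":Hue17 - Leiter# Halbleiter# Supraleiter.pdf:PDF"));
        System.out.println(writeField(":Hue17 - Leiter# Halbleiter.pdf:PDF"));
    }
}
```

With two # the value is broken into three parts and "Halbleiter" ends up outside the braces, which matches the "undefined label Halbleiter" symptom reported earlier. With a single # the value stays in one braced block.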

I'd guess adding it to FileNameCleaner#ILLEGAL_CHARS, same as curly brackets, would be the easy solution?
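
A rough sketch of that idea follows. Only the name ILLEGAL_CHARS comes from this thread; the character list, the replacement character, and the method are assumptions made for illustration and may not match the real FileNameCleaner.

```java
// Hypothetical filename cleaner; the character set below is assumed, not JabRef's.
class FileNameCleanerSketch {

    // Assumed set of characters to reject, with '#' added next to the curly brackets.
    private static final String ILLEGAL_CHARS = "\"*/:<>?\\|{}#";

    // Replaces each illegal character with an underscore.
    static String cleanFileName(String name) {
        StringBuilder cleaned = new StringBuilder(name.length());
        for (char c : name.toCharArray()) {
            cleaned.append(ILLEGAL_CHARS.indexOf(c) >= 0 ? '_' : c);
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanFileName("Hue17 - Leiter# Halbleiter# Supraleiter.pdf"));
    }
}
```

This avoids the problem rather than solving it: users can no longer have # in generated filenames at all.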

@Siedlerchr
Member

@k3KAW8Pnf7mkmdSMPHz27 Good idea to add it to the illegal chars to avoid problems

@Siedlerchr
Member

I don't know how feasible this is; it probably needs more thinking. Normally, in LaTeX you can escape those "illegal" chars using a backslash. And we already have a LatexCleanupFormatter that does this for certain things. I remember that % signs in the abstract field needed to be escaped.
I tried to escape the # using backslashes in the regex as well, but JabRef still mangles it.

Refs also #7010

Our LatexCleanupFormatter at least fixes the % sign,
https://github.com/JabRef/jabref/blob/1b35f8cb0040fdfb515974e78532598f07e11af2/src/main/java/org/jabref/logic/formatter/bibtexfields/LatexCleanupFormatter.java
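
The backslash-escaping idea could look roughly like this. It is a sketch under assumptions, not the code of LatexCleanupFormatter: the class and method are hypothetical, and as noted above, escaping # this way did not stop JabRef from mangling the value at the time.

```java
// Hypothetical escaper for '%' and '#'; not taken from LatexCleanupFormatter.
class EscapeSketch {

    // Prefixes '%' and '#' with a backslash unless one is already there.
    static String escape(String input) {
        return input.replaceAll("(?<!\\\\)([%#])", "\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escape("50% of #1"));
    }
}
```

The negative lookbehind keeps the method from adding a second backslash to characters that are already escaped.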

@k3KAW8Pnf7mkmdSMPHz27
Member

If this is a rarely used feature, perhaps it should be opt-in rather than opt-out? Or perhaps with some defaults for title, author instead of all STANDARD_FIELDS? (see preferences -> file -> Resolve strings for standard BibTeX fields only for the current settings)

@k3KAW8Pnf7mkmdSMPHz27
Member

Quoting @Siedlerchr: "I don't know how feasible this is, probably needs more thinking."

You are right. The more I look into it, the more sense it makes to solve it rather than avoiding it.

@koppor
Member

koppor commented Oct 14, 2020

Quoting @k3KAW8Pnf7mkmdSMPHz27:
"On a side note, I also get
11:54:05.303 [JavaFX Application Thread] WARN org.jabref.logic.citationkeypattern.BracketedPattern - Key generator warning: unknown modifier 'remove_braces_formatter'."

Can you try to remove _formatter in the settings? Meaning, it should read remove_braces.

I also added an issue JabRef/user-documentation#318 hopefully guessing right what's going on there.

@Siedlerchr Siedlerchr added the good first issue An issue intended for project-newcomers. Varies in difficulty. label Mar 29, 2021
@1190201923

Hi, we are a group of 7 students from Harbin Institute of Technology. Our open source software testing class asked us to contribute to an open-source project. We'd love to help with this issue. Is anyone currently working on it? If not, can we take it? We are all new contributors to open-source projects. If possible, could we get some guidance on which part of the code we should look into first? Thanks a lot.
KeXin

@koppor
Member

koppor commented Aug 2, 2021

This issue needs much thinking. Please read on at #7010 to get to know the issue around JabRef's internal handling of #.

Deep links to methods are:

  • org.jabref.logic.importer.fileformat.BibtexParser#parseFieldContent
  • org.jabref.logic.bibtex.FieldWriter#write

@1190201923

Roger, our team will look into it right away.

Siedlerchr added a commit that referenced this issue Jan 3, 2022
Siedlerchr added a commit that referenced this issue Jan 13, 2022
* Change default behavior of resolve bibtex strings

Fixes #7010
Fixes #7012
Fixes #8303

* Renaming of fields

* fix prefs, remove migration

* Fix gui properties and l10n

* adjust defaults, fix bst tests

* remove obsolete test
Fix changelog

* fix checkstyle

* fix another test

* fix comment

* Fix typos

* add institution

* Sort fields alphabetically

* Group prefernces at "File": BibTeX strings, Loading, Saving

* Remove unused imports

* Add ADR-0024

* Add test for comment field

* fix tests

* Fix markdown in ADR-0024

* ADR-0024: Fix filename and addr to adr.md

* ADR-0019: Fix typo

* Remove obsolete empty line

* Introduce BIBTEX_STRING_START_END_SYMBOL and remove negation

- Add org.jabref.logic.bibtex.FieldWriter#BIBTEX_STRING_START_END_SYMBOL
- !doNotResolveStrings -> resolveStrings

* Remove deprecated constructor FieldWriterPreferences()

* Fix test on Windows (CRLF issue)

* Add missing context information

* Add more tests

* Fix negation

Co-authored-by: Oliver Kopp
@Siedlerchr
Member

We have now changed the whole logic and enable BibTeX string resolving only for a couple of fields. This can be adjusted in the options.
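
The gist of that change can be sketched as a writer that only honours # in an opt-in set of fields. Only the names BIBTEX_STRING_START_END_SYMBOL and resolveStrings appear in the commit message above; the class, the field list, and the simplified handling of references are assumptions and not the real FieldWriter.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical per-field opt-in for string resolving; not JabRef's FieldWriter.
class FieldWriterSketch {

    static final char BIBTEX_STRING_START_END_SYMBOL = '#';

    private final boolean resolveStrings;
    private final Set<String> resolveStringsForFields;

    FieldWriterSketch(boolean resolveStrings, Set<String> resolveStringsForFields) {
        this.resolveStrings = resolveStrings;
        this.resolveStringsForFields = resolveStringsForFields;
    }

    String write(String field, String content) {
        boolean resolves = resolveStrings
                && resolveStringsForFields.contains(field.toLowerCase(Locale.ROOT));
        // Simplification: only a value that is entirely "#name#" counts as a reference.
        if (resolves && content.length() > 2
                && content.charAt(0) == BIBTEX_STRING_START_END_SYMBOL
                && content.indexOf(BIBTEX_STRING_START_END_SYMBOL, 1) == content.length() - 1) {
            return content.substring(1, content.length() - 1);
        }
        // Every other value, including any '#' it contains, is written as a braced literal.
        return "{" + content + "}";
    }

    public static void main(String[] args) {
        // The field set here is an example, not the actual default.
        FieldWriterSketch writer = new FieldWriterSketch(true, Set.of("month", "journal"));
        System.out.println(writer.write("month", "#jan#"));
        System.out.println(writer.write("file", ":Hue17 - Leiter# Halbleiter# Supraleiter.pdf:PDF"));
    }
}
```

Because the file field is not in the opt-in set, its # characters stay inside one pair of braces, which is the output the reporter expected.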

5 participants