More flexible handling of case sensitivity in all keys #477
Comments
Hey @kmccurley First and foremost, thanks a lot for opening the issue, for your detailed research into the matter and for your willingness to contribute a PR. A couple of (unordered) notes / thoughts from my side:
Possible solution:
Advantage of (1) would be that it's very obvious what's going to be done. Advantage of (2) is that it's less boilerplate code and also somewhat better regarding performance (we only have to loop over each block once instead of three times).
Thanks for a very detailed report and willingness to improve
The Another very tiny point about I think it's a great idea to keep I don't have a strong opinion between your options 1) or 2). I regularly parse a file with tens of thousands of blocks, so I might opt for the higher-performance case.
Is your feature request related to a problem? Please describe.
The bibtex file format is ill-defined when it comes to case sensitivity on keys. This is NOT a duplicate of #453, because that only talks about entry types.
There is a great deal of confusion about case sensitivity in bibtex. This applies to:
There are also different use cases for bibtexparser. Some people want to use it to parse a bibtex file and get the same thing back when they print it out. Others want to use bibtexparser to parse in a way that is close to the behavior of some other tool that parses bibtex files (notably the `bibtex` binary and the `biblatex` package). This is part of the problem, because different processing tools exhibit different behavior when they encounter keys that agree in lower case.

For example, consider the following LaTeX file:
This example can be used to illustrate the difference between `bibtex` and `biblatex`. If you process this with the bibtex binary, it produces two warnings from `bibtex`:

If you view the PDF, it took the first `Title` field and dropped the second `title` field. It also dropped the second `camelcase` entry, producing an undefined reference. Hence you may consider the `bibtex` binary to treat both entry keys and field keys as case-insensitive. From my observation of author behavior, about 90% use bibtex, and maybe 10% use biblatex. Since the bibtex file format was originally bundled with the `bibtex` binary, I consider this to be the proper interpretation of case sensitivity, but others may disagree.

Now consider the case of
`biblatex`. Uncomment the line to load biblatex, remove main.aux and main.bbl, and run `pdflatex main;biber main;pdflatex main;pdflatex main`. The resulting PDF file contains three references, and the first reference takes the second title `"This has a camel case key"`.

The decades-long problem here is that the syntax for the original bibtex file format was never really defined (and it's still on version 0.99d). There are various tools to parse and handle these files, but they behave differently because they interpret the file format differently. You could argue that both `bibtex` and `biblatex` treat entry keys and field keys as lower case, but they behave differently when they encounter keys that have the same lower-case form. Perhaps other tools have their own weird behavior based on their own interpretation of the incomplete bibtex file format.

I came across this problem because I was using `bibtexparser` to produce an HTML format for the bibtex entries, and I wanted our system to emulate the behavior of both `biblatex` and `bibtex`.

The solutions that I came to:
`bibtex` and `biblatex` do the same. That way it's easier to decide how to format the entries. The only reason I can see to preserve this is if the `bibtexparser` user is expecting to see the same thing after parsing and writing out again.

`\cite` is case-sensitive, I decided not to convert entry keys to lower case. It appears that `bibtexparser.parse_string` does not check case of keys, and only declares a duplicate if the keys match in their original case. The second and subsequent entries with the same key are kept as `DuplicateBlockKeyBlock`s but are not treated as `entries`. This is not the same as the behavior of the `bibtex` tool, which drops entries if the lower-case key is the same as something already seen. It is consistent with how biber parses the entries.

`biblatex` and `bibtex` treat them as such). There is a question as to whether to take the first or last field encountered when there are duplicates, and it depends on whether you are trying to mimic `bibtex` or `biblatex`
(or something else). I see no reason to keep both 'title' and 'Title' field keys, but this depends on the use case. I use a flag in the constructor to choose between "keep all", "keep first", or "keep last".

We are using `bibtexparser` in a system to process latex+bibtex that is uploaded by authors. Our system uses `bibexport` to extract the entries that are actually cited, and this uses the `bibtex` binary in the script. This tool only works if the authors use the `bibtex` tool, since it looks in the `.aux` file for `\bibcite`. In order to get around this for authors who use `biblatex`, our system creates an artificial `.aux` file that looks like it was produced by the `bibtex` tool, and we process that with `bibexport` so that it can extract the entries. Of course `biber` and `bibtex` treat duplicate keys differently, so this will fail if authors depend on the `biber` behavior to save entries with keys that collide in lower case with others.

Describe the solution you'd like
The bottom line here is that software tools that handle the bibtex file format are inconsistent in how they treat keys. It seems useful to offer options for `bibtexparser` to emulate the behavior of other tools that process the bibtex file format. This can be customized by the use of middleware, and it might be useful to have additional standard middleware classes to support the different behavior required. It also seems like a replacement for the bibtex file format is long overdue. There are too many nonstandard entry types and field types. It's probably too late to fix the definition of the bibtex file format unless we add something like `@version` at the beginning of the file to say what tools the file is intended to be processed with. I don't think that's the job of bibtexparser, though, unless it is used in a tool to replace the bibtex or biber tools.

I would be willing to contribute a PR to offer other middleware to handle these cases.
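To make the behaviors discussed in this issue concrete, here is a minimal sketch in plain Python. It is not the bibtexparser API and not a proposed middleware implementation: the function names `normalize_fields` and `split_duplicate_entries`, the policy strings, and the sample keys and values are all hypothetical and chosen only for illustration. It assumes the behavior reported above, namely that `bibtex` keeps the first of two fields whose keys collide in lower case and drops entries whose lower-cased key was already seen, while `biblatex`/`biber` takes the last such field and compares entry keys case-sensitively.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (field key, value), in file order

POLICIES = ("keep_all", "keep_first", "keep_last")  # hypothetical policy names


def normalize_fields(fields: List[Field], policy: str = "keep_first") -> List[Field]:
    """Lower-case field keys, then resolve keys that collide after lower-casing."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    lowered = [(key.lower(), value) for key, value in fields]
    if policy == "keep_all":
        return lowered  # duplicates are left for the caller to handle
    resolved: Dict[str, str] = {}
    for key, value in lowered:
        if policy == "keep_first" and key in resolved:
            continue  # bibtex-like (as reported above): first occurrence wins
        # First occurrence of a key, or a later one under "keep_last".
        resolved[key] = value
    return list(resolved.items())


def split_duplicate_entries(
    keys: List[str], case_sensitive: bool = True
) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped) entry keys; the first occurrence is always kept.

    case_sensitive=False mimics the reported bibtex behavior;
    case_sensitive=True mimics the reported biber behavior.
    """
    seen = set()
    kept: List[str] = []
    dropped: List[str] = []
    for key in keys:
        marker = key if case_sensitive else key.lower()
        if marker in seen:
            dropped.append(key)
        else:
            seen.add(marker)
            kept.append(key)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical sample data, not the entries from the LaTeX example above.
    fields = [("Title", "First title"), ("author", "A. Author"), ("title", "Second title")]
    print(normalize_fields(fields, "keep_first"))
    print(normalize_fields(fields, "keep_last"))
    print(split_duplicate_entries(["CamelCase", "camelcase", "other"], case_sensitive=False))
```

Note that under `keep_last` the surviving field keeps the position of its first occurrence but takes the value of the last one; whether that ordering matters depends on the use case.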