TEI format and ".tei.xml" filename suffix #7630

Closed
boisgera opened this issue Oct 19, 2021 · 3 comments

@boisgera

Explain the problem.

As far as I can tell, the intent in FormatHeuristics.hs is to associate the TEI format with filenames ending in ".tei.xml". However, takeExtension, which the source code uses, returns only the last extension, so it never yields a compound extension like this one:

$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Prelude> import System.FilePath (takeExtension)
Prelude System.FilePath> takeExtension "doc.tei.xml"
".xml"

Thus, pandoc won't infer the TEI format for such files:

(pandoc) $ echo "Hello world!" | pandoc -o hello.tei
(pandoc) $ cat hello.tei
<p>Hello world!</p>

(pandoc) $ echo "Hello world!" | pandoc -o hello.tei.xml
[WARNING] Could not deduce format from file extension .xml
  Defaulting to html
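
For comparison, here is a minimal sketch of the two System.FilePath functions involved. It only illustrates why a lookup keyed on ".tei.xml" cannot match the result of takeExtension; it is not pandoc's code, and takeExtensions is shown merely as the library function that does return compound extensions.

```haskell
import System.FilePath (takeExtension, takeExtensions)

main :: IO ()
main = do
  -- takeExtension keeps only the final extension.
  putStrLn (takeExtension "doc.tei.xml")   -- .xml
  -- takeExtensions keeps every extension after the base name.
  putStrLn (takeExtensions "doc.tei.xml")  -- .tei.xml
```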

Pandoc version?

Tested with pandoc 2.14.2, installed with conda (Ubuntu 20.04.3 LTS)

@boisgera boisgera added the bug label Oct 19, 2021
@jgm
Owner

jgm commented Oct 19, 2021

Is .tei.xml the standard extension for TEI files?

@boisgera
Author

Disclaimer: I knew absolutely nothing about TEI files 6 hours ago 🙂.

In the specification I see that the media type should be application/tei+xml, but I can't find a reference to a specified filename extension (I may have missed it; the spec is large). In the available examples (https://github.com/TEIC/TEI/tree/dev/P5/Exemplars) the ".tei" extension is used, so that one is very probably valid, but I don't see the ".tei.xml" extension used.

@jgm
Owner

jgm commented Oct 19, 2021

For the record, this was added in commit 25a9ca6 (2015).
Since, as you point out, it never could have worked, I don't think we lose anything by way of backwards compatibility if we take it out. If I learned that .tei.xml was the standard extension, that might motivate me to change the way we check for extensions, but for now I'm just going to remove that line.
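
For reference, one way such a changed check might look is sketched below. Everything in it is hypothetical: guessFormat, compoundSuffixes, and simpleExtensions are names invented for illustration, the tables are not pandoc's actual extension list, and this is not what the fix in this issue does (the fix simply removes the line).

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)
import System.FilePath (takeExtension, takeFileName)

-- Hypothetical table of compound suffixes; not pandoc's actual list.
compoundSuffixes :: [(String, String)]
compoundSuffixes = [(".tei.xml", "tei")]

-- Hypothetical table of single extensions; not pandoc's actual list.
simpleExtensions :: [(String, String)]
simpleExtensions = [(".tei", "tei"), (".html", "html")]

-- Check compound suffixes first, then fall back to the last extension.
guessFormat :: FilePath -> Maybe String
guessFormat path =
  case [fmt | (suffix, fmt) <- compoundSuffixes, suffix `isSuffixOf` name] of
    (fmt : _) -> Just fmt
    []        -> lookup (takeExtension name) simpleExtensions
  where
    name = map toLower (takeFileName path)

main :: IO ()
main = do
  print (guessFormat "doc.tei.xml")  -- Just "tei"
  print (guessFormat "doc.tei")      -- Just "tei"
  print (guessFormat "doc.xml")      -- Nothing
```

The sketch matches on a filename suffix rather than on takeExtensions, because takeExtensions would return ".v2.tei.xml" for a name such as "doc.v2.tei.xml" and miss the table entry.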

@jgm jgm closed this as completed in 7754b7f Oct 19, 2021