As far as I can tell, the intent in FormatHeuristics.hs is to associate the TEI format with filenames ending in ".tei.xml". However, takeExtension, which is used in the source code, only returns the last extension, so it never picks up such multi-part extensions:
$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/ :? for help
Prelude> import System.FilePath (takeExtension)
Prelude System.FilePath> takeExtension "doc.tei.xml"
".xml"
Thus, pandoc won't infer the TEI format for such files:
(pandoc) $ echo "Hello world!" | pandoc -o hello.tei
(pandoc) $ cat hello.tei
<p>Hello world!</p>
(pandoc) $ echo "Hello world!" | pandoc -o hello.tei.xml
[WARNING] Could not deduce format from file extension .xml
Defaulting to html
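For comparison, System.FilePath also exports takeExtensions, which returns every extension of the file name rather than only the last one. The minimal sketch below (the binding names are illustrative only) shows the difference. It also shows why takeExtensions on its own would not be a reliable test: a dot inside the base name is treated as an extension separator too.

```haskell
import System.FilePath (takeExtension, takeExtensions)

-- takeExtension keeps only the final extension.
lastExt :: String
lastExt = takeExtension "doc.tei.xml"           -- ".xml"

-- takeExtensions keeps everything from the first dot in the file name.
allExts :: String
allExts = takeExtensions "doc.tei.xml"          -- ".tei.xml"

-- A dotted base name leaks into the result, so comparing the output of
-- takeExtensions against ".tei.xml" would miss files like this one.
dottedName :: String
dottedName = takeExtensions "my.notes.tei.xml"  -- ".notes.tei.xml"
```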
Pandoc version?
Tested with pandoc 2.14.2, installed with conda (Ubuntu 20.04.3 LTS)
Disclaimer: I knew absolutely nothing about TEI files 6 hours ago 🙂.
In the specification I see that the media type should be application/tei+xml, but I can't find a reference to a specified filename extension (I may have missed it; the spec is large). The available examples (https://github.com/TEIC/TEI/tree/dev/P5/Exemplars) use the ".tei" extension, so that one is very probably valid, but I don't see the ".tei.xml" extension used anywhere.
For the record, this was added in commit 25a9ca6 (2015).
Since, as you point out, it never could have worked, I don't think we lose anything in terms of backwards compatibility by taking it out. If I learned that .tei.xml was the standard extension, that might motivate me to change the way we check for extensions, but for now I'm just going to remove that line.
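If .tei.xml did turn out to be a standard extension, one possible way to check for compound extensions would be to match the lowercased filename against a table of suffixes, trying the longest suffix first. This is a hypothetical sketch, not how FormatHeuristics.hs works: extensionTable and formatBySuffix are illustrative names, and the table contains only a few entries rather than pandoc's real mapping.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf, sortOn)
import Data.Ord (Down (..))

-- Illustrative table only; NOT pandoc's actual extension-to-format mapping.
extensionTable :: [(String, String)]
extensionTable =
  [ (".tei.xml", "tei")
  , (".tei", "tei")
  , (".html", "html")
  ]

-- Hypothetical helper: return the format of the longest suffix in the
-- table that matches the lowercased path, if any.
formatBySuffix :: FilePath -> Maybe String
formatBySuffix path =
  case [fmt | (ext, fmt) <- longestFirst, ext `isSuffixOf` lower] of
    (fmt : _) -> Just fmt
    []        -> Nothing
  where
    lower = map toLower path
    -- Longer suffixes are tried before shorter ones.
    longestFirst = sortOn (Down . length . fst) extensionTable
```

Trying longer suffixes first means that, if a generic entry such as ".xml" were ever added to the table, "doc.tei.xml" would still resolve to tei. A name with no matching suffix, such as "hello.xml" with this table, yields Nothing.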