pandoc 2.15
- Add `--sandbox` option (#5045).
  - Add sandbox feature. When this option is used, readers and writers only have access to input files (and other files specified directly on the command line). This restriction is enforced in the type system.
  - Filters, PDF production, and custom writers are unaffected. This feature only insulates the actual readers and writers, not the pipeline around them in Text.Pandoc.App.
  - Note that when `--sandbox` is specified, readers won’t have access to the resource path, nor will anything have access to the user data directory.
- `--self-contained`: Fix bug that caused everything to be made a data URI (#7635, #7367). We only need to use data URIs in certain cases, but due to a bug they were always being used.
- Pandoc will now fall back to latin1 encoding for inputs that can’t be read as UTF-8. This is what it did previously for content fetched from the web and not marked as to content type. It makes sense to do the same for local files. In this case a `NotUTF8Encoded` warning will be issued, indicating that pandoc is interpreting the input as latin1.
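
The fallback described above can be pictured with a minimal Python sketch. This is not pandoc's implementation (pandoc is written in Haskell); the function name `decode_input` and the warning text are assumptions made for illustration only.

```python
import warnings


def decode_input(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin1 with a warning.

    Hypothetical helper illustrating the behaviour described in the
    changelog entry; the name and the message are not pandoc's.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid latin1, so this second attempt
        # cannot fail; the warning tells the user a guess was made.
        warnings.warn("input is not UTF-8 encoded: interpreting it as latin1")
        return raw.decode("latin-1")
```

Because latin1 maps every byte to a character, the fallback always succeeds, which is why a warning is the only signal that the text may have been misread.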
- Markdown reader:
  - Don’t parse links or bracketed spans as citations (#7632). Previously pandoc would parse `[link to (@a)](url)` as a citation; similarly `[(@a)]{#ident}`. This is undesirable. One should be able to use example references in citations, and even if `@a` is not defined as an example reference, `[@a](url)` should be a link containing an author-in-text citation rather than a normal citation followed by literal `(url)`.
  - Fix interaction of `--strip-comments` and list parsing (#7521). Use of `--strip-comments` was causing tight lists to be rendered as loose (as if the comment were a blank line).
  - Fix parsing bug for math in bracketed spans and links (#7623). This affects math with unbalanced brackets (e.g. `$(0,1]$`) inside links, images, and bracketed spans.
  - Fix code blocks using `--preserve-tabs` (#7573). Previously they did not behave as the equivalent input with spaces would.
- DocBook reader:
  - Honor linenumbering attribute (Samuel Tardieu). The DocBook attribute `linenumbering="numbered"` on code blocks maps to the `numberLines` class internally.
- LaTeX reader:
  - Implement siunitx v3 commands (#7614). We support `\unit`, `\qty`, `\qtyrange`, and `\qtylist` as synonyms of `\si`, `\SI`, `\SIrange`, and `\SIlist`.
  - Properly handle `\^` followed by group closing (#7615).
  - Recognize that `\vadjust` sometimes takes “pre” (#7531).
  - Ignore (and gobble parameters of) CSLReferences environment (#7531). Otherwise we get the parameters as numbers in the output.
  - Restrict `\endinput` to current file (Simun Schuster).
- RST reader: handle escaped colons in reference definitions (#7568).
- HTML reader:
  - Handle empty tbody element in table (#7589).
- Ipynb reader (Kolen Cheung):
  - Get cell output mime from `raw_mimetype` in addition to `format`. (`format` is what the spec calls for, but `raw_mimetype` is often used in practice; see jupyter/nbformat#229).
  - Add more formats that can be handled as “raw” cells.
  - Fix mime type for `rst`.
  - Support `text/markdown`, which is now a supported mime type for raw output (#7561).
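
As a rough illustration of the first item, the lookup can be sketched in Python as follows. The function name `raw_cell_mime` is hypothetical, and the precedence (`format` first, then `raw_mimetype`) is an assumption; the changelog only says that both keys are consulted.

```python
def raw_cell_mime(metadata: dict):
    """Return the mime type recorded for a raw cell, or None.

    Hypothetical helper: "format" is the key the nbformat spec calls
    for, "raw_mimetype" is the key often found in practice.
    The order of the two lookups is an assumption.
    """
    return metadata.get("format") or metadata.get("raw_mimetype")
```

A notebook written by a tool that uses `raw_mimetype` would then be read the same way as one that follows the spec.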
- RTF reader:
  - Support `\binN` for binary image data.
  - If doc begins with { … } only parse its contents. Some documents seem to have non-RTF (e.g. XML) material after the `{\rtf1 ... }` group.
  - Ignore `\pgdsc` group. Otherwise we get style names treated as text.
  - Better handling of `\*` and bookmarks. We now ensure that groups starting with `\*` never cause text to be added to the document. In addition, bookmarks now create a span between the start and end of the bookmark, rather than an empty span.
- Docx reader:
  - Avoid blockquote when parent style has more indent (Milan Bracke). When a paragraph has an indentation different from the parent (named) style, it used to be considered a blockquote. But this only makes sense when the paragraph has more indentation. So this commit adds a check for the indentation of the parent style.
  - Fix handling of empty fields (Milan Bracke). Some fields only have an `instrText` and no content. Pandoc didn’t understand these, causing other fields to be misunderstood because it seemed like a field was still open when it wasn’t.
  - Implement PAGEREF fields (Milan Bracke). These fields, often used in tables of contents, can be a hyperlink.
  - Fix handling of nested fields (Milan Bracke). Fields delimited by `fldChar` elements can contain other fields. Before, the nested fields would be ignored, except for the end, which would be considered the end of the parent field.
  - Add placeholder for word diagram instead of just omitting it (Ezwal).
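
The nested-field problem is essentially one of matching begin and end markers, which a stack handles naturally. The Python sketch below is not the Docx reader's code; the event tuples (`"begin"`, `"instr"`, `"end"`) are a hypothetical simplification of the `fldChar` and `instrText` elements, and the `separate` marker is ignored.

```python
def parse_fields(events):
    """Group a flat stream of field events into nested field records.

    `events` is a hypothetical simplification of the docx XML:
    ("begin",) opens a field, ("instr", text) sets its instruction,
    ("end",) closes the innermost open field.
    """
    root = {"instr": None, "children": []}
    stack = [root]
    for event in events:
        kind = event[0]
        if kind == "begin":
            field = {"instr": None, "children": []}
            stack[-1]["children"].append(field)
            stack.append(field)
        elif kind == "instr":
            stack[-1]["instr"] = event[1]
        elif kind == "end":
            if len(stack) == 1:
                raise ValueError("field end without matching begin")
            # Close only the innermost field, not its parent.
            stack.pop()
    if len(stack) != 1:
        raise ValueError("unclosed field")
    return root["children"]
```

With a stack, the end of an inner field closes only that field, and a field with an instruction but no content is closed cleanly instead of being left open.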
- Org reader:
- Docx writer:
  - Make id used in `native_numbering` predictable (#7551). If the image has the id IMAGEID, then we use the id ref_IMAGEID for the figure number. This allows one to create a filter that adds a figure number with figure name, e.g. `<w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t> </w:r></w:fldSimple>`. If an image lacks an id, an id of the form `ref_fig1` is used.
  - Ensure we have unique ids for `wp:docPr` and `pic:cNvPr` elements (#7527, #7503).
  - Handle SVG images (#4058). This change has several parts:
    - In Text.Pandoc.App, if the writer is docx, we fill the media bag and attempt to convert any SVG images to PNG, adding these to the media bag. The PNG backups have the same filenames as the SVG images, but with an added .png extension. If the conversion cannot be done (e.g. because rsvg-convert is not present), a warning is emitted.
    - In Text.Pandoc.Writers.Docx, we now use Word 2016’s syntax for including SVG images. If a PNG fallback is present in the media bag, we include a link to that too.
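
The two naming conventions in the Docx writer entries (the `ref_` prefix for figure numbers and the added `.png` extension for SVG fallbacks) can be sketched in Python. All function names here are hypothetical, and the way the fallback number is chosen for images without an id is an assumption; only the `ref_IMAGEID` and `ref_fig1` forms come from the changelog.

```python
from xml.sax.saxutils import escape


def figure_ref_id(image_id, fallback_number=1):
    """Id used for the figure number: ref_IMAGEID, or ref_figN without an id.

    How N is picked is an assumption; the changelog only shows "ref_fig1".
    """
    if image_id:
        return f"ref_{image_id}"
    return f"ref_fig{fallback_number}"


def ref_field(image_id, text):
    """Raw OOXML a filter could emit to refer to the figure number."""
    return (
        f'<w:fldSimple w:instr=" REF {figure_ref_id(image_id)} ">'
        f"<w:r><w:t>{escape(text)}</w:t></w:r></w:fldSimple>"
    )


def png_fallback_name(svg_name):
    """PNG backup name: the SVG filename with .png appended."""
    return svg_name + ".png"
```

A filter would emit the string returned by `ref_field` as raw OOXML; since the id is now predictable, the filter can compute it from the image id alone.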
- Powerpoint writer (Emily Bourke):
  - Add support for more layouts (#5097). Until now, four layouts were supported: “Title Slide” (used for the automatically generated metadata slide), “Section Header” (used for headings above slide level), “Two Column” (used when there’s a columns div), and “Title and Content” (used for all other slides). We now support three additional layouts: “Comparison”, “Content with Caption”, and “Blank”. The manual describes the logic that determines which layout is used for a slide. Layouts may be customized in the reference doc.
  - Support specifying slide background images using a `background-image` attribute on the slide’s heading. Only the “stretch” mode is supported, and the background image is centred around the slide in the image’s larger axis, matching the observed default behaviour of PowerPoint.
  - Add support for incremental lists (through same methods as in other slide writers) (#5689).
  - Copy embedded fonts from reference doc.
  - Include all themes in output archive.
  - Fix list level numbering (#4828, #4663). In PowerPoint, the content of a top-level list is at the same level as the content of a top-level paragraph: the only difference is that a list style has been applied. Previously, the writer incremented the paragraph level on each list, turning what should be top-level lists into second-level lists.
  - Line up list continuation paragraphs. This commit changes the `marL` and `indent` values used for plain paragraphs and numbered lists, and changes the spacing defined in the reference doc master for bulleted lists. For paragraphs, there is now a left-indent taken from the `otherStyle` in the master. For numbered lists, the number is positioned where the text would be if this were a plain paragraph, and the text is indented to the next level. This means that continuation paragraphs line up nicely with numbered lists. Existing reference docs may need to be modified so that `otherStyle` and `bodyStyle` indent levels match, for this feature to work with them.
  - Consolidate text runs when possible (jgm). This slims down the output files by avoiding unnecessary text run elements.
  - Support footers in the reference doc. There is one behaviour which may not be immediately obvious: if the reference doc specifies a fixed date (i.e. not automatically updating), and there’s a date specified in the metadata for the document, the footer date is replaced by the metadata date.
  - Fix presentation rel numbering. Before now, the numbering of `rId`s was inconsistent when making the presentation XML and when making the presentation relationships XML.
  - Don’t add relationships unnecessarily. Before now, for any layouts added to the output from the default reference doc, the relationships were unconditionally added to the output. However, if there was already a layout in slideMaster1 at the same index, this resulted in duplicate relationships.
  - If slide level is 0, don’t insert a slide break between a heading and a following table, “columns” div, or paragraph starting with an image.
  - Fix capitalisation of `notesMasterId`.
  - Restructure tests.
- Asciidoc writer:
  - Translate numberLines attribute to `linesnum` switch (Samuel Tardieu).
  - Improve escaping for `--` in URLs (#7529).
- LaTeX writer:
  - Make babel use more idiomatic (#7604, hseg). Use babel’s bidi implementation. Import babel languages individually instead of as package options. Move `header-includes` to after `babel` setup so it can be modified.
  - Use babel, not polyglossia, with xelatex. Previously polyglossia worked better with xelatex, but that is no longer the case, so we simplify the code so that babel is used with all latex engines. This involves a change to the default LaTeX template.
- Markdown writer:
  - Avoid bad wraps at the Doc Text level. Previously we tried to do this at the Inline list level, but it makes more sense to intervene on breaking spaces at the Doc Text level.
  - Use `underline` class rather than `ul` for underline. This only affects output with `bracketed_spans` enabled. The markdown reader parses spans with either `.ul` or `.underline` as Underline elements, but we’re moving towards preferring the latter.
- RST writer:
  - Properly handle anchors to ids with spaces or leading underscore (#7593). In these cases we need the quoted form, e.g. `` .. _`foo bar`: `` and `` .. _`_foo`: ``. Side note: rST will “normalize” these identifiers anyway, ignoring the underscore.
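
The quoting rule for RST anchors can be sketched in Python. The function name `rst_target` is hypothetical, and the sketch covers only the two cases named in the entry (whitespace and a leading underscore); it is not a full account of when rST needs quoted target names.

```python
import re


def rst_target(ident: str) -> str:
    """Render an rST hyperlink target line for the identifier `ident`.

    Hypothetical helper: quotes the name in backticks when it contains
    whitespace or starts with an underscore, and not otherwise.
    """
    needs_quoting = ident.startswith("_") or re.search(r"\s", ident) is not None
    name = f"`{ident}`" if needs_quoting else ident
    return f".. _{name}:"
```

Plain identifiers are left unquoted, so output for ordinary anchors is unchanged.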
- HTML writer:
  - Render `\ref` and `\eqref` as inline math, not display (see #7589).
  - Pass through `\ref` and `\eqref` if MathJax is used (#7587).
  - Pass through inline math environments with KaTeX.
  - Support `--reference-location` for HTML output (#7461, Francesco Mazzoli).
  - Set “hash” to True by default (for reveal.js) (#7574). See #6968 where the motivation for setting “hash” to True is explained.
- Native writer: Use pretty-show to format native output (#7580). Previously we used our own homespun formatting. But this produced over-long lines that aren’t ideal for diffs in tests. Performance is slower by about a factor of 10, but this isn’t really a problem because native isn’t suitable as a serialization format. (For serialization you should use json, because the reader is so much faster than native.)
- Org writer:
  - Don’t indent contents of code blocks. We previously indented them by two spaces, following a common convention. Since the convention is fading, and the indentation is inconvenient for copy/paste, we are discontinuing this practice.
  - Update list of supported source languages in org writer (#5440).
- Ipynb writer (Kolen Cheung):
  - Improve round trip identity for raw cell output. See jupyter/nbformat#229. The Jupyter ecosystem, including nbconvert, lab and notebook, deviated from their own spec in nbformat, where they used the key `raw_mimetype` instead of `format`. Moreover, the mime-type of rst used in Jupyter deviated from that suggested by https://docutils.sourceforge.io/FAQ.html and is defined as `text/restructuredtext` when chosen from “Raw NBConvert Format” in Jupyter. The new behavior should match real-world usage better, hence improving the round-trip “identity” in raw cells.
  - Add more formats that can be handled as “raw” cells.
- EPUB writer:
  - Add EPUB3 subject metadata (authority/term) (nuew). This adds the ability to specify EPUB 3 `authority` and `term` specific refinements to the `subject` tag. Specifying a plain `subject` tag in metadata will function as before.
  - Treat epub:type “frontispiece” as front matter (#7600).
- reveal.js template: Fix line numbers in source code (#7634). We need `overflow: visible` for these to work, and reveal’s default css disables this. So we re-enable it in the default template.
- Text.Pandoc.Writers.Shared:
  - Export `splitSentences` as a Doc Text transform [API change]. Use this in man/ms. We used to attempt automatic sentence splitting in man and ms output, since sentence-ending periods need to be followed by two spaces or a newline in these formats. But it’s difficult to do this reliably at the level of `[Inline]`.
- Text.Pandoc.Translations: small revisions for compatibility with aeson 2.
- Don’t prepend `file://` to `--syntax-definition` on Windows (#6374). This was a fix for a problem in skylighting, but this problem doesn’t exist now that we’ve moved from HXT to xml-conduit.
- Text.Pandoc.Extensions:
  - Add `Ext_footnotes` to default `gfm` extensions. Now `gfm` supports footnotes.
  - Alphabetize Extension constructors (also affects `--list-extensions`).
- Text.Pandoc.Citeproc.Util: Better implementation of `splitStrWhen`. Previously the citeproc code had two less efficient implementations.
- Update documentation for definition_list extension (#7594). In 2015, we relaxed indentation requirements for the first line of a definition (see commit d3544dc and issue #2087), but the documentation wasn’t updated to reflect the change.
- Text.Pandoc.Citeproc.BibTeX: Fix expansion of special strings in series, e.g. `newseries` or `library` (#7591). Expansion should not happen when these strings are protected in braces, or when they’re capitalized.
- Text.Pandoc.Logging: add `NotUTF8Encoded` constructor to `LogMessage` [API change].
- Text.Pandoc.App.FormatHeuristics: remove `.tei.xml` extension for TEI (#7630). This never worked, because `takeExtension` only returns `.xml`. So it won’t be missed if we remove it.
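
To see why the `.tei.xml` entry could never match, it helps to look at how a last-extension function treats a double extension. The sketch below uses Python's `os.path.splitext` as a stand-in for Haskell's `takeExtension`; both return only the final extension. The wrapper name `take_extension` is hypothetical.

```python
import os.path


def take_extension(path: str) -> str:
    """Return only the last extension of `path`, including the dot.

    Python stand-in for Haskell's takeExtension, used for illustration.
    """
    return os.path.splitext(path)[1]


# The final extension is all that is returned, so a lookup keyed on
# ".tei.xml" can never succeed for a file named "doc.tei.xml".
assert take_extension("doc.tei.xml") == ".xml"
assert take_extension("doc.tei.xml") != ".tei.xml"
```

Any format heuristic keyed on the result therefore sees `.xml` for such a file, whatever comes before it.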
- Text.Pandoc.Image:
  - Generalize `svgToPng` to MonadIO.
  - `svgToPng`: change first parameter from WriterOptions to Int.
- Text.Pandoc.Class:
  - Add `readStdinStrict` method to PandocMonad [API change].
  - Generalize type of `extractMedia` [API change]. It was uselessly restricted to PandocIO, instead of any instance of PandocMonad and MonadIO.
- Text.Pandoc.Class.PandocIO: derive MonadCatch, MonadThrow, MonadMask. This allows us to use `withTempDir` [API change].
- Add module Text.Pandoc.Class.Sandbox, defining `sandbox`. Exported via Text.Pandoc.Class. [API change]
- Text.Pandoc.Filter: Generalize type of `applyFilters` from PandocIO to any instance of MonadIO and PandocMonad [API change].
- Text.Pandoc.PDF: generalize type of `makePDF`: instead of PandocIO, it can be used in any instance of PandocMonad, MonadIO, and MonadMask [API change].
- Lua subsystem and custom writers: generalize types from PandocIO to any instance of PandocMonad and MonadIO [API change]. The type of `runLua` is now `(PandocMonad m, MonadIO m) => LuaE PandocError a -> m (Either PandocError a)`. The change from `Lua` to `LuaE PandocError` is due to the switch to hslua-2.0; see next item.
- Lua modules (Albert Krewinkel):
  - Switch to hslua-2.0. The new HsLua version takes a somewhat different approach to marshalling and unmarshalling, relying less on typeclasses and more on specialized types. This allows for better performance and improved error messages. Furthermore, new abstractions make it possible to document the code and exposed functions.
  - Marshal Version values, Inline elements, Attr elements, and Pandoc elements as userdata.
  - Remove deprecated inline constructors `DoubleQuoted`, `SingleQuoted`, `DisplayMath`, and `InlineMath`.
  - Attr values are no longer normalized when assigned to an Inline element property.
  - It’s no longer possible to access parts of Inline elements via numerical indexes. E.g., `pandoc.Span('test')[2]` used to give `pandoc.Str 'test'`, but yields `nil` now. This was undocumented behavior not intended to be used in user scripts. Use named properties instead.
  - Accessing `.c` to get a JSON-like tuple of all components no longer works. This was undocumented behavior.
  - Only known properties can be set on an element value. Trying to set a different property will now raise an error.
  - Add a new `pandoc.AttributeList()` constructor, which creates the associative attribute list that is used as the third component of `Attr` values. Values of this type can often be passed to constructors instead of `Attr` values.
  - Convert IOErrors to PandocErrors in `pandoc.pipe` function (#7523).
- Text.Pandoc.PDF: Previously we had to run `runIOorExplode` inside `withTempDir`. Now that PandocIO is an instance of MonadMask, this is no longer necessary.
- Text.Pandoc.App:
  - Reorganize to make it easier to limit IO in main loop. Previously we used liftIO fairly liberally. The code has been restructured to avoid this.
  - Move output-file writing out of PandocMonad action.
- Text.Pandoc.App.OutputSettings: Generalize some types so we can run this with any instance of PandocMonad and MonadIO, not just PandocIO.
- Use `simpleFigure` builder in readers and `SimpleFigure` pattern synonym in writers (Aner Lucero).
- Allow time 1.12.
- Use skylighting-0.12, skylighting-core-0.12. This fixes highlighting issues with typescript, scala, and other syntaxes that include keyword lists from different syntaxes.
- Use citeproc 0.6, commonmark 0.2.2.1, commonmark-extensions 0.2.2, texmath 0.12.3.2, ipynb 0.1.0.2. (These changes also allow building with aeson >= 2.)
- Require doclayout >= 0.3.1.1. This fixes recognition of “real widths” of emoji characters, which is important for tabular layout.
- Cut out over 100K of fat in epub test golden files.
- Make `test/epub/wasteland.epub` valid.
- Add missing `%` on some command tests. This prevented `--accept` from working properly.
- Command tests: raise error if command doesn’t begin with `%`.
- OOXML tests: use pretty-printed form to display diffs. Otherwise everything is on one line and the diff is uninformative.
- Fix compareXML helper in Tests.Writers.OOXML. Given how it is used, we were getting “mine” and “good” flipped in the test results.
- MANUAL.txt:
  - Clarify `attributes` extension support (William Lupton).
  - Document formats affected by `--reference-location`.
  - Document error code 25.
  - Add some more info regarding `--slide-level=0` (Salim B).
  - Add more to security section of manual.
  - Mention support of `title-toc` (#7171, Christophe Dervieux).
- doc/lua-filters.md:
  - Add missing type for Image title (Quinn).
  - Improve order of Image fields (Quinn).
  - Rephrase pandoc.path docs (#7548, Quinn).
  - Do not leak working directory in TikZ filter (Jeroen de Haas).