A tool to test pandoc document transformations.
Perevir aims to make tests easy to write and use, and to have tests that can also serve as a form of documentation. Ideally, test files should be readable and informative when viewed as rendered Markdown on a developer platform such as GitHub, GitLab, or Codeberg.
The name perevir is the transliteration of the Ukrainian word перевір, "check (something)!"
A mirror of this repository is available on GitHub, but the version there may lag behind at times. Main development happens here on Codeberg.
Extensions for pandoc, for example pandoc Lua filters, should be tested just like any other software. Perevir can read text files with input and the expected output, and check whether the conversion succeeded.
These extensions are often hosted on development platforms that allow Markdown files to be viewed either as code or as rendered documents, and offer syntax highlighting etc. Test definitions (perevirky) should be readable on those platforms, and thus can double as easily accessible and always up-to-date documentation.
Perevir depends on pandoc 3.2.1 or later. It also requires a working `diff` executable in the path to report errors.
The tool is a single file and thus easy to install. Just download `perevir.lua` and call it with `pandoc lua perevir.lua <TESTFILE>`.
When used as a command-line program, you'll need to have pandoc installed and have a `pandoc-lua` symlink to pandoc in your path.
An alternative installation method is via luarocks, the package manager for Lua.
luarocks install --local perevir
In addition, it may also be necessary to run `eval "$(luarocks path)"` to set the environment variables to the correct values.
Perevir can be used both as a command-line program and as a library to create customized checkers. The command-line program takes a test file, or a directory of test files, as its argument.
./perevir.lua <TEST-FILE-OR-DIR>
The format for test files, called "perevirky", is described below.
Since it's often cumbersome to write (or update) expected results by hand, this can be automated. Call `perevir.lua` with `-a` to accept all transformation results as the expected output. The file will be modified in-place.
All perevirky have two parts: input and expected output. Each of these parts is marked by setting an appropriate element ID: `input` for the input and `expected` or `output` for the expected result.
The example below is a very simple test that would verify the built-in Markdown reader, checking whether it produces the correct "pandoc native" output.
``` markdown {#input}
This is *nice*!
```
The internal document representation for this Markdown is
``` haskell {#expected}
[ Para
    [ Str "This"
    , Space
    , Str "is"
    , Space
    , Emph [ Str "nice" ]
    , Str "!"
    ]
]
```
Notice the IDs on the code blocks, and that there can be any kind of explanatory text outside of the input and output blocks.
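To make the role of the IDs concrete, here is a minimal Python sketch that looks up blocks by ID in pandoc's JSON representation of a document (the format produced by `pandoc --to=json`). It is an illustration only and not perevir's actual implementation, which is written in Lua; the helper name `find_block_by_id` and the hand-written sample data are hypothetical.

```python
def find_block_by_id(blocks, wanted_id):
    """Return the first CodeBlock or Div with the given ID, or None."""
    for block in blocks:
        if block.get("t") in ("CodeBlock", "Div"):
            # Both element types store their attributes first:
            # [identifier, classes, key-value pairs]
            (identifier, _classes, _keyvals), _content = block["c"]
            if identifier == wanted_id:
                return block
    return None


# Hand-written JSON AST fragment: two marked code blocks with prose between.
blocks = [
    {"t": "CodeBlock", "c": [["input", ["markdown"], []], "This is *nice*!"]},
    {"t": "Para", "c": [{"t": "Str", "c": "Explanation"}]},
    {"t": "CodeBlock", "c": [["expected", ["haskell"], []], "[ Para [] ]"]},
]

input_block = find_block_by_id(blocks, "input")
# "output" is accepted as an alternative ID for the expected result.
expected_block = (find_block_by_id(blocks, "expected")
                  or find_block_by_id(blocks, "output"))
```

Blocks without one of the special IDs, such as the paragraph in the middle, are simply skipped, which is why explanatory text can appear anywhere.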
How a code block is parsed into a pandoc document depends on the classes and attributes. In general, the (markup) language identifier is used as the name of a pandoc reader. Hence
```html {#input}
<h1>Intro</h1>
```
marks that the block content must be parsed as HTML.
The `extensions` attribute can be set to fine-tune the reader as one would on the command line. E.g., to disable the `smart` extension when parsing the input, one might write
``` markdown {#input extensions="-smart"}
"Yeah, right."
```
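On the pandoc command line, the same effect is achieved with a format specifier such as `--from=markdown-smart`. The Python sketch below shows how a block's language class and `extensions` attribute could be combined into such a specifier. The function name `reader_format` is hypothetical, and the fallback to Markdown for blocks without a class is an assumption made for this example, not documented perevir behavior.

```python
def reader_format(classes, attributes):
    """Build a pandoc format specifier such as 'markdown-smart'."""
    # The first class names the reader.
    # Assumption for this sketch: default to Markdown when there is none.
    base = classes[0] if classes else "markdown"
    # Extension modifiers use pandoc's usual +ext / -ext syntax.
    return base + attributes.get("extensions", "")


spec = reader_format(["markdown"], {"extensions": "-smart"})
print(spec)  # markdown-smart
```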
Perevir reads the input and output blocks into pandoc's internal document format. The tests checking the conversion results use the objects of that internal format, not the string representation (unless requested). This improves accuracy and also makes tests more robust.
Perevirky files can be viewed as rendered Markdown files on platforms like GitHub, GitLab, Codeberg, etc. Rendered (Markdown) text is generally easier and more pleasant to read than code blocks with markup. It is therefore possible to use divs to set the input or expected output.
<div id="input">
Normal [pandoc](https://pandoc.org) Markdown
paragraph.
</div>
It is advisable to use HTML div tags instead of pandoc's own fenced divs syntax, as fenced divs are not supported on most development platforms and perevirky become less readable when viewed there. Perevir disables fenced divs when rewriting perevirky with `-a`.
Perevir can be configured by setting values below the `perevir` metadata field. The following options are supported:
- `disable`: do not run perevir checks on this file.
- `filters`: a list of filters that are run on the input.
- `ignore-softbreaks`: treat softbreaks as spaces, meaning that non-semantic line breaks are ignored when comparing documents.
- `metastrings-to-inlines`: convert `MetaString` elements to `MetaInlines` elements. The Markdown parser never produces `MetaString` values, so this is particularly useful when the expected output contains a YAML metadata block.
- `compare`: set this to `strings` to compare string outputs instead of documents. Helpful for tests that use a lossy output format to set the expected result. This option is implied when using command tests.
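As an illustration of what `ignore-softbreaks` means for the comparison, the following Python sketch normalizes a document in pandoc's JSON representation by replacing every `SoftBreak` element with a `Space`. This is a sketch of the idea under the assumption that the JSON AST is used; it is not perevir's own code, and the function name `softbreaks_to_spaces` is made up for this example.

```python
def softbreaks_to_spaces(node):
    """Recursively replace SoftBreak elements with Space elements."""
    if isinstance(node, list):
        return [softbreaks_to_spaces(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "SoftBreak":
            return {"t": "Space"}
        return {key: softbreaks_to_spaces(value) for key, value in node.items()}
    return node  # strings, numbers, and other leaf values


# The same paragraph, once with a line break in the source and once without.
wrapped = [{"t": "Para", "c": [
    {"t": "Str", "c": "Normal"}, {"t": "SoftBreak"}, {"t": "Str", "c": "paragraph."}]}]
unwrapped = [{"t": "Para", "c": [
    {"t": "Str", "c": "Normal"}, {"t": "Space"}, {"t": "Str", "c": "paragraph."}]}]

print(wrapped == unwrapped)                        # False
print(softbreaks_to_spaces(wrapped) == unwrapped)  # True
```

Without the normalization, re-wrapping a paragraph in the test file would be enough to make an otherwise correct test fail.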
Example:
perevir:
  filters: ['citeproc', 'transmogrify.lua']
  ignore-softbreaks: true
This will run `citeproc` and the `transmogrify.lua` Lua filter on the input and will make perevir check the result against the given output.
Command tests allow setting a specific pandoc command that transforms the input into the output. The command must be the content of a code block with ID `command`.
``` sh {#command}
pandoc --from=org --to=html --number-sections
```
The classes on the input and output blocks have no effect in this case.
Command tests differ from other tests in that they compare the expected and actual output as strings. Other tests compare the respective pandoc documents as objects.
This kind of test is an escape hatch, meant to be used when otherwise it's not possible to capture the transformation correctly.
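The string-based comparison used by command tests can be pictured with the following Python sketch. It runs a command with the input on stdin and compares stdout to the expected text verbatim. The function name `run_command_test` is hypothetical, and details such as whether perevir normalizes trailing whitespace are not covered by this sketch.

```python
import shlex
import subprocess


def run_command_test(command, input_text, expected):
    """Run `command` with `input_text` on stdin; compare stdout to `expected`."""
    result = subprocess.run(
        shlex.split(command),
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    # Plain string comparison: nothing is parsed into pandoc documents.
    return result.stdout == expected
```

With pandoc in the path, a call might look like `run_command_test("pandoc --from=org --to=html --number-sections", org_text, html_text)`, where `org_text` and `html_text` stand for the contents of the input and expected blocks.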