Page-break in other output formats than LaTeX #1934

todd-a-jacobs · 2015-02-10T17:56:31Z

Pagebreaks Don't Work for Most Output Formats

I have a Markdown file that is supposed to have pagebreaks between certain sections. However, Pandoc 1.10.1 isn't honoring the \newpage or \pagebreak commands when rendering RTF, DOCX, or ODT formatted files. The commands I'm using to invoke pandoc are:

for format in rtf docx odt; do
    pandoc \
        --smart \
        --normalize \
        --standalone \
        --self-contained \
        -f markdown \
        -t $format \
        --output="${FILE/markdown/$format}" \
        "$FILE"
    echo "Created ${FILE/markdown/$format}"
done

PDF Seems to Work

However, the PDF format (which requires a slightly different invocation because it doesn't respect the -t flag) seems to respect the pagebreak requests. For example:

pandoc \
    --standalone \
    --normalize \
    --smart \
    --self-contained \
    --from=markdown \
    --output="${FILE/markdown/pdf}" \
    "$FILE"
echo "Created ${FILE/markdown/pdf}"

jgm · 2015-02-10T18:36:03Z

Correct, pandoc's internal document model does not currently contain anything corresponding to a page break, so there is no way to convert these. In principle a PageBreak element could be added. It's also possible to work around this deficiency using pandoc filters.

todd-a-jacobs · 2015-02-10T20:20:39Z

A PageBreak element would be great, but I'd be happy to use a filter in the meantime. However, I'm not sure what's entailed in doing so. How would I generate a DOCX with forced page breaks using a filtering mechanism?

jkr · 2015-02-19T17:46:02Z

@CodeGnome : see this thread for some hints on setting up a filter for pagebreaks in docx output:

https://groups.google.com/forum/#!searchin/pandoc-discuss/pagebreak/pandoc-discuss/FzLrhk0vVbU/GtSHaI0jddAJ

s7726 · 2015-03-20T17:07:16Z

@CodeGnome If your page breaks happen to be prior to a given heading level, you can just set the page break before property for that heading style.

Hi-Angel · 2015-07-29T17:22:56Z

I am also voting for this feature to be added: many formats have something corresponding to a page break (even CSS has properties like page-break-*).

Hi-Angel · 2015-08-01T15:28:38Z

Hi, I'm just looking through the code in the hope of adding the pagebreak and some other features, and I found, well… Has @jgm noticed the two-year-old pull request?

jgm · 2015-08-01T21:10:01Z

+++ Hi-Angel [Aug 01 15 08:28 ]:

Hi, I'm just looking through the code in the hope to add the pagebreak,
and some features, and I found, well… Does @jgm notice the two
years old pull request?

Adding a NewPage element to the definition and builder is trivial.
But then you need to support it in every reader and writer;
that's a lot more work.

oadam · 2016-10-14T14:04:07Z

If a pull request adding support for NewPage were submitted (including support in every reader and writer), would it be accepted?
I really need this feature and I'm ready to spend time on it.

jgm · 2016-10-14T14:16:39Z

Yes, I'd accept it if it's of good quality.

Note, it requires a breaking change in pandoc-types.
I'd like to make a new release soon of pandoc-types
(which already has breaking changes) and pandoc.
If you plan to do this soon I could wait a bit.

How do you propose to treat output formats with
nothing corresponding to a page break?

Would it make sense, perhaps, to render it as a

Div ("",["pagebreak"],[]) []

which could at least be intercepted in filters?
This could even be a native pandoc way of creating it.

oadam · 2016-10-14T14:37:33Z

I'll follow whatever recommendation you give :-)

If your code snippet means an empty div with a pagebreak CSS class, then yes, that might be a good idea (it could also be parsed by the HTML reader).

Maybe the writer could even add an inline style attribute with page-break-after: always?

No need to wait for this before pushing your breaking change. To be honest, I won't look into it before at least a few weeks but it's definitely something that is on my business' road-map.

s7726 · 2016-10-14T18:44:15Z

Putting a class on an empty div won't work (or at least be portable).

http://www.w3schools.com/cssref/pr_print_pageba.asp

Note: You cannot use this property on an empty <div>
or on absolutely positioned elements.

I recently found the page-break-avoid property. I applied it to the elements that contained figures that needed to stay with a particular step in a procedure.

tarleb · 2016-10-14T22:00:56Z

MDN states on page-break-before (emphasis mine):

It won't apply on an empty <div> that won't generate a box.

I guess with a little bit of CSS hackery, the div could still be made to generate a box.

jgm · 2016-10-16T20:38:06Z

OK, that's good to know. So implementing a page break in
the HTML writer might be nontrivial...but it's also not
really essential -- I think it would be okay if we just
supported formats that typically produce paginated output
(latex, docx, etc.).

Jmuccigr · 2016-10-17T01:49:01Z

Would definitely like to see this.

And really would like to see printed html handle this too, but that's probably out of scope for pandoc.

mb21 · 2017-01-22T12:22:37Z

Some observations on how different formats handle page breaks:

From the perspective of HTML/CSS, page breaking is about layout, not structure, and is thus implemented in CSS (with the page-break-before and page-break-after properties, as supported by wkhtmltopdf – note that they might be superseded by break-before and break-after but browser support is not forthcoming). As has been noted, these can only be applied to block level elements and the intended usage is to apply them to headers or section divs.

In some restructured-text processors, a pagebreak can apparently also be achieved by a block level directive.

On the other hand, in more imperative document models (ODT, docx, etc), pagebreak usually seems to be an inline element. The pandoc AST already has inline LineBreak and SoftBreak elements and one possible implementation would be to replace them with an inline Break element that has an attribute type=line, type=soft, type=page,type=column etc. Note that implementing a native pandoc pagebreak element as inline is more general than a block element, since the block element can always be simulated by wrapping an inline in an otherwise empty paragraph.

Finally, from the perspective of markdown, I would probably use something like this:

------- {.pagebreak}

fabtho · 2017-02-06T09:06:33Z

I would like to see this implemented. I just tried to write a filter for pandoc to get page breaks when converting md to ODT, but had no success. (I used the source on Google Groups, as mentioned above.)

link2xt · 2017-06-27T21:34:26Z

Muse format also has pagebreaks: http://amusewiki.org/library/manual#toc7

mb21 · 2017-09-08T13:01:17Z

btw, iA Writer pagebreak syntax is:

+++

which produces:

<div style="page-break-before: always;"></div>

which webkit-based browsers seem to understand.

autotel · 2018-10-01T13:43:49Z

another nice workaround:

- insert a horizontal line -----------------
- format the "horizontal line" style to break a page and be invisible, using the text editor (libre office in my case)

grenade · 2019-07-09T11:17:20Z

@CodeGnome : see this thread for some hints on setting up a filter for pagebreaks in docx output:

https://groups.google.com/forum/#!searchin/pandoc-discuss/pagebreak/pandoc-discuss/FzLrhk0vVbU/GtSHaI0jddAJ

thanks for this! i went down this rabbit hole today. it was my first foray into haskell and i'm pleased to say that i am now standing next to a completely bald yak¹. here's what happened:

the problem:

i have a github gist containing markdown files. i have a react app that transforms these markdown files into an html web page. i wanted a way to transform the same markdown files into a hosted google doc that has built in docx and pdf output formats.

the solution:

write some bash that combines all of the gist's markdown files into a single markdown file and use pandoc to transform the markdown into docx format that can be uploaded as a google doc.

the implementation:

- use jq and the github gist api to produce a file containing the combined markdown
  - the trick here is to insert a separator (\n\n\\newpage\n\n) between the individual markdown files that pandoc can interpret as a block paragraph containing only a page-break.
- run pandoc against the combined markdown file to convert it into docx format
  - here the trick is to correctly interpret the page-break separator tokens and use a filter to replace them with the correct docx xml separator syntax (<w:p><w:r><w:br w:type="page"/></w:r></w:p>).
  - create a haskell code file (docx-page-filter.hs) containing the filter (thank you Joel Allen and John MacFarlane):

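{-# LANGUAGE OverloadedStrings #-}
-- OverloadedStrings lets the string literals below type-check whether
-- pandoc-types represents text as String (older releases) or Text (1.20+).
import Text.Pandoc.JSON

-- Raw OpenXML for a paragraph that contains only a page break.
pagebreakBlock :: Block
pagebreakBlock = RawBlock (Format "openxml") "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>"

-- Replace a paragraph consisting solely of the text \newpage; pass other blocks through.
blockSwapper :: Block -> Block
blockSwapper (Para [Str "\\newpage"]) = pagebreakBlock
blockSwapper blk = blk

main :: IO ()
main = toJSONFilter blockSwapper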

the code above requires compiling but ghc --make -v docx-page-filter.hs throws an error about not being able to import Text.Pandoc.JSON. i don't know what version of ghc was already installed on my fedora-30 system or where it came from.
- download and install the distro build tools, the package manager and the pandoc dependencies:
```
sudo dnf install ghc
sudo dnf install cabal-install
cabal update
cabal install pandoc
```
- go have a coffee now. maybe even go for a run or mow the lawn. you have some time...

if everything compiles, you can run a command like this to perform the conversion:

pandoc combined.md --from gfm --filter docx-page-filter --to docx --output converted.docx
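The first implementation step above (joining the markdown files with a `\newpage` separator) can be sketched in plain shell. This is only an illustration under assumptions: it reads local files instead of calling the gist api with jq, and `combine_markdown` is a hypothetical helper name that does not appear in the original post.

```shell
# combine_markdown: hypothetical helper, not from the original post.
# Concatenates the given markdown files and writes a paragraph containing
# only \newpage between consecutive files.
combine_markdown() {
    first=1
    for file in "$@"; do
        if [ "$first" -eq 0 ]; then
            # Blank lines on both sides keep \newpage in a paragraph of its own,
            # which is what the filter above matches on.
            printf '\n\n\\newpage\n\n'
        fi
        cat -- "$file"
        first=0
    done
}
```

Something like `combine_markdown one.md two.md > combined.md` would then produce the input for the pandoc command above (the file names are placeholders).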

tarleb · 2019-07-23T05:49:53Z

The Lua filters repository has a pagebreak filter which converts raw \newpage commands into page breaks for most formats.

ghost · 2019-09-12T21:05:30Z

I wanted to note that Epub3 supports page breaks as well, although for possibly different use cases.

A page list and page break indicators allow users in mixed print-digital environments to coordinate their positions.

This is nice for preserving information about page numbers (e.g. for citations, printing, or accessibility such as audio cues) without interfering with the document layout.

It supports both in-line and block page breaks.

An empty span element identifies a page break inside a block element. It is identified as a page break using the role attribute with the value doc-pagebreak. The aria-label attribute provides an announceable value.

<p>
   …
   <span role="doc-pagebreak" id="pg24" aria-label="24"/>
   …
</p>

A div element identifies a page break where inline elements are not allowed. This example shows an example of a page number that is intended to be visible in the content.

    <div role="doc-pagebreak" id="pg24">24</div>

Some notes:

- would need to keep a counter to mark the page numbers
- intended to be placed at page beginnings, rather than endings
- cannot be placed inside lists

My personal preference is for formfeed chars to be interpreted as page breaks, at least in markdown. I use the pdftotext CLI to produce formfeed-delimited text files that can be turned into markdown for pandoc, and it would be great if those could be preserved.
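Until form feeds are interpreted natively, one possible workaround is to rewrite them before the text reaches pandoc. The sketch below is an assumption-laden illustration, not an existing pandoc feature: `formfeed_to_newpage` is a hypothetical name, and it simply turns every form feed between two chunks of text into a paragraph containing only `\newpage`.

```shell
# formfeed_to_newpage: hypothetical helper, not part of pandoc or pdftotext.
# Reads text on stdin, splits it on form feed characters, and writes a
# paragraph containing only \newpage between consecutive chunks.
formfeed_to_newpage() {
    awk 'BEGIN { RS = "\f"; ORS = "" }      # one record per form-feed-delimited chunk
         NR > 1 { print "\n\n\\newpage\n\n" } # separator before every chunk but the first
         { print }'
}
```

It could be used as `formfeed_to_newpage < extracted.txt > extracted.md` (placeholder file names); the resulting `\newpage` paragraphs still need a filter, such as the pagebreak Lua filter mentioned above, to become real page breaks in formats other than LaTeX.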

jeffmcneill · 2019-11-25T09:29:09Z

This might be somewhat related. Pagebreaks seem to be automatically supported in markdown->pdf in terms of H1s being recognized as new section headers, using:

  \usepackage{titlesec} 
  \newcommand{\sectionbreak}{\clearpage}

Also, when markdown->epub the same section headers H1 are recognized and page breaks are implemented. All fine and dandy.

I'm wondering if it is possible somehow to have H2s recognized as section breaks as well. The main reason is because I need to have both H1 and H2 act as section breaks (page breaks).

Ok, I've worked through these issues, and here is how I've dealt with them, so far: I've added \pagebreak before each new H2, that takes care of the latex/pdf side. For epub, I added the style:

h2 {display: block;
    page-break-before: always; /* CSS 2 */
    break-before: page;   /* CSS 3+ */ }

That seems to take care of the epub side.

If anyone has additional suggestions/options especially for the latex/pdf side, that would be great, but otherwise I've got it working.

jgm · 2019-11-25T15:23:45Z

Try the same thing with \subsectionbreak?

jeffmcneill · 2019-11-26T13:01:41Z

@jgm Excellent! It also suppresses the page break if an H2 directly follows an H1, which is what I want. I can't seem to do that with Epub/CSS, but an extra page matters less in an ebook, whereas in print one has to pay for each page.

  \usepackage{titlesec} 
  \newcommand{\sectionbreak}{\clearpage} 
  \newcommand{\subsectionbreak}{\clearpage}

Here is documentation of the various section commands that can be used with package titlesec. http://tug.ctan.org/tex-archive/macros/latex/contrib/titlesec/titlesec.pdf

SandeepNaidu · 2020-04-13T04:46:58Z

This still does not work for pandoc export to docx!

gmile · 2020-07-01T19:44:26Z

Had to introduce page breaks to html files that are being converted to .docx, ended up with this script in Lua:

function Para (el)
  if #el.content == 1 and el.content[1].text == "Pagebreak" then
    return pandoc.RawBlock('openxml', '<w:p><w:r><w:br w:type="page"/></w:r></w:p>')
  end
end

return {
  {Para = Para}
}

Given the following input:

<html>
  <body>
    <p>Page 1</p>
    <p>Pagebreak</p>
    <p>Page 2</p>
    <p>Pagebreak</p>
    <p>Page 3</p>
  </body>
</html>

It can be used like this:

pandoc input.html \
  --standalone \
  --lua-filter pagebreak.lua \
  --reference-doc my_styles.docx \
  --output output.docx

tarikgraba · 2021-03-26T07:37:28Z

Hi there,

Can support for <?asciidoc-pagebreak?> be added to the XML DocBook reader?
This tag is generated by asciidoctor/asciidoc when inserting a page break.

It would be great to be able to convert DocBook to LaTeX without losing this info.

dwojtas · 2022-02-10T18:47:44Z

Hi,
I see no response to the <?asciidoc-pagebreak?> support request for the docbook reader; I would also benefit from this.
I am processing documents in two steps:

- from asciidoc to docbook using asciidoctor
- from docbook to docx using pandoc with a custom docx template.

The results are beautiful, but I must always post-process them by hand, pressing Ctrl+Return to insert a page break before each new chapter.

jgm · 2022-02-10T20:46:57Z

Can the support of <?asciidoc-pagebreak?> added to the XML DocBook reader?

There's no native AST element corresponding to a page break.

leogama · 2022-03-17T16:30:47Z

The R package rmarkdown has a good page break filter: https://github.com/rstudio/rmarkdown/blob/main/inst/rmarkdown/lua/pagebreak.lua

mpickering added the enhancement label Feb 10, 2015

jgm mentioned this issue Jul 26, 2015

LaTeX \clearpage and \newpage are ignored #2330

Closed

Hi-Angel mentioned this issue Aug 1, 2015

Setting margins with «geometry» in LaTeX doesn't work #2340

Closed

crsh mentioned this issue Mar 31, 2016

References should be on new page crsh/papaja#60

Closed

mb21 changed the title ~~Pandoc doesn't honor \newpage or \pagebreak except for PDF output files.~~ Page-break in other output formats than LaTeX Jan 22, 2017

mb21 mentioned this issue Jan 22, 2017

Add pagebreaks to Pandoc #3230

Open

mb21 added the status:more-discussion-needed label Jan 22, 2017

mb21 mentioned this issue Jan 22, 2017

Adding support for NewPage jgm/pandoc-types#9

Closed

kvaleev mentioned this issue Sep 11, 2017

Native pagebreak support foliant-docs/foliant#26

Open

mb21 added the AST change label May 6, 2018

sten0 mentioned this issue Aug 1, 2019

export to typeset script to different formatting conventions (eg: for screen, for stage)? rnkn/fountain-mode#96

Closed

tarleb mentioned this issue May 11, 2020

Page break \newpage does not work in docx output #6358

Closed

aahnik mentioned this issue Oct 12, 2020

[help wanted] How to create a page break? wikiti/pandoc-book-template#15

Closed

tarikgraba mentioned this issue Mar 25, 2021

pandoc does not honor the AsciiDoc page breaks or include::[] macros #988

Closed
