Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong post processing pipeline for pdf_document() conversion #2282

Open
cderv opened this issue Jan 13, 2022 · 5 comments
Open

Wrong post processing pipeline for pdf_document() conversion #2282

cderv opened this issue Jan 13, 2022 · 5 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@cderv
Copy link
Collaborator

cderv commented Jan 13, 2022

Found while working #2264,

The post processor will be run after on the output file being a pdf. Until recently, pdf_document() had no post processor and it seemed to work. But there is one today and I don't think it is working correctly. Oddly, it throws a warning only in debug mode.

A recent fix introduced a post processor in the pdf_document() output. 5a3e941

This creates some issues I believe but it does not show in CI oddly.

output_format$post_processor() is supposed to run on the output of the Pandoc conversion. While looking into the other issue #2264 (comment), it happens that the conversion step (before post processing) will generate a PDF file using latexmk(). when output file is pdf for example.

rmarkdown/R/render.R

Lines 974 to 981 in 0af6b35

if (!grepl('[.]tex$', output_file)) {
latexmk(texfile, output_format$pandoc$latex_engine, '--biblatex' %in% output_format$pandoc$args)
file.rename(file_with_ext(texfile, "pdf"), output_file)
# clean up the tex file if necessary
if (!output_format$pandoc$keep_tex) {
texfile <- normalize_path(texfile)
on.exit(unlink(texfile), add = TRUE)
}

In that case the post_processor() will try to post process the PDF file

rmarkdown/R/pdf_document.R

Lines 191 to 200 in 5a3e941

post_processor <- function(metadata, input_file, output_file, clean, verbose) {
# TODO: remove this temporary fix after the syntax highlighting problem is
# fixed in Pandoc https://github.com/rstudio/bookdown/issues/1157
x <- read_utf8(output_file)
s <- '\\SpecialCharTok{|}\\ErrorTok{\\textgreater{}}'
if (length(grep(s, x, fixed = TRUE)) == 0) return(output_file)
x <- gsub(s, '\\SpecialCharTok{|\\textgreater{}}', x, fixed = TRUE)
write_utf8(x, output_file)
output_file
}

This does not work obviously. Reprex:

dir.create(tmp_dir <- tempfile())
owd <- setwd(tmp_dir)
xfun::download_file("https://raw.githubusercontent.com/rstudio/rmarkdown/main/tests/testthat/test-formats.Rmd")
rmarkdown::render("test-formats.Rmd", rmarkdown::pdf_document(), quiet = TRUE)
#> Warning message:
#> In readLines(con, warn = FALSE) :  invalid input found on input connection 'test-formats.pdf'

The post processing step should happen before latexmk() after Pandoc conversion from .md to .tex. I don't think it is working right now.

Or maybe the latexmk() rendering should happen in the pdf_document() post processing 🤔 But that would mean for user creating custom output that they need to use pdf_document() as base format or do the conversion themself. Maybe room for a pdf_document_base() format that does essential processing ?

I get all those warning message when running tests locally, I am not sure why we don't see them in CI. 🤔

@cderv cderv added bug an unexpected problem or unintended behavior next to consider for next release labels Jan 13, 2022
@atusy
Copy link
Collaborator

atusy commented Jan 16, 2022

the latexmk() rendering should happen in the pdf_document() post processing

I think this makes the logic very clear.
Also, this may fix #2267 as well by using output formats without ` the post processing.

I had an idea to apply latexmk() after post_processor() in render(), however, I threw it away because it breaks output formats that intentionally apply post_processor() to PDF files.

@cderv
Copy link
Collaborator Author

cderv commented Jan 17, 2022

Thanks for the feedback !

however, I threw it away because it breaks output formats that intentionally apply post_processor() to PDF files.

Which format are you thinking off ? I am curious about post processing of PDF files (instead of tex file before PDF conversion)

@atusy
Copy link
Collaborator

atusy commented Jan 18, 2022

Not on a public repository, but I used to apply post processing on PDF to outline fonts to ensure it prints safely by a command like

gs -o output.pdf -dNoOutputFonts -sDEVICE=pdfwrite input.pdf

@cderv
Copy link
Collaborator Author

cderv commented Jan 18, 2022

Thanks for sharing. Good to know this could be useful.

Probably for PDF conversion we need to consider current pre_processor / post_processor as hooks for Pandoc conversion (.md to .tex) and we would need to add post processing hook for the PDF conversion by tinytex::latexmk() 🤔

Seems like a piece is missing here.

cderv added a commit that referenced this issue Mar 1, 2022
reverting 5a3e941 as it introduced a warning as trying to post process a pdf file

reported as part of #2282
cderv added a commit that referenced this issue Mar 1, 2022
reverting 5a3e941 as it introduced a warning as trying to post process a pdf file

reported as part of #2282
@cderv
Copy link
Collaborator Author

cderv commented Mar 1, 2022

#2320 fixed the issue with the warnings due to wrong post processing. I'll leave this issue open to deal in a later version to fix the way post processing is happening for PDF.

@cderv cderv added bug an unexpected problem or unintended behavior and removed next to consider for next release bug an unexpected problem or unintended behavior labels Mar 1, 2022
@cderv cderv mentioned this issue Mar 2, 2022
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
Status: Backlog
Development

No branches or pull requests

2 participants