Wrong post processing pipeline for pdf_document() conversion #2282

cderv · 2022-01-13T14:01:40Z

Found while working #2264,

The post processor will be run after on the output file being a pdf. Until recently, pdf_document() had no post processor and it seemed to work. But there is one today and I don't think it is working correctly. Oddly, it throws a warning only in debug mode.

A recent fix introduced a post processor in the pdf_document() output. 5a3e941

This creates some issues I believe but it does not show in CI oddly.

output_format$post_processor() is supposed to run on the output of the Pandoc conversion. While looking into the other issue #2264 (comment), it happens that the conversion step (before post processing) will generate a PDF file using latexmk(). when output file is pdf for example.

rmarkdown/R/render.R

Lines 974 to 981 in 0af6b35

    
           if (!grepl('[.]tex$', output_file)) { 
        
             latexmk(texfile, output_format$pandoc$latex_engine, '--biblatex' %in% output_format$pandoc$args) 
        
             file.rename(file_with_ext(texfile, "pdf"), output_file) 
        
             # clean up the tex file if necessary 
        
             if (!output_format$pandoc$keep_tex) { 
        
               texfile <- normalize_path(texfile) 
        
               on.exit(unlink(texfile), add = TRUE) 
        
             }

In that case the post_processor() will try to post process the PDF file

rmarkdown/R/pdf_document.R

Lines 191 to 200 in 5a3e941

    
           post_processor <- function(metadata, input_file, output_file, clean, verbose) { 
        
             # TODO: remove this temporary fix after the syntax highlighting problem is 
        
             # fixed in Pandoc https://github.com/rstudio/bookdown/issues/1157 
        
             x <- read_utf8(output_file) 
        
             s <- '\\SpecialCharTok{|}\\ErrorTok{\\textgreater{}}' 
        
             if (length(grep(s, x, fixed = TRUE)) == 0) return(output_file) 
        
             x <- gsub(s, '\\SpecialCharTok{|\\textgreater{}}', x, fixed = TRUE) 
        
             write_utf8(x, output_file) 
        
             output_file 
        
           }

This does not work obviously. Reprex:

dir.create(tmp_dir <- tempfile())
owd <- setwd(tmp_dir)
xfun::download_file("https://raw.githubusercontent.com/rstudio/rmarkdown/main/tests/testthat/test-formats.Rmd")
rmarkdown::render("test-formats.Rmd", rmarkdown::pdf_document(), quiet = TRUE)
#> Warning message:
#> In readLines(con, warn = FALSE) :  invalid input found on input connection 'test-formats.pdf'

The post processing step should happen before latexmk() after Pandoc conversion from .md to .tex. I don't think it is working right now.

Or maybe the latexmk() rendering should happen in the pdf_document() post processing 🤔 But that would mean for user creating custom output that they need to use pdf_document() as base format or do the conversion themself. Maybe room for a pdf_document_base() format that does essential processing ?

I get all those warning message when running tests locally, I am not sure why we don't see them in CI. 🤔

The text was updated successfully, but these errors were encountered:

atusy · 2022-01-16T21:40:12Z

the latexmk() rendering should happen in the pdf_document() post processing

I think this makes the logic very clear.
Also, this may fix #2267 as well by using output formats without ` the post processing.

I had an idea to apply latexmk() after post_processor() in render(), however, I threw it away because it breaks output formats that intentionally apply post_processor() to PDF files.

cderv · 2022-01-17T09:46:51Z

Thanks for the feedback !

however, I threw it away because it breaks output formats that intentionally apply post_processor() to PDF files.

Which format are you thinking off ? I am curious about post processing of PDF files (instead of tex file before PDF conversion)

atusy · 2022-01-18T01:41:03Z

Not on a public repository, but I used to apply post processing on PDF to outline fonts to ensure it prints safely by a command like

gs -o output.pdf -dNoOutputFonts -sDEVICE=pdfwrite input.pdf

cderv · 2022-01-18T09:11:41Z

Thanks for sharing. Good to know this could be useful.

Probably for PDF conversion we need to consider current pre_processor / post_processor as hooks for Pandoc conversion (.md to .tex) and we would need to add post processing hook for the PDF conversion by tinytex::latexmk() 🤔

Seems like a piece is missing here.

reverting 5a3e941 as it introduced a warning as trying to post process a pdf file reported as part of #2282

cderv · 2022-03-01T18:05:44Z

#2320 fixed the issue with the warnings due to wrong post processing. I'll leave this issue open to deal in a later version to fix the way post processing is happening for PDF.

cderv added bug an unexpected problem or unintended behavior next to consider for next release labels Jan 13, 2022

cderv mentioned this issue Jan 19, 2022

Add a custom syntax highlight file to support new R 4.0 pipe #2290

Open

cderv added a commit that referenced this issue Mar 1, 2022

post-processing tex file in pdf document is not yet possible

082e3ef

reverting 5a3e941 as it introduced a warning as trying to post process a pdf file reported as part of #2282

cderv mentioned this issue Mar 1, 2022

post-processing tex file in pdf document is not yet possible #2320

Merged

cderv added a commit that referenced this issue Mar 1, 2022

post-processing tex file in pdf document is not yet possible (#2320)

7eb75a8

reverting 5a3e941 as it introduced a warning as trying to post process a pdf file reported as part of #2282

cderv added bug an unexpected problem or unintended behavior and removed next to consider for next release bug an unexpected problem or unintended behavior labels Mar 1, 2022

cderv mentioned this issue Mar 2, 2022

Release rmarkdown 2.12 #2321

Closed

24 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong post processing pipeline for pdf_document() conversion #2282

Wrong post processing pipeline for pdf_document() conversion #2282

cderv commented Jan 13, 2022 •

edited

Loading

atusy commented Jan 16, 2022

cderv commented Jan 17, 2022

atusy commented Jan 18, 2022

cderv commented Jan 18, 2022

cderv commented Mar 1, 2022

Wrong post processing pipeline for pdf_document() conversion #2282

Wrong post processing pipeline for pdf_document() conversion #2282

Comments

cderv commented Jan 13, 2022 • edited Loading

atusy commented Jan 16, 2022

cderv commented Jan 17, 2022

atusy commented Jan 18, 2022

cderv commented Jan 18, 2022

cderv commented Mar 1, 2022

cderv commented Jan 13, 2022 •

edited

Loading