block_pour_docx but only for part of file to be poured
#97
Replies: 4 comments 2 replies
-
Hello, I don't know. There is no function like this in officer or officedown. Did you try with
-
Ok, thanks, I see. However, please find my question at the end.
slicedocx.R
library(officedown)
library(officer)
library(tidyverse)
# gather necessary info ---------------------------------------------------
doc <- read_docx("otherdoc.docx")
doc_summ <- docx_summary(doc)
index_max <- max(doc_summ$doc_index)
doc_slices <- doc_summ %>%
  filter(style_name == "heading 1") %>%
  transmute(text = text,
            index_start = doc_index,
            index_end = lead(doc_index) - 1)
# get a part --------------------------------------------------------------
want <- "Part 2"
docpart <- doc
i <- doc_slices %>%
  filter(text == want) %>%
  pivot_longer(-text) %>%
  pull(value)
# delete everything after part
if (!is.na(i[2])) {
  for (j in (i[2] + 1):index_max) {
    docpart <- docpart %>% cursor_end() %>% body_remove()
  }
}
# delete everything before part
if (i[1] > 1) {
  for (j in 1:(i[1] - 1)) {
    docpart <- docpart %>% cursor_begin() %>% body_remove()
  }
}
For comparison:
docx_summary(doc) # before
# doc_index content_type style_name text level num_id row_id is_header cell_id col_span row_span
# 1 1 paragraph heading 1 Part 1 NA NA NA NA NA NA NA
# 2 2 paragraph <NA> This is some text in part 1. NA NA NA NA NA NA NA
# 3 3 paragraph List Paragraph Part 1 a 1 1 NA NA NA NA NA
# 4 4 paragraph List Paragraph Part 1 b 1 1 NA NA NA NA NA
# 5 5 paragraph <NA> And more text in part 1. NA NA NA NA NA NA NA
# 6 6 paragraph heading 1 Part 2 NA NA NA NA NA NA NA
# 7 7 paragraph <NA> This is some text in part 2. NA NA NA NA NA NA NA
# 1.1 8 table cell Table Grid Tabhead A NA NA 1 FALSE 1 1 1
# 1.4 8 table cell Table Grid 1 NA NA 2 FALSE 1 1 1
# 2.2 8 table cell Table Grid Tabhead B NA NA 1 FALSE 2 1 1
# 2.5 8 table cell Table Grid 2 NA NA 2 FALSE 2 1 1
# 3.3 8 table cell Table Grid Tabhead C NA NA 1 FALSE 3 1 1
# 3.6 8 table cell Table Grid 3 NA NA 2 FALSE 3 1 1
# 11 9 paragraph <NA> NA NA NA NA NA NA NA
# 12 10 paragraph heading 1 Part 3 NA NA NA NA NA NA NA
# 13 11 paragraph <NA> This is some text in part 3. NA NA NA NA NA NA NA
doc_slices
# text index_start index_end
# 1 Part 1 1 5
# 6 Part 2 6 9
# 12 Part 3 10 NA
docx_summary(docpart) # after
# doc_index content_type style_name text level num_id row_id is_header cell_id col_span row_span
# 1 1 paragraph heading 1 Part 2 NA NA NA NA NA NA NA
# 2 2 paragraph <NA> This is some text in part 2. NA NA NA NA NA NA NA
# 1.1 3 table cell Table Grid Tabhead A NA NA 1 FALSE 1 1 1
# 1.4 3 table cell Table Grid 1 NA NA 2 FALSE 1 1 1
# 2.2 3 table cell Table Grid Tabhead B NA NA 1 FALSE 2 1 1
# 2.5 3 table cell Table Grid 2 NA NA 2 FALSE 2 1 1
# 3.3 3 table cell Table Grid Tabhead C NA NA 1 FALSE 3 1 1
# 3.6 3 table cell Table Grid 3 NA NA 2 FALSE 3 1 1
# 11 4 paragraph <NA> NA NA NA NA NA NA NA
@davidgohel How would I now pour the
mymarkdown.Rmd
---
output: officedown::rdocx_document
---
# Introduction
Some text and r chunks here
```{r}
source("slicedocx.R")
print(docpart)
```
More text and r chunks here
-
Hey @davidgohel, I took another shot at this and would be thankful for a comment. Basically, I created a function
I can then use
Function
library(officedown)
library(officer)
library(tidyverse)
get_docx_part <- function(infile, heading1title, outfile){
  # import and summary ------------------------------------------------------
  doc <- read_docx(infile)
  doc_summ <- docx_summary(doc)
  index_max <- max(doc_summ$doc_index)
  # info on all parts (i.e. sections separated by "heading 1" headers)
  doc_parts_info <- doc_summ %>%
    filter(style_name == "heading 1") %>%
    transmute(text = text,
              index_start = doc_index,
              index_end = lead(doc_index) - 1)
  # delete unwanted parts ---------------------------------------------------
  # prepare
  docpart <- doc
  i <- doc_parts_info %>%
    filter(text == heading1title) %>%
    pivot_longer(-text) %>%
    pull(value)
  # delete everything after part
  if (!is.na(i[2])) {
    for (j in (i[2] + 1):index_max) {
      docpart <- docpart %>% cursor_end() %>% body_remove()
    }
  }
  # delete everything before part
  if (i[1] > 1) {
    for (j in 1:(i[1] - 1)) {
      docpart <- docpart %>% cursor_begin() %>% body_remove()
    }
  }
  # print -------------------------------------------------------------------
  # create temporary docx file
  print(docpart, target = outfile)
}
Example Rmd
---
output: officedown::rdocx_document
---
# Introduction
Some text and r chunks here
```{r, echo=FALSE, message=FALSE}
source("get_docx_part.R") # get function
get_docx_part(infile = "BaseDocument.docx",
heading1title = "Second header title",
outfile = "temp.docx")
block_pour_docx("temp.docx")
```
# Conclusion
More text and r chunks here
Check results
Questions
As you can see, it works as intended so far. However,
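A side note on the index arithmetic in get_docx_part: if heading1title matches no "heading 1" paragraph, i comes back empty, so i[1] is NA and the `if (i[1] > 1)` check stops with a "missing value where TRUE/FALSE needed" error. The sketch below is one possible variant of the lookup in base R, not the method used above. part_bounds() is a hypothetical helper name. It assumes a data frame shaped like the output of docx_summary() (columns doc_index, style_name, text), and the toy data frame only imitates the example document.

```r
# Sketch only: part_bounds() is a hypothetical helper, not part of officer.
# It expects a data frame shaped like the output of officer::docx_summary(),
# i.e. with the columns doc_index, style_name and text.
part_bounds <- function(doc_summ, heading1title) {
  is_h1 <- !is.na(doc_summ$style_name) & doc_summ$style_name == "heading 1"
  heads <- doc_summ[is_h1, , drop = FALSE]
  starts <- heads$doc_index
  # each part ends right before the next heading; the last part ends at the
  # final body element of the document
  ends <- c(starts[-1] - 1L, max(doc_summ$doc_index))
  hit <- which(heads$text == heading1title)
  if (length(hit) != 1L) {
    stop("Expected exactly one 'heading 1' titled '", heading1title,
         "', found ", length(hit), call. = FALSE)
  }
  c(start = starts[hit], end = ends[hit])
}

# Toy summary imitating the example document (11 body elements)
toy <- data.frame(
  doc_index  = 1:11,
  style_name = c("heading 1", NA, "List Paragraph", "List Paragraph", NA,
                 "heading 1", NA, "Table Grid", NA, "heading 1", NA),
  text       = c("Part 1", "", "", "", "", "Part 2", "", "", "", "Part 3", ""),
  stringsAsFactors = FALSE
)

b <- part_bounds(toy, "Part 2")
n_after  <- max(toy$doc_index) - b[["end"]]  # elements to drop from the end
n_before <- b[["start"]] - 1L                # elements to drop from the start
```

With these two counts, the removal loops could iterate over seq_len(n_after) and seq_len(n_before), which run zero times when there is nothing to delete, so the surrounding if checks would no longer be needed.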
-
Hey @davidgohel, I have been using this method successfully, but realized it does not extend well to repeating the step multiple times in the same knitting process. This is because every time a
I guess this goes along with you telling me to "wait the final document is created and then you can delete the file you want to pour". Any suggestions on how I could tackle this issue?
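One possible way to tackle this is to give every poured part its own file. This assumes the problem is that each call writes to the same temp.docx and that the pour is only resolved once the final document is built. The helper name unique_part_path() is hypothetical. The sketch uses only base R, and the calls to get_docx_part() and block_pour_docx() are shown as comments because they need the function and files from the posts above.

```r
# Sketch only: unique_part_path() is a hypothetical helper.
# It returns a fresh .docx path on every call, so that repeated pours within
# one knit do not overwrite each other's intermediate file.
unique_part_path <- function(heading1title, dir = tempdir()) {
  # turn the heading into a file-name-friendly prefix
  slug <- gsub("[^A-Za-z0-9]+", "_", heading1title)
  tempfile(pattern = paste0(slug, "_"), tmpdir = dir, fileext = ".docx")
}

p1 <- unique_part_path("Part 2")
p2 <- unique_part_path("Part 2")

# Intended use inside an Rmd chunk (assumes get_docx_part() from above):
# out <- unique_part_path("Second header title")
# get_docx_part(infile = "BaseDocument.docx",
#               heading1title = "Second header title",
#               outfile = out)
# block_pour_docx(out)
```

In line with the quoted advice, the intermediate files would only be deleted after the final document has been created. Files written to tempdir() are removed when the R session ends.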
-
I am writing a docx document using RMarkdown and would like to include parts of another file, otherdoc.docx. I guess it is like using block_pour_docx, but only for parts of the document.
Let's say otherdoc.docx looks like this:
In my RMarkdown document I would like to work like so:
What are my options to achieve my goal here?
P.S.: Thanks for the fantastic officeverse