idea: include files (and csv tables) #553
Comments
Not sure if this would fit Pandoc's goal of being a "universal document converter", but you can do this easily with a wrapper around Pandoc. This would of course require some technical skill on your part, but you would gain much more than the features suggested above. There are a bunch of programming tools in Pandoc extra, of which I know (and develop) pander. You could easily write a simple I hope you would find this useful. |
anton-k, I like the idea; had something similar in mind, when writing Another idea related to that: It would be great to have more general support for literate programming. Currently I use the Using In that sense
Unfortunately haskell is currently not included like eg
With that in mind writing tutorials with REPLs like |
This could be done easily using the techniques described in the scripting documentation. |
I opened a separate issue, #656, for it. |
Hi. I'm also looking for this feature :)
Same syntax is also supported by Leanpub system. I think some "include feature" is a must if you write large text in markdown. |
@thewatts, Thanks for the clue. It is very easy to go the "do it yourself" way, of course: I have a bit of Ruby that does the magic. But I see value in having it as a standard feature with a standard syntax and no need for external tools... |
Found what they use - they have a rakefile that will take and parse the |
There is |
@thewatts I also have a rake file doing the same thing :). Well, mine is recursive as well. I'm copying it here so it can help others:

```ruby
# Yields every line. Assumes root_dir and file are Pathname objects.
def merge_mdown_includes(root_dir, file, &block)
  file.each_line do |line|
    if line =~ /(.*)<<\[(.*)\]$/
      incl_file = root_dir + $2
      yield $1 if block
      merge_mdown_includes(root_dir, incl_file, &block)
    else
      yield line if block
    end
  end
end

# Hint on how to use the routine above:
merge_mdown_includes(root_dir, file) do |line|
  output_file.puts line
end
``` |
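For readers who would rather not depend on Ruby, the same recursive idea can be sketched in Python. This is only an illustration, not something posted in the thread: the function name `merge_includes` and the cycle guard are assumptions added here, and the `<<[path]` marker is taken from the regular expression in the Ruby routine above.

```python
import re
from pathlib import Path

# Matches a line ending in <<[path], the marker used by the Ruby routine above.
INCLUDE = re.compile(r"^(.*)<<\[(.*)\]\s*$")


def merge_includes(root_dir, file, _stack=()):
    """Yield the lines of `file`, splicing in <<[path] includes recursively.

    `merge_includes` and `_stack` are hypothetical names for this sketch.
    """
    file = Path(file).resolve()
    if file in _stack:
        # Guard against a file that (indirectly) includes itself.
        raise ValueError(f"circular include: {file}")
    with file.open(encoding="utf-8") as handle:
        for line in handle:
            match = INCLUDE.match(line.rstrip("\n"))
            if match:
                prefix, target = match.groups()
                if prefix:
                    # Keep any text that precedes the marker on its own line.
                    yield prefix + "\n"
                yield from merge_includes(
                    root_dir, Path(root_dir) / target, _stack + (file,)
                )
            else:
                yield line
```

Something like `sys.stdout.writelines(merge_includes(root, root / "book.md"))` would then feed the merged text to pandoc through a pipe. Unlike the Ruby version, this sketch refuses circular includes instead of recursing forever.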
Instead of adding another preprocessing syntax on top of Pandoc Markdown I use the following syntax to include files:
one could also extend this to:
This way the inclusion syntax can act on the abstract syntax tree (AST) of a Pandoc document - one can get the same result from HTML like this (HTML -> Markdown -> Markdown with inclusions -> Target format):
Here is a small hack in the form of a Perl script that I use for now.

```perl
while (<>) {
    if (/^`([^`]+)`\{\.include\}\s*$/) {
        if (-e $1 && open my $fh, '<', $1) {
            local $/;
            print <$fh>;
            close $fh;
        } else {
            print STDERR "failed to include file $1\n";
        }
    } else {
        print $_;
    }
}
```

The final implementation should work on the AST as well, to allow inclusion inside other elements, for instance:
|
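@nichtich Nice idea; converted to Python and combined with a Makefile:

```make
# Makefile fragment
%.pdf : %.md
	cat $^ | ./include.py | pandoc -o $@
```

```python
#!/usr/bin/env python
import re
import sys

# Matches an inline code span tagged as an include: `path`{.include}
include = re.compile(r"`([^`]+)`\{\.include\}")


def read_file(match):
    """Return the contents of the file named by the matched include."""
    with open(match.group(1), encoding="utf-8") as included:
        return included.read()


for line in sys.stdin:
    # A function replacement keeps backslashes in the file from being
    # interpreted as regex escapes.
    sys.stdout.write(include.sub(read_file, line))
``` |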
See also this discussion on the mailing list. |
And here's my take on a Haskell filter that includes CSV's as tables: pandoc-placetable |
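The core of the CSV idea is small enough to sketch without a filter. The following Python is an illustration only and is not how pandoc-placetable works: `csv_to_pipe_table` is a hypothetical helper, it assumes the first CSV row is the header, and escaping `|` with a backslash is an assumption about what the target Markdown reader accepts.

```python
import csv
import io


def csv_to_pipe_table(csv_text):
    """Render CSV text as a Markdown pipe table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def render(row):
        # Pad short rows and escape pipes so a cell cannot break the table.
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [render(header), "|" + "|".join(["---"] * width) + "|"]
    lines += [render(row) for row in body]
    return "\n".join(lines) + "\n"
```

The output can be spliced into the Markdown source before it reaches pandoc. A filter has the advantage of building the table in the AST, so it is not limited to what pipe tables can express.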
File extension dependent overloading of the image inclusion is a great idea! |
I've written a basic Pandoc filter in Haskell that can include referenced Markdown files recursively, meaning that nested includes are also included. (Although only 2 levels deep, for now.) Take a look: https://github.com/steindani/pandoc-include

To include one or multiple files use the following syntax:

```include
chapter1.md
chapter2.md
#dontinclude.md
``` |
Hi, @mpickering, may I ask what the status of this is? Is there a branch with work in progress (to see if there is anything to help with)? I think there are a few different categories of file extensions that can be included:
|
Is this feature still under development? This would allow a complete replacement of most static site generators. |
I don't think anybody is working on this. My personal opinion is that this is out of scope, as the increase in complexity doesn't seem worth it. A solution for CSV exists with pandoc-placetable. If one does not want to install additional binaries, pandoc 2 makes it easy to achieve most of what was suggested here via Lua filters. E.g., the filter below would replace a figure with its Markdown content if the image has class "markdown". This is fully portable and doesn't require extra software other than pandoc.

```lua
-- Replace a paragraph that holds a single image of class "markdown"
-- with the parsed contents of the file the image points to.
function Para (elem)
  if #elem.content == 1 and elem.content[1].t == "Image" then
    local img = elem.content[1]
    if img.classes[1] == "markdown" then
      local f = io.open(img.src, 'r')
      local blocks = pandoc.read(f:read('*a')).blocks
      f:close()
      return blocks
    end
  end
end
``` |
Do you mean including files or tables? Apparently 2 different (related) issues are mentioned here. I think the reason it's been taking so long is mainly not the difficulty or feasibility of including files, but the question of whether this should be included in pandoc, and how it should behave (e.g. recursively?). E.g. @jgm has a pandoc-include example in the tutorial on writing pandoc filters, and it has been distributed as pandoc-include: Include other Markdown files. And there's also a panflute filter doing so. So does it need to be done in pandoc?
Having a better template system is more important than having a native pandoc-include in this respect. I remember there's an issue about this. Try searching for it and see if you have any comments/suggestions there. |
pandoc-include is built against pandoc 1.19, so the newer syntax is not parsed correctly. Currently my workaround is to use paru-insert.rb, but it's really rather slow, pushing my build times up by 10s just to include 3 partials. |
We already have filters that can do just this. But note the limitations: if we parse the included content separately, then footnotes and link references defined elsewhere in the document can't affect it. What we really need is just a way to include a file in the input stream. |
I don't know. I think that from a user's perspective, it's helpful if different syntax is used for different things, rather than overloading several very different functions on one syntactic element (image syntax), or having two very slightly different syntactic elements ( |
I agree the include syntax should be alien and block level. (Including something very short that should be inline doesn’t seem likely to be a common pattern.) I also agree that it shouldn’t be parsed; include should just include things. (More like the LaTeX input command.) So it is more a pre-processor than an AST parser (like other existing include solutions). Anything more complicated is doable from a filter. I guess we should discuss the primary use case for this to be fruitful. I think the primary use case is people writing longer documents who want to “modularize” them to make them more tractable. If this is true, then we can assume people are already typing, for example, the headers at the correct level, and don’t need a way to specify how to change that. If you keep it very general, there’s going to be a performance penalty (say, parse the doc, walk the “filter”, and repeat) in addition to possibly more complicated syntax (link-like syntax vs. just {{ file }}). [We may also need to think about how to “escape” the include. Say you settle on a certain block-level syntax (i.e. on its own line) that defines an include. What happens if, in the same document, I want to have a code block, perhaps illustrating the include syntax? How can we tell pandoc not to expand it there? (A code block won’t work, as it is not parsed yet.) Maybe the solution is very simple: we should just escape the sequence: {{ file }}.] |
@ickc It's a processing instruction, not a preprocessor -- we're parsing the document as we go, so we can still tell the difference between the include directive in a code block and one outside of one. No worries there. |
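To make the difference concrete, here is a rough Python sketch of an expander that leaves fenced code blocks alone. It is not how pandoc implements anything: it is a line-based approximation that only tracks backtick and tilde fences (indented code blocks are ignored), the `{{ file }}` form is the hypothetical syntax from the comment above, and `expand_includes` is a name made up for this example.

```python
import re
from pathlib import Path

# Hypothetical block-level include: a line containing only {{ file }}.
INCLUDE = re.compile(r"^\{\{\s*(.+?)\s*\}\}\s*$")
# Opening or closing fence: up to 3 spaces, then 3+ backticks or tildes.
FENCE = re.compile(r"^\s{0,3}(`{3,}|~{3,})")


def expand_includes(text, base_dir="."):
    """Expand {{ file }} lines, leaving fenced code blocks untouched."""
    out = []
    fence = None  # the opening fence run while inside a code block
    for line in text.splitlines(keepends=True):
        marker = FENCE.match(line)
        if fence is None:
            include = INCLUDE.match(line)
            if marker:
                fence = marker.group(1)
            elif include:
                path = Path(base_dir) / include.group(1)
                included = path.read_text(encoding="utf-8")
                # Make sure the spliced text ends with a newline.
                out.append(included if included.endswith("\n") else included + "\n")
                continue
        elif marker:
            run = marker.group(1)
            # A closing fence uses the same character, is at least as long
            # as the opening one, and has nothing else on the line.
            closes = run[0] == fence[0] and len(run) >= len(fence)
            if closes and line.strip() == run:
                fence = None
        out.append(line)
    return "".join(out)
```

A real processing instruction handled inside the parser would get this distinction for free, for every kind of code block, which is the point made above.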
Oh, that's interesting. But wouldn't people expect it to be included in the code block as well (i.e. include syntax inside a code block would still be interpreted as an include)? I think people who are used to include being a pre-processor (as all existing implementations are, AFAIK) might actually be surprised by it. I think this is superior, and it may need to be promoted a bit in the documentation. (Just a few sentences and maybe 2 examples, one telling people what they should do if they want an include in a code block, with the other syntax above.) |
For includes in code blocks, you'll need to use the different syntax, probably:
Of course, if you'd rather have a genuine preprocessor (the sort that doesn't pay attention to the Markdown syntax), you can always use m4 or another preprocessor in a pipe before pandoc. |
If we introduce here processing directives, we should have a look at Asciidoc, and how they do it. We should agree to a uniform syntax together with its escaping rules. File transclusion might not be the only thing we might have in the future: Asciidoc has the following preprocessing macros (they call it) relevant to this discussion:
The asciidoc include syntax also acts on the input stream level and has no awareness of the document (however having some nice IMO needed features such as The question is: can such parser directives be standardized? That would mean any decent parser can read these, act on them, or ignore them if not known. How do we separate parser directives, with what character(s)?
or, if we want something more alien, why not
IMO |
See also Subtext syntax (a rough subset of markdown) uses
Would include file |
I like the |
Current summary: #553 (comment) |
Another existing solution for including external files (without adding syntax) is codebraid (actually intended for including auto-generated content in Markdown):
To convert it to any output format, use something like the following:
CSV or Excel tables from external sources can be included as well:
|
This is nothing new; it's basically a filter that executes code. (There are filters that already do this.) Note the huge security risk you're exposing yourself to, though: know what you're signing up for, and only run it on documents you trust. This is a digression from this thread, however, which asks for a native include syntax. |
Thanks for emphasizing the security concern. The reason why I mentioned codebraid in this thread was (a) I hoped to support the discussion whether such an "include" feature is needed at all by contributing to "what alternative solutions already exist" (in addition to the filters that were already mentioned) and (b) the URL to this discussion page is wide-spread in the internet (stack overflow, ...) related to the topic "include external files in pandoc markdown" and people are searching for ways to do so here. Still, I'm sorry if I cluttered the discussion with my posts - feel free to delete them. ;-) |
I think it's fine to mention workarounds like this here, where people who search for this issue will find them. |
@phispi, nothing wrong with that. Maybe I should put it another way: including via a filter approach is solved; there are many solutions doing just that already. And the example you gave, while it does the job, is
it can be seen as the following 3 different levels:
Edit: see these comments for the limitation of a filter approach and how a native approach can do better: |
@ickc Thanks for the good summary: I am wondering if we could push point 3 towards an agreeable solution: Summary: #553 (comment) Conclusions so far:
Hope I did not miss any good objections mentioned so far… |
Any progress on this? |
Just to mention one more syntax I see in the wild: Obsidian uses
|
From the point of view of someone who writes Markdown but isn't familiar with the Pandoc internals, the image-like syntaxes are by far my favorite (in Markdown itself I don't see any conceptual difference between instructing my renderer to include an image and it including the contents of a text file, but I recognize that most output formats are going to care about the difference). Whatever way it goes, the two considerations I find most important are:
I also do see some benefit in making it an inlineable syntax, at least as far as I understand the AST from reading this thread. Specifically, I'm looking at having some file I wrote normally but want to include it into a blockquote: |
Your fallback point is a good point; some markdown extensions have this feature too, such as the definition list. The last feature you mentioned seems to be hard to define. Logically it is like nesting an arbitrary data structure inside another, but the AST is not like that. E.g. should we understand a heading as nesting, and (perhaps optionally) indent the heading level inside the outer heading? Or e.g. in the case of |
I figured it would get more complicated in the AST than it is conceptually. If it's not easily possible, that's perfectly fine and I'd not miss it too badly. I brought that up mostly just to lay my full wishlist on the table; it's really only the first two points that I'd personally consider critical. Still, I'm not involved in implementing anything about this, so I don't have as much of a say as y'all who are. |
I'm now running up against the need for a syntax for includes. I initially gravitated to overloading the image syntax as well. It makes sense to me that the image syntax is pulling image data and embedding it, and works well (I think) for video and audio. But I don't like that there is no real fallback. I feel like if you take this markdown to another processor which doesn't support transclusion, say to GitHub, then there should be a visible mark left indicating that something was there. With the image syntax, you'll get an invisible link, not even a missing image graphic. We have the same problem with I'm surprised no one has mentioned generic directives, proposed over on the CommonMark forum (many years ago).
It's alien enough that non-supporting processors will leave it alone. It has inline, leaf, and container syntaxes.
I guess one drawback is that it uses an English word for the directive name. But I feel like that's just going to be the case for a syntax that could be extended to support many other things. Otherwise I think I gravitate towards the Obsidian syntax ( I do think that what we choose here would set direction - which is really needed. I think it would be important coming from such a well respected tool as Pandoc. |
Just to close the loop on my comment above, GitLab now supports includes, using the |
@jgm: Is there any chance we could take up this thread again after 12 years, to maybe make some progress, maybe related to #553 (comment) |
Quarto uses the following syntax and supports this syntax even within code blocks: {{< include.qmd >}} I don't think this syntax is best for pandoc, because you may want to reserve Myst when used with Sphinx has the following syntax, but this is actually part of a larger system of directives:

```{include} ../README.md
```

It has the benefit of being extensible, but is maybe not in the spirit of markdown, because it is not the most lightweight. I actually like the earlier suggested idea of using <[introduction](introduction.md)> renders as: or performs an include when rendered by pandoc. |
As far as I understand, pandoc can process several files in one way only: you have to list them on the command line. There is a solution to simulate include files with scripting. It's described in pandoc's official guide.

Markdown is a tiny language. We should keep it small. So here is an idea of how to simulate LaTeX's input command without extending Markdown syntax.

We can overload the image inclusion construct. If a file has an image extension, then it's treated as an image, but if it's .txt, it can be treated as Markdown:

I've come to this idea while thinking about long tables. Imagine that someone is writing a research report. There are long tables produced by an algorithm. The tables are saved in some standard table format, for example CSV. And then the user can write