Task list (e.g. for GSOC) #1169

cartazio · 2014-02-19T22:12:05Z

tasks that might be suitable for part of a GSOC for some lucky student!

jgm · 2014-03-11T18:00:05Z

Here is a list of projects that could be done in pandoc:

1. Use Text instead of String throughout. This would involve API changes and extensive (but not too difficult) changes to pandoc-types, pandoc, texmath, highlighting-kate, and pandoc-citeproc. Also worth considering whether it would help to use my custom text-based parser combinators from cheapskate (perhaps with some amplifications) instead of the slower parsec.
2. Add a Haddock writer (#1135)
3. Create a new Haddock reader based on the current haddock code. (The present reader is based on older haddock code that used alex and happy, and it doesn't match current haddock syntax.)
4. Modify Image type to allow explicit encoding of embedded images (data instead of URL). That is, instead of containing a field for a URL, an Image would contain a field that could be either a URL or an encoded image (ByteString and MIME type).
5. Add a Lines (or LineBlock) Block element, and modify readers, writers, and associated code. Currently we just use a Para with LineBreaks. This is non-ideal, especially when converting into formats allowing line blocks.
6. Add an EPUB reader. (The embedded images would be important for this, since EPUBs frequently contain images.) (#652)
7. Create a flexible system for labeling objects (images, tables, code) and referring back by number and/or link. Or, more minimally: handle \label and \ref better when parsing LaTeX.
8. Automatic identifiers for images and tables in HTML output. (#208) Must take care not to break existing documents relying on automatic identifiers for headers.
9. Syntax and (block or inline?) element for anchors.
10. Allow attributes on links, images?
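
Items 4 and 5 can be made concrete with a small Haskell sketch. The types below are simplified stand-ins written for illustration, not the actual pandoc-types definitions; `ImageSource`, `describeSource`, `LineBlock`, and `lineBlockToPara` are all assumed names.

```haskell
import qualified Data.ByteString as B
import Data.List (intercalate)

-- Item 4 (hypothetical): an image source is either a URL or embedded data.
data ImageSource
  = ImageURL String                -- reference by URL, as today
  | ImageData B.ByteString String  -- embedded bytes plus a MIME type
  deriving (Show, Eq)

-- Hypothetical helper: summarize a source, e.g. for diagnostics.
describeSource :: ImageSource -> String
describeSource (ImageURL url) = "external image at " ++ url
describeSource (ImageData bytes mime) =
  "embedded " ++ mime ++ " image, " ++ show (B.length bytes) ++ " bytes"

-- Item 5 (hypothetical): a cut-down document model with a line block.
data Inline = Str String | Space | LineBreak
  deriving (Show, Eq)

data Block
  = Para [Inline]
  | LineBlock [[Inline]]  -- one inner list per line
  deriving (Show, Eq)

-- Fallback for writers without native line blocks: join the lines with
-- LineBreaks inside a single Para, which is the encoding used today.
lineBlockToPara :: [[Inline]] -> Block
lineBlockToPara ls = Para (intercalate [LineBreak] ls)
```

With a shape like this, a writer for a format that has native line blocks could match on `LineBlock` directly, while other writers could fall back to `lineBlockToPara`.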

cartazio · 2014-03-11T18:58:52Z

great!

knrafto · 2014-03-13T20:56:36Z

How about a PDF reader/writer pair for pandoc? The writer could be written with HPDF. The reader would strip stuff it can't understand, but try to keep text, headings, images, and the like. I've been (minimally) working on a PDF reading library already, and this looks like a great application.

jgm · 2014-03-13T21:01:30Z

+++ Kyle Raftogianis [Mar 13 14 13:56 ]:

How about a PDF reader/writer pair for pandoc? The writer could be
written with HPDF. The reader would strip stuff it can't understand,
but try to keep text, headings, images, and the like. I've been
(minimally) working on a PDF reading library already, and this looks
like a great application.

Interesting idea. My worry is that functionality would be too limited
for both the reader and the writer. I suppose proper typesetting of
math is out of the question. But if the writer could do everything
else - emphasis, paragraph layout, lists, tables - then I'd be open
to it. As for the reader, how much structure can be gotten from a PDF?
I'm skeptical but open-minded.

knrafto · 2014-03-13T21:09:49Z

I see your point. PDFs are for rendered documents, not for markup. However, PDF documents still store metadata, and can store outlines (which can be turned into section headings) and "article threads", which describe how the text is connected into sections. It definitely needs more thought, though.
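
As a rough illustration of the outline idea, here is a minimal Haskell sketch that flattens a nested outline into leveled headings. It assumes the outline has already been extracted from the PDF by some other means; `Outline`, `Heading`, and `outlineToHeadings` are hypothetical names, not part of any existing library.

```haskell
-- Hypothetical outline entry: a title plus nested child entries.
data Outline = Outline String [Outline]
  deriving (Show, Eq)

-- Hypothetical flat heading: a nesting level (1 = top) and a title.
data Heading = Heading Int String
  deriving (Show, Eq)

-- Walk the outline depth-first, emitting each entry as a heading
-- whose level is its depth in the tree.
outlineToHeadings :: [Outline] -> [Heading]
outlineToHeadings = go 1
  where
    go level = concatMap (entry level)
    entry level (Outline title kids) =
      Heading level title : go (level + 1) kids
```

This only recovers the heading skeleton; attaching the body text to each heading is the harder part and is not addressed here.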

cartazio · 2014-03-13T21:10:12Z

Typesetting math at some level could be done; it just might require having access to the CM fonts or something.

cartazio · 2014-03-13T21:11:59Z

Though that might be out of scope of what's safe for a GSOC project.

KurtPfeifle · 2014-03-13T22:14:40Z

On Thu, Mar 13, 2014 at 10:01 PM, John MacFarlane wrote:

+++ Kyle Raftogianis [Mar 13 14 13:56 ]:

How about a PDF reader/writer pair for pandoc? The writer could be
written with HPDF. The reader would strip stuff it can't understand,
but try to keep text, headings, images, and the like. I've been
(minimally) working on a PDF reading library already, and this looks
like a great application.

Interesting idea. My worry is that functionality would be too limited
for both the reader and the writer. I suppose proper typesetting of
math is out of the question. But if the writer could do everything
else - emphasis, paragraph layout, lists, tables - then I'd be open
to it. As for the reader, how much structure can be gotten from a PDF?

It's not possible to answer that in a generic way. From some PDFs you
cannot get any structure at all.

The best chances of recovering most of the structure are with "tagged"[*] PDFs and
PDF/UA (a new ISO standard; "UA" stands for "universal accessibility"), but
these are still very rare out there in the big, big world...

[*] If you do not know about "tagged" PDF, just try to imagine a lot of
additional markup being contained invisibly in the rendered document.

cartazio · 2014-03-13T22:30:23Z

But for the writer there should be a way, right?

knrafto · 2014-03-13T22:55:46Z

I think the LaTeX writer offers much more than a PDF writer would. I don't think this idea would be very feasible. Thanks for the comments!

mpickering · 2014-03-16T18:57:13Z

I am in the process of writing a proposal for adding an EPUB reader.

mpickering · 2014-12-08T20:26:52Z

Items 1, 5, 7, 8, 9, and 10 from this list are still open, for anyone finding it.

cartazio · 2014-12-08T21:06:42Z

and some would probably make great GSOCs!

jgm · 2015-01-02T17:27:39Z

Closing. Opened new list at #1852.

jgm added the enhancement label Mar 24, 2014

jgm changed the title ~~Document Task wishlist~~ Task list (e.g. for GSOC) Jan 2, 2015

jgm closed this as completed Jan 2, 2015
