output has hard spaces which are rejected in pandoc #3

Open
colcord opened this issue Mar 27, 2018 · 6 comments

Comments

@colcord

colcord commented Mar 27, 2018

Hi, I really appreciate this project. I've just had some trouble when using the output. When I put my text files through pandoc, I get a complaint about hard (non-breaking) spaces which have been inserted. I think it is LaTeX that is rejecting them. Let me know if you want more details. Some representations of the spaces are \20.

@euangoddard
Owner

I'm not sure what the cause of this issue could be. It may be that the text in the clipboard contains these characters, which are then preserved by the markdown conversion. It could also be the library I used to actually convert the HTML to markdown.

Someone previously contributed a patch that added a lot of support for pandoc and I notice there are a lot of replacements done on the stream. Perhaps you'd like to add one that replaces this \20 with a space in a pull request?

@colcord
Author

colcord commented Mar 27, 2018

Wow. What a fast response. I'm not a programmer, so I can't write a patch, unfortunately. Let me find another example and post it here, just so we have a clear test. Thanks for your prompt response.

@colcord
Author

colcord commented Mar 28, 2018

Hi, I've tested this again. This is the latest web page which caused problems:

https://www.quora.com/What-is-the-best-textbook-for-Category-theory

thanks, Frank

@euangoddard
Owner

Hi Frank,

I'll see what I can do. This project is pretty much unmaintained, so I'll need to find some spare time to look into this.

@colcord
Author

colcord commented Apr 1, 2018

Hi Euan, I just noticed another related bug. When text is in italics, the space before the first asterisk is a hard space. I've just looked at the JavaScript, and I don't see where it returns a single asterisk as the replacement for italics. I see that it would return an underscore, but I haven't seen that in my results. Is most of the conversion done by to-markdown?
When I look around, I see that Dom Christie has updated his project to-markdown to turndown:
https://github.com/domchristie/turndown
It looks as if he is maintaining it. I don't see a project which uses that code in a manner which is as easy to use as yours.
I wish I could make the changes myself. Kind regards, Frank

    {
      filter: ['em', 'i'],
      replacement: function (content) {
        return '_' + content + '_'
      }
    },

@epsil
Contributor

epsil commented Apr 3, 2018

@colcord, try replacing this part:

              .replace(/[ ]+\n/g, '\n')

with:

              .replace(/[ ]+\n/g, '\n')
              .replace(/\u00a0/g, ' ')
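
For anyone who wants to check the effect of that extra line outside the project, here is a minimal standalone sketch. The function name `cleanMarkdown` and the sample string are made up for illustration and are not taken from the project's source; only the two `.replace` calls come from the suggestion above.

```javascript
// Hypothetical helper for illustration only; not part of the project's code.
// It applies the two replacements from the suggestion above, in the same order.
function cleanMarkdown(markdown) {
  return markdown
    .replace(/[ ]+\n/g, '\n')  // existing step: drop ordinary spaces before a newline
    .replace(/\u00a0/g, ' ');  // added step: turn each non-breaking space (U+00A0) into a plain space
}

// Made-up sample: non-breaking spaces around an italic span, plus trailing spaces.
const sample = 'some\u00a0*italic*\u00a0text  \nnext line';

// JSON.stringify makes whitespace visible, so any leftover \u00a0 would stand out.
console.log(JSON.stringify(cleanMarkdown(sample)));
```

One thing to be aware of: because the non-breaking spaces are replaced last, one that sits directly before a newline ends up as an ordinary trailing space. If that matters for pandoc, moving the `\u00a0` replacement ahead of the trailing-space one might be worth trying.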
