MS Word, some word spaces get lost (line break issue?) #598

Closed
yankustefan opened this issue Mar 25, 2015 · 14 comments

@yankustefan

Related to "Pasting from word #447"

When I paste some text from MS Word, some spaces (between words) get lost.

In search of the problematic character, I tried to get hold of the data in the clipboard, but I can't figure it out.

However, if I paste with ta-unsafe-sanitizer, and look at the HTML, there are line breaks, where there should be spaces.

These line breaks do not correspond with line breaks in MS Word.
(It can be the second word in a new paragraph)
See Images below.

I wonder if it would be best to replace these line breaks with spaces, and get rid of double spaces later.

Image of HTML output (with ta-unsafe-sanitizer="true"):
ta-spaces

Image of EDITOR output (with ta-unsafe-sanitizer="true"):
ta-spaces_editor

Image of HTML output (with ta-unsafe-sanitizer="false"):
ta-spaces_editor_2

Visible result is the same.

@SimeonC SimeonC added the bug label Apr 2, 2015
@zigrivers

I'm running into this same problem. The workaround that I'm having our users do until this is fixed is to paste the text from MS Word into another editor (e.g. Sublime) and then paste into textAngular. They lose all their formatting and have to reformat using textAngular, which is a pain, but I couldn't figure out a way around this bug short of using a different control and I don't want to do that.

@hcharania

We are also facing this issue with our users. Just pasted a paragraph from Word into the online Demo page, and confirmed that the problem exists there too.

@snugbeard

I noticed that MS Word (here: 2013) "generates" Unicode line feed characters. I suppose textAngular strips them somehow during sanitizing.

Strangely, those line feeds are not in the original document; they are added while copying formatted text from Word. So this is, in a way, an MS Word issue.

A workaround would be to replace with
but that would change the actual text layout from MS Word. But at least the User is able to see those; missing spaces are hard to track.

A clean solution is more difficult. textAngular would need to recognize MS Word generated content e.g. replacing with a space.

Or maybe you can think of a better solution...

Update:
I looked through the code a bit and found the place where the LF is removed:
processedSafe = safe.replace(/(&#(9|10);)*/ig, '');

I changed it locally to
processedSafe = safe.replace(/(&#(9);)*/ig, '');

After that, spaces are kept as expected when pasting formatted text from MS Word.
But I don't know which side effects this change might create.
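
To make the difference between the two patterns concrete, here is a minimal sketch. It assumes the sanitized string carries tabs and line feeds as the numeric entities `&#9;` and `&#10;`, which is what the regex above implies. The sample string and the variable names other than `safe` are hypothetical, and the third variant (turning the line feed into a space) is an alternative shown for comparison, not something taken from the textAngular source.

```javascript
// Hypothetical stand-in for sanitized paste content. "&#10;" is an encoded
// line feed sitting where Word had a space; "&#9;" is an encoded tab.
const safe = 'Hello&#10;world&#9;';

// Original pattern: removes encoded tabs AND encoded line feeds,
// so the two words end up glued together.
const strippedBoth = safe.replace(/(&#(9|10);)*/ig, '');

// Locally modified pattern: removes encoded tabs only,
// so the encoded line feed survives and still separates the words.
const strippedTabsOnly = safe.replace(/(&#(9);)*/ig, '');

// Alternative (an assumption, not from the thread): map the encoded
// line feed to a plain space, then drop the encoded tabs.
const lfToSpace = safe.replace(/&#10;/g, ' ').replace(/&#9;/g, '');

console.log(strippedBoth);     // Helloworld
console.log(strippedTabsOnly); // Hello&#10;world
console.log(lfToSpace);        // Hello world
```

With the modified pattern the line feed is kept rather than converted, so whether it shows up as a visible space depends on how the surrounding HTML treats whitespace.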

@yankustefan
Author

@snugbeard, thanks for investigating.

@janvandenberg

I have the same issue with copy+paste from MS Word. @snugbeard 's solution works for me too, but I would love to see a permanent solution in an upcoming version of textAngular.

@jeserkin

+1 regarding the possible vulnerability. Would like to hear about it from TA developers.

@superduck35

+1 for @snugbeard 's workaround - has anyone come across any side effects from this change?

@davetheflashguy

+1 as well for @snugbeard 's workaround. The only reason I could see this solution as being a problem is if you intend to display the text in a pre tag. While I only use textAngular for a small part of my application and I also only use a few of the toolbar options, I have done a full regression test using existing entries into the editor and have not found any problems at all after implementing this fix.

Thanks for the tip!

@JoelParke
Collaborator

Could you retest this with release 1.4.6 which I will put out in a few minutes... Thanks, Joel

@jeserkin

@JoelParke will it be updated on official webpage as well?

@JoelParke
Collaborator

Everything is up to 1.4.6 at this point.  Or if it is not, please let me know where it's broken.  npm and bower are also updated. 
Please retest and let me know....

Thanks, 
Joel

@stevenkissack

I was seeing this issue a while ago too; Word was sometimes using the LF character instead of a space. I am unable to test @snugbeard's fix because I no longer have a document with such formatting, but this was with 1.4.6. If someone in this thread has such a file, can they run it through to see if the bug is resolved for them?

My fix was to add code after line 1563:

text = targetDom.html();
// Some spaces arrive as LF characters, which show up as "\n", so we just swap them for spaces
text = text.replace(/\n/g, ' ');
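
As a self-contained illustration of this approach, combined with the earlier suggestion in this thread to get rid of double spaces afterwards, here is a minimal sketch. The function name `normalizePastedWhitespace` is hypothetical and not part of textAngular; handling `\r\n` and collapsing runs of spaces are additions beyond the two lines above.

```javascript
// Hypothetical helper, not part of textAngular: replaces line feeds in
// pasted HTML with spaces and then collapses the resulting runs of spaces.
function normalizePastedWhitespace(html) {
  return html
    // Swap each line feed (optionally preceded by a carriage return) for a space.
    .replace(/\r?\n/g, ' ')
    // Collapse two or more consecutive spaces into one.
    .replace(/ {2,}/g, ' ');
}

// Usage: a line feed where Word had a space, plus a double space.
const pasted = 'some\nword  spaces\r\nget lost';
console.log(normalizePastedWhitespace(pasted)); // some word spaces get lost
```

As noted earlier in the thread, whitespace changes like this matter if the text is later displayed in a pre tag, where line feeds and repeated spaces are significant.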

@JoelParke
Collaborator

This fix seems to work very well. I will be testing it and will incorporate it into the next release!

Thanks very much for pointing this out. Sorry that I took so long to get to it.

@JoelParke
Collaborator

JoelParke commented Oct 2, 2016

This will be fixed in release 1.5.12

See PR #1354.
