Copying from Word or other sources #1423

Closed
Sofian777 opened this issue Jun 24, 2017 · 15 comments
Labels
[Feature] Blocks Overall functionality of blocks [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Feature] Paste [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable

Comments

@Sofian777

Sofian777 commented Jun 24, 2017

My first experience was a bit shocking. My clients and I rarely write directly in WordPress; we create our text in Word. When the heading formats, lists, bold, italic, etc. are used correctly there, pasting the content into WordPress is easy and everything comes through as it should. How is this supposed to work with Gutenberg? It just inserts everything as plain text, all formatting gets lost, and needing a "block" every time I want to create a headline is a completely inefficient workflow. I want all my text in one piece with the ability to format it, and I cannot see at all how this natural formatting will work in this new concept.

This whole Gutenberg thing will practically ruin WordPress for me. I hope that it will at least always be possible to get TinyMCE back via a plugin, for those who don't like or want this block approach. It's a nice idea for images and widgets, but quotes, headlines, etc. cannot be single blocks. There needs to be a text block that allows text formatting as we are used to. So what is the plan for TinyMCE: should Gutenberg be an alternative people can use, or is it supposed to replace the existing editor?

Or imagine the frequent task of copying articles from an old website into a new WP installation. Compatibility with regular HTML-formatted text needs to stay intact. For the moment I cannot believe my eyes.

@samikeijonen
Contributor

It's supposed to replace the existing editor, as far as I know. But I do see your concern about the current workflow; perhaps there will be a Word block (in plugin form, if nothing else).

May I ask a question out of curiosity, without any hidden agenda? Why are you and your clients using Word and then pasting the text into the current editor? Why is writing in Word a better workflow than writing directly in the editor?

@afercia
Contributor

afercia commented Jun 24, 2017

A "paste from Word" block sounds like a very interesting idea.

@Sofian777
Author

I have now discovered the "classic text" block, which takes all the text as I wish. Yet the toolbar is not working there, and I hope that TinyMCE will be implemented here exactly as it works now, including the TinyMCE Advanced plugin to style it as one wishes, and all possible TinyMCE functionality (paste_preprocess etc.). That would ensure that those who simply want to continue business as usual are not forced to abandon ship. After my first shock, I can say that Gutenberg might be a good step forward if TinyMCE stays fully implemented as is. That way, paste from Word is still on board through TinyMCE as we are used to.

@Sofian777
Author

@samikeijonen: I write directly in WP when I create tutorials about WP and am online anyhow. But any serious article takes a long time to develop, and I prefer offline software and having things saved on my laptop.

@westonruter
Member

As far as I know, pasting from Word should result in paragraphs, headings, images, lists and any other content converted into blocks. If all of the text copied from Word presently loses all formatting, then this is simply a bug.
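
To make the expected behaviour concrete, here is a minimal sketch of how top-level HTML tags could be mapped to block types. Everything in it is an assumption for illustration: the `BLOCK_FOR_TAG` table, the `blockNameForTag` helper, and the block names on the right-hand side are hypothetical and are not taken from the Gutenberg source.

```typescript
// Hypothetical lookup table from HTML tag names to block names.
// The block names are assumptions made for this sketch.
const BLOCK_FOR_TAG = new Map<string, string>([
  ["p", "core/paragraph"],
  ["h1", "core/heading"],
  ["h2", "core/heading"],
  ["h3", "core/heading"],
  ["ul", "core/list"],
  ["ol", "core/list"],
  ["img", "core/image"],
  ["blockquote", "core/quote"],
]);

// Assumed fallback for markup that has no dedicated block.
const FALLBACK_BLOCK = "core/freeform";

// Return the block name for a tag, ignoring case, or the fallback.
function blockNameForTag(tagName: string): string {
  const name = BLOCK_FOR_TAG.get(tagName.toLowerCase());
  return name !== undefined ? name : FALLBACK_BLOCK;
}
```

A lookup table keeps the rule "one top-level element, one block" easy to read; anything it does not know about falls through to a single catch-all block instead of losing content.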

@Sofian777
Author

I can only paste into a block, and the content is processed there; in the text block, for example, it is pasted as plain text. If I deselect everything and press Ctrl-V, nothing happens.
What am I supposed to do to achieve the described result?

@jasmussen
Contributor

@iseulde is working on something in #1331.

@mtias added the [Feature] Blocks, [Feature] Rich Text, and [Feature] Parsing labels Jun 26, 2017
@ellatrix self-assigned this Aug 11, 2017
@ellatrix
Member

How does paste from Word currently work in Gutenberg? I personally don't have Word, so I can't test. It would be great if someone could share some raw pieces of HTML that we receive. I added some logging, so in master or from v0.9 you'll be able to see the received HTML in the console. Particularly the "Received HTML" and "MCE processed HTML" are important. Normally TinyMCE should already do the heavy lifting for Word.
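
For anyone who wants to capture the same information outside Gutenberg, the logging described above can be sketched roughly as follows. This is not the code that was added to the editor: `ClipboardLike`, `PastePayload`, and `readPastePayload` are hypothetical names, and the only real API assumed is `getData(type)`, which a browser's `event.clipboardData` provides.

```typescript
// Hypothetical minimal shape of the clipboard data carried by a paste
// event; in a browser, `event.clipboardData` satisfies it.
interface ClipboardLike {
  getData(type: string): string;
}

interface PastePayload {
  html: string;
  plainText: string;
}

// Read both flavours of the pasted content and log them, so the raw
// input can be copied out of the console and attached to a bug report.
function readPastePayload(
  clipboard: ClipboardLike,
  log: (label: string, value: string) => void = (label, value) =>
    console.log(label, value)
): PastePayload {
  const html = clipboard.getData("text/html");
  const plainText = clipboard.getData("text/plain");
  log("Received HTML:", html);
  log("Received plain text:", plainText);
  return { html, plainText };
}
```

In a page, this could be called from a `paste` event listener with `event.clipboardData`; the injectable `log` parameter is only there so the function can be exercised without a browser console.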

@ellatrix
Member

Raw HTML from other sources is valuable too, of course.

@Sofian777
Author

Hello iseulde, I want to share my ideal vision of the pasting process, and I hope I can keep this experience after Gutenberg is released. I don't know exactly where to post it; I am a rare visitor on GitHub but want to contribute what I achieved. I wrote an article about it so you can easily have a look and also try out the code: (http://sundari-webdesign.com/wordpress-removing-classes-styles-and-tag-attributes-from-pasted-content/)
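
As an illustration of the kind of cleanup the article's title describes, removing classes, styles, and similar attributes from pasted markup might look like the sketch below. It is not the code from the linked article: `stripAttributes` and `PRESENTATIONAL_ATTRIBUTES` are hypothetical names, and the regex approach is a simplification that assumes attribute values contain no `>` character.

```typescript
// Attributes assumed, for this sketch, to carry only presentation.
const PRESENTATIONAL_ATTRIBUTES = ["class", "style", "lang"];

// Remove the listed attributes from every opening tag in an HTML string.
// Regex-based and therefore only a sketch: a robust version would walk
// a parsed DOM instead of matching text.
function stripAttributes(
  html: string,
  names: string[] = PRESENTATIONAL_ATTRIBUTES
): string {
  const alternatives = names.join("|");
  // Matches ` name="..."`, ` name='...'` or ` name=bare`; quoted values
  // may span several lines, as they do in Word's output.
  const attribute = new RegExp(
    `\\s+(?:${alternatives})\\s*=\\s*(?:"[^"]*"|'[^']*'|[^\\s>]+)`,
    "gi"
  );
  // Only rewrite opening tags, never the text content between them.
  return html.replace(/<[a-zA-Z][^>]*>/g, (tag) => tag.replace(attribute, ""));
}
```

A function like this could be called from a hook such as the TinyMCE `paste_preprocess` callback mentioned earlier in the thread; that wiring is left out here because it depends on the editor setup.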

@mtias
Member

mtias commented Aug 31, 2017

@Sofian777 we've done several improvements to the pasting flow in the last couple releases. I am going to close this one as we have specific issues tracking specific improvements, but feel free to add further comments, or open new issues with the specific needs or bugs.

@mtias closed this as completed Aug 31, 2017
@fumikito

@mtias I've tried copying and pasting from MS Word 2008 for Mac, and some style information gets pasted.
Office for Mac 2008 is very old and I don't think Gutenberg should support it, but some of that information still comes through.

[Screenshot: 2018-09-28 18 57 18]

@ellatrix
Member

@fumikito Could you share what's logged in the browser console?

@fumikito

@iseulde Here you are :)

Received HTML:

 <html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta name=表題 content="">
<meta name=キーワード content="">
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 2008">
<meta name=Originator content="Microsoft Word 2008">
<link rel=File-List
href="file://localhost/private/var/folders/cd/24f0cz1s38117q0n0kgggd8h0000gn/T/TemporaryItems/msoclip/0clip_filelist.xml">
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Template>Normal.dotm</o:Template>
  <o:Revision>0</o:Revision>
  <o:TotalTime>0</o:TotalTime>
  <o:Pages>1</o:Pages>
  <o:Words>52</o:Words>
  <o:Characters>298</o:Characters>
  <o:Company>株式会社破滅派</o:Company>
  <o:Lines>2</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>365</o:CharactersWithSpaces>
  <o:Version>12.0</o:Version>
 </o:DocumentProperties>
 <o:OfficeDocumentSettings>
  <o:AllowPNG/>
 </o:OfficeDocumentSettings>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:Zoom>0</w:Zoom>
  <w:TrackMoves>false</w:TrackMoves>
  <w:TrackFormatting/>
  <w:PunctuationKerning/>
  <w:DrawingGridVerticalSpacing>10 pt</w:DrawingGridVerticalSpacing>
  <w:DisplayHorizontalDrawingGridEvery>0</w:DisplayHorizontalDrawingGridEvery>
  <w:DisplayVerticalDrawingGridEvery>2</w:DisplayVerticalDrawingGridEvery>
  <w:ValidateAgainstSchemas/>
  <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
  <w:IgnoreMixedContent>false</w:IgnoreMixedContent>
  <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
  <w:Compatibility>
   <w:SpaceForUL/>
   <w:BalanceSingleByteDoubleByteWidth/>
   <w:DoNotLeaveBackslashAlone/>
   <w:ULTrailSpace/>
   <w:DoNotExpandShiftReturn/>
   <w:AdjustLineHeightInTable/>
   <w:BreakWrappedTables/>
   <w:DontGrowAutofit/>
   <w:DontAutofitConstrainedTables/>
   <w:DontVertAlignInTxbx/>
   <w:UseFELayout/>
  </w:Compatibility>
  <w:NoLineBreaksAfter Lang="JA">$([\{£¥‘“〈《「『【〔$([{「£¥</w:NoLineBreaksAfter>
  <w:NoLineBreaksBefore Lang="JA">!%),.:;?]}¢°’”‰′″℃、。々〉》」』】〕゛゜ゝゞ・ヽヾ!%),.:;?]}。」、・゙゚¢</w:NoLineBreaksBefore>
 </w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:LatentStyles DefLockedState="false" LatentStyleCount="276">
 </w:LatentStyles>
</xml><![endif]-->
<style>
<!--
 /* Font Definitions */
@font-face
	{font-family:"MS 明朝";
	panose-1:2 2 6 9 4 2 5 8 3 4;
	mso-font-charset:78;
	mso-generic-font-family:auto;
	mso-font-pitch:variable;
	mso-font-signature:1 0 16778247 0 131072 0;}
@font-face
	{font-family:Century;
	panose-1:2 4 6 4 5 5 5 2 3 4;
	mso-font-charset:0;
	mso-generic-font-family:auto;
	mso-font-pitch:variable;
	mso-font-signature:3 0 0 0 1 0;}
@font-face
	{font-family:"\@MS 明朝";
	panose-1:2 2 6 9 4 2 5 8 3 4;
	mso-font-charset:78;
	mso-generic-font-family:auto;
	mso-font-pitch:variable;
	mso-font-signature:1 0 16778247 0 131072 0;}
 /* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
	{mso-style-parent:"";
	margin:0mm;
	margin-bottom:.0001pt;
	text-align:justify;
	text-justify:inter-ideograph;
	mso-pagination:none;
	font-size:10.5pt;
	mso-bidi-font-size:12.0pt;
	font-family:"Times New Roman";
	mso-ascii-font-family:Century;
	mso-fareast-font-family:"MS 明朝";
	mso-hansi-font-family:Century;
	mso-bidi-font-family:"Times New Roman";
	mso-font-kerning:1.0pt;}
 /* Page Definitions */
@page
	{mso-page-border-surround-header:no;
	mso-page-border-surround-footer:no;}
@page Section1
	{size:612.0pt 792.0pt;
	margin:99.25pt 30.0mm 30.0mm 30.0mm;
	mso-header-margin:36.0pt;
	mso-footer-margin:36.0pt;
	mso-paper-source:0;}
div.Section1
	{page:Section1;}
-->
</style>
<!--[if gte mso 10]>
<style>
 /* Style Definitions */
table.MsoNormalTable
	{mso-style-name:標準の表;
	mso-tstyle-rowband-size:0;
	mso-tstyle-colband-size:0;
	mso-style-noshow:yes;
	mso-style-parent:"";
	mso-padding-alt:0mm 5.4pt 0mm 5.4pt;
	mso-para-margin:0mm;
	mso-para-margin-bottom:.0001pt;
	mso-pagination:widow-orphan;
	font-size:12.0pt;
	font-family:"Times New Roman";
	mso-ascii-font-family:Century;
	mso-ascii-theme-font:minor-latin;
	mso-fareast-font-family:"MS 明朝";
	mso-fareast-theme-font:minor-fareast;
	mso-hansi-font-family:Century;
	mso-hansi-theme-font:minor-latin;
	mso-bidi-font-family:"Times New Roman";
	mso-bidi-theme-font:minor-bidi;
	mso-font-kerning:1.0pt;}
</style>
<![endif]-->
</head>

<body bgcolor=white lang=JA style='tab-interval:48.0pt;text-justify-trim:punctuation'>
<!--StartFragment-->

<p class=MsoNormal style='text-indent:10.5pt;mso-char-indent-count:1.0'><span
style='font-family:"MS 明朝";mso-ascii-font-family:Century;mso-hansi-font-family:
Century'>しばらくたって、口の中から腐臭がする。アロロロ! そういや電気は止められていたんだ! かれこれ一ヶ月は腐りっぱなしだったのに、ぼくはなんで食べてしまったんだろう! 流しへヨーグルト豆腐を吐き出す! 不健康な黄色のチーズ片がぼくに「こんにちは」を言う!</span></p>

<p class=MsoNormal><span style='font-family:"MS 明朝";mso-ascii-font-family:Century;
mso-hansi-font-family:Century'> ところで、とぼくは口を拭いながら考える。なぜ彼は豪華な宿舎があるというのに、こんな貧乏くさいアパートを借りていたんだろう。彼は有能な特殊能力者集団「</span><span
lang=EN-US><ruby style='ruby-align:distribute-space'><span lang=JA
style='font-family:"MS 明朝";mso-ascii-font-family:Century;mso-hansi-font-family:
Century'>○者</span><rp>(</rp><rt style='font-size:5.0pt;font-family:"MS 明朝";
layout-grid-mode:line'>まるもの</rt><rp>)</rp></ruby><span lang=JA
style='font-size:10.5pt;mso-bidi-font-size:12.0pt;font-family:"MS 明朝";
mso-ascii-font-family:Century;mso-hansi-font-family:Century'>」の中でもさらに特別な「語り部」だ。宿舎以外に家を持つにしたって、フリーヶ丘あたりに住むことも簡単だった。それとも、なにかどうしようもない事情があって、こんな寂しい下宿を潜伏先にしていたのだろうか?</span></span></p>

<!--EndFragment-->
</body>

</html>

index.js?ver=1537354255:12 Received plain text:

 しばらくたって、口の中から腐臭がする。アロロロ! そういや電気は止められていたんだ! かれこれ一ヶ月は腐りっぱなしだったのに、ぼくはなんで食べてしまったんだろう! 流しへヨーグルト豆腐を吐き出す! 不健康な黄色のチーズ片がぼくに「こんにちは」を言う!
 ところで、とぼくは口を拭いながら考える。なぜ彼は豪華な宿舎があるというのに、こんな貧乏くさいアパートを借りていたんだろう。彼は有能な特殊能力者集団「○者(まるもの)」の中でもさらに特別な「語り部」だ。宿舎以外に家を持つにしたって、フリーヶ丘あたりに住むことも簡単だった。それとも、なにかどうしようもない事情があって、こんな寂しい下宿を潜伏先にしていたのだろうか?
index.js?ver=1537354255:2 Processed HTML piece:

 <p>











&lt;!--
 /* Font Definitions */
@font-face
	{font-family:"MS 明朝";
	panose-1:2 2 6 9 4 2 5 8 3 4;
	mso-font-charset:78;
	mso-generic-font-family:auto;
	mso-font-pitch:variable;
	mso-font-signature:1 0 16778247 0 131072 0;}
@font-face
	{font-family:Century;
	panose-1:2 4 6 4 5 5 5 2 3 4;
	mso-font-charset:0;
	mso-generic-font-family:auto;
	mso-font-pitch:variable;
	mso-font-signature:3 0 0 0 1 0;}
@font-face
	{font-family:"\@MS 明朝";
	panose-1:2 2 6 9 4 2 5 8 3 4;
	mso-font-charset:78;
	mso-generic-font-family:auto;
	mso-font-pitch:variable;
	mso-font-signature:1 0 16778247 0 131072 0;}
 /* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
	{mso-style-parent:"";
	margin:0mm;
	margin-bottom:.0001pt;
	text-align:justify;
	text-justify:inter-ideograph;
	mso-pagination:none;
	font-size:10.5pt;
	mso-bidi-font-size:12.0pt;
	font-family:"Times New Roman";
	mso-ascii-font-family:Century;
	mso-fareast-font-family:"MS 明朝";
	mso-hansi-font-family:Century;
	mso-bidi-font-family:"Times New Roman";
	mso-font-kerning:1.0pt;}
 /* Page Definitions */
@page
	{mso-page-border-surround-header:no;
	mso-page-border-surround-footer:no;}
@page Section1
	{size:612.0pt 792.0pt;
	margin:99.25pt 30.0mm 30.0mm 30.0mm;
	mso-header-margin:36.0pt;
	mso-footer-margin:36.0pt;
	mso-paper-source:0;}
div.Section1
	{page:Section1;}
--&gt;







</p><p>しばらくたって、口の中から腐臭がする。アロロロ! そういや電気は止められていたんだ! かれこれ一ヶ月は腐りっぱなしだったのに、ぼくはなんで食べてしまったんだろう! 流しへヨーグルト豆腐を吐き出す! 不健康な黄色のチーズ片がぼくに「こんにちは」を言う!</p><p> ところで、とぼくは口を拭いながら考える。なぜ彼は豪華な宿舎があるというのに、こんな貧乏くさいアパートを借りていたんだろう。彼は有能な特殊能力者集団「○者(まるもの)」の中でもさらに特別な「語り部」だ。宿舎以外に家を持つにしたって、フリーヶ丘あたりに住むことも簡単だった。それとも、なにかどうしようもない事情があって、こんな寂しい下宿を潜伏先にしていたのだろうか?</p>
[Violation] 'paste' handler took 1368ms
[Violation] Forced reflow while executing JavaScript took 85ms
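
In the log above, the contents of the `<style>` comment end up as escaped text in a leading paragraph of the processed HTML, while the actual content sits between the `<!--StartFragment-->` and `<!--EndFragment-->` markers of the received HTML. One possible pre-processing step, sketched here under the assumption that those markers are present, is to keep only the fragment and drop any remaining style blocks and comments. The function names are hypothetical and this is not how Gutenberg handles the paste.

```typescript
const START_MARKER = "<!--StartFragment-->";
const END_MARKER = "<!--EndFragment-->";

// Return only the markup between the clipboard fragment markers.
// If either marker is missing, fall back to the whole input.
function extractFragment(html: string): string {
  const start = html.indexOf(START_MARKER);
  const end = html.lastIndexOf(END_MARKER);
  if (start === -1 || end === -1 || end < start) {
    return html;
  }
  return html.slice(start + START_MARKER.length, end).trim();
}

// Drop <style> elements first, then HTML comments (which also covers
// Word's conditional comments) that may still be present.
function removeStylesAndComments(html: string): string {
  return html
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<!--[\s\S]*?-->/g, "");
}

// Combine both steps into a single cleanup pass.
function cleanWordHtml(html: string): string {
  return removeStylesAndComments(extractFragment(html)).trim();
}
```

Removing `<style>` elements before comments matters for input like the sample, where the style sheet is itself wrapped in a comment; doing it the other way round would leave an empty `<style></style>` pair behind.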

@bmeacham

I have a similar issue with Word 2010 on Windows, but all I need to do is delete the first block, the one containing the style definitions. The rest of it comes out OK. I have to manually change some blocks from Text to Quote to get the indentation I want, but that is minor.
