Document does not have any linebreak for Chinese language #16

iDreamCXK · 2018-12-25T09:20:08Z

MigraDoc does not add any line breaks for Chinese text, as shown in the figure.
However, English paragraphs are handled correctly.

After debugging, I found a problem in the method "internal override void VisitDocumentObjectCollection(DocumentObjectCollection elements)" in the project "MigraDoc.DocumentObjectModel":

internal override void VisitDocumentObjectCollection(DocumentObjectCollection elements)
{
    List<int> textIndices = new List<int>();
    if (elements is ParagraphElements)
    {
        for (int idx = 0; idx < elements.Count; ++idx)
        {
            if (elements[idx] is Text)
                textIndices.Add(idx);
        }
    }

    int[] indices = (int[])textIndices.ToArray();
    if (indices != null)
    {
        int insertedObjects = 0;
        foreach (int idx in indices)
        {
            Text text = (Text)elements[idx + insertedObjects];
            string currentString = "";
            foreach (char ch in text.Content)
            {
                // TODO Add support for other breaking spaces (en space, em space, &c.).
                switch (ch)
                {
                    case ' ':
                    case '\r':
                    case '\n':
                    case '\t':
                        if (currentString != "")
                        {
                            elements.InsertObject(idx + insertedObjects, new Text(currentString));
                            ++insertedObjects;
                            currentString = "";
                        }
                        elements.InsertObject(idx + insertedObjects, new Text(" "));
                        ++insertedObjects;
                        break;
                    ...

The method splits the text at every whitespace character and inserts the pieces as separate DocumentObjects, so MigraDoc can handle every English word correctly. However, a Chinese paragraph contains no spaces, for example: "圣诞节（Christmas）又称耶诞节、耶稣诞辰，译名为“基督弥撒”，是西方传统节日，起源于基督教，在每年公历12月25日。弥撒是教会的一种礼拜仪式。圣诞节是一个宗教节，因为把它当作耶稣的诞辰来庆祝，故名“耶诞节”。"

I added a patch to my copy of the source code: if the text contains a Chinese character, the method inserts that character as its own DocumentObject, as shown below.

            foreach (char ch in text.Content)
            {
                if (ch >= 0x4e00 && ch <= 0x9fbb)
                {
                    if (currentString != "")
                    {
                        elements.InsertObject(idx + insertedObjects, new Text(currentString));
                        ++insertedObjects;
                        currentString = "";
                    }
                    elements.InsertObject(idx + insertedObjects, new Text(ch.ToString()));
                    ++insertedObjects;
                    continue;
                }

                // TODO Add support for other breaking spaces (en space, em space, &c.).
                switch (ch)
                {
                    case ' ':
                    case '\r':
                    case '\n':
                    case '\t':
                    .....

With this patch, MigraDoc breaks the Chinese lines correctly.

However, I think this way of solving the problem performs poorly and is not thorough. Other languages, like Japanese, will run into the same problem. Have the authors considered this problem? The documentation does not describe any way to resolve it.
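
One way to make the patch above less Chinese-specific would be to move the range test into a helper that also covers Japanese kana. The following is only a sketch: the class name `CjkBreaks`, the method `IsBreakableCjk`, and the choice of Unicode blocks are assumptions for illustration and are not part of MigraDoc. Characters outside the Basic Multilingual Plane (surrogate pairs) and punctuation that should not start a line are not handled.

```csharp
using System;

public static class CjkBreaks
{
    // Hypothetical helper, not part of MigraDoc: returns true when the
    // character is one after which a line break could be allowed.
    public static bool IsBreakableCjk(char ch)
    {
        return (ch >= '\u4E00' && ch <= '\u9FFF')   // CJK Unified Ideographs
            || (ch >= '\u3400' && ch <= '\u4DBF')   // CJK Unified Ideographs Extension A
            || (ch >= '\u3040' && ch <= '\u30FF');  // Hiragana and Katakana
    }
}
```

In the patched loop, the test `ch >= 0x4e00 && ch <= 0x9fbb` could then be replaced by `CjkBreaks.IsBreakableCjk(ch)`.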


TH-Soft · 2018-12-25T10:12:01Z

Have the authors considered this problem?

The authors know very little about CJK and RTL, so they are not surprised about issues with non-Latin non-LTR languages. They even wrote that in the FAQ.

iDreamCXK · 2018-12-27T14:09:24Z

Thank you for the reminder. I hope the issue described above will be helpful to the authors and that they can resolve it in the future.


TureeZhang · 2021-11-30T06:22:12Z

Thanks. I spent a lot of time looking for the cause; you are right, and I finally found the answer here.

Then I also found a Stack Overflow answer that says:

Text will break at spaces, at hyphens, at soft hyphens, and at zero-width non-joiners.

If you do not care where the linebreak occurs, just insert a zero-width non-joiner between each pair of characters or at suitable locations (e.g. for URLs after each slash or dot).

Soft hyphens between syllables will look better for human-readable text.

---User: I liked the old Stack Overflow

So I copied the zero-width non-joiner (U+200C) from here: https://unicode-explorer.com/c/200C

Then I wrote this simple code:

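// Requires: using System.Text;

// U+200C ZERO WIDTH NON-JOINER, written as an escape so the invisible character cannot get lost when copying.
private readonly char _zeroWidthNonJoiner = '\u200C';

public string ZeroWidthNonJoinerString(string chinese)
{
    if (string.IsNullOrEmpty(chinese))
        return chinese;

    // Append a zero-width non-joiner after every character so that a line break is allowed there.
    StringBuilder builder = new StringBuilder(chinese.Length * 2);
    foreach (char item in chinese)
    {
        builder.Append(item);
        builder.Append(this._zeroWidthNonJoiner);
    }

    return builder.ToString();
}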

Pass the string returned by this method to MigraDoc, and the line breaks work correctly.
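
Going by the quoted answer, appending a zero-width non-joiner after every character also allows breaks in the middle of English words and numbers in mixed text such as "圣诞节（Christmas）". A possible refinement, sketched here under the assumption that only CJK ideographs and kana need extra break opportunities, is to insert U+200C only next to those characters. The names `CjkText`, `IsCjk`, and `AddBreakOpportunities` are hypothetical and not part of MigraDoc.

```csharp
using System;
using System.Text;

public static class CjkText
{
    private const char ZeroWidthNonJoiner = '\u200C';

    // Assumption: ideographs and kana in the Basic Multilingual Plane are enough here.
    private static bool IsCjk(char ch)
    {
        return (ch >= '\u4E00' && ch <= '\u9FFF')   // CJK Unified Ideographs
            || (ch >= '\u3400' && ch <= '\u4DBF')   // CJK Unified Ideographs Extension A
            || (ch >= '\u3040' && ch <= '\u30FF');  // Hiragana and Katakana
    }

    // Inserts U+200C between two adjacent characters when at least one of them is CJK.
    public static string AddBreakOpportunities(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        var builder = new StringBuilder(text.Length * 2);
        for (int i = 0; i < text.Length; i++)
        {
            builder.Append(text[i]);

            // Latin words and numbers stay unbroken because neither neighbour is CJK.
            bool hasNext = i + 1 < text.Length;
            if (hasNext && (IsCjk(text[i]) || IsCjk(text[i + 1])))
                builder.Append(ZeroWidthNonJoiner);
        }

        return builder.ToString();
    }
}
```

The result would be handed to MigraDoc in the same way as before, for example `paragraph.AddText(CjkText.AddBreakOpportunities(text))`.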

iDreamCXK · 2021-11-30T09:59:43Z

You are so Handsome~

Erhushenshou · 2022-10-24T15:36:23Z

Why can't I print Chinese even though I use version 1.50.5147 and set the font to "Arial Unicode MS"?

phoebusryan · 2023-08-22T13:51:07Z

@iDreamCXK Which version of MigraDoc and which font do you use to successfully draw Chinese characters? I use the font "Arial Unicode MS" on a Windows 10 machine and only get squares.. :/

I use the NuGet package "PDFsharp-MigraDoc-wpf" version="1.50.5147"

iDreamCXK · 2023-08-22T14:08:35Z

You can use Microsoft YaHei. It supports Chinese.

phoebusryan · 2023-08-23T08:41:28Z

@iDreamCXK When I use "Microsoft YaHei" my code throws an Exception in "RenderDocument()":
"TrueType collection fonts are not yet supported by PDFsharp"

iDreamCXK · 2023-08-23T09:01:28Z

Maybe you can get help from here
https://stackoverflow.com/questions/52791080/migradoc-pdfsharp-throwing-exceptions-with-chinese-yahei-font

ThomasHoevel · 2023-08-23T09:07:28Z

Find a True Type Font (.ttf), not a True Type Collection (.ttc).

phoebusryan · 2023-08-23T09:13:32Z

@ThomasHoevel That's tricky on a Windows 10 machine. I just found Microsoft YaHei as a TTF but can't use it on Windows 10 because I already have a TTC with the same name installed. I can't uninstall the preinstalled version because it's a system font.. bah.

So I need to rename this font (don't know how) or find a different font with Chinese characters....

iDreamCXK · 2023-08-23T09:16:56Z

Maybe you can use 宋体 (SimSun).

phoebusryan · 2023-08-25T09:54:59Z

I used the font "Noto Sans CJK SC" and it worked... damn... thanks guys!

ThomasHoevel closed this as completed Jun 5, 2023
