Document does not have any linebreak for Chinese language #16

Closed
iDreamCXK opened this issue Dec 25, 2018 · 13 comments

Comments

@iDreamCXK

MigraDoc does not insert any line breaks in Chinese text, as shown in the figure.
English paragraphs, however, are wrapped correctly.
image

After debugging, I found a problem in the method "internal override void VisitDocumentObjectCollection(DocumentObjectCollection elements)" in the project "MigraDoc.DocumentObjectModel":

internal override void VisitDocumentObjectCollection(DocumentObjectCollection elements)
{
    List<int> textIndices = new List<int>();
    if (elements is ParagraphElements)
    {
        for (int idx = 0; idx < elements.Count; ++idx)
        {
            if (elements[idx] is Text)
                textIndices.Add(idx);
        }
    }

    int[] indices = (int[])textIndices.ToArray();
    if (indices != null)
    {
        int insertedObjects = 0;
        foreach (int idx in indices)
        {
            Text text = (Text)elements[idx + insertedObjects];
            string currentString = "";
            foreach (char ch in text.Content)
            {
                // TODO Add support for other breaking spaces (en space, em space, &c.).
                switch (ch)
                {
                    case ' ':
                    case '\r':
                    case '\n':
                    case '\t':
                        if (currentString != "")
                        {
                            elements.InsertObject(idx + insertedObjects, new Text(currentString));
                            ++insertedObjects;
                            currentString = "";
                        }
                        elements.InsertObject(idx + insertedObjects, new Text(" "));
                        ++insertedObjects;
                        break;
                    ...

The method turns every blank into a separate DocumentObject, which lets MigraDoc handle each English word correctly. However, a Chinese paragraph contains no blanks, for example "圣诞节(Christmas)又称耶诞节、耶稣诞辰,译名为“基督弥撒”,是西方传统节日,起源于基督教,在每年公历12月25日。弥撒是教会的一种礼拜仪式。圣诞节是一个宗教节,因为把它当作耶稣的诞辰来庆祝,故名“耶诞节”。"

I added a patch to my copy of the source code: if the text contains a Chinese character, the method turns that character into its own DocumentObject, as shown below.

foreach (char ch in text.Content)
{
    if (ch >= 0x4e00 && ch <= 0x9fbb)
    {
        if (currentString != "")
        {
            elements.InsertObject(idx + insertedObjects, new Text(currentString));
            ++insertedObjects;
            currentString = "";
        }
        elements.InsertObject(idx + insertedObjects, new Text(ch.ToString()));
        ++insertedObjects;
        continue;
    }

    // TODO Add support for other breaking spaces (en space, em space, &c.).
    switch (ch)
    {
        case ' ':
        case '\r':
        case '\n':
        case '\t':
        .....

With this patch, MigraDoc wraps the Chinese text correctly:
image

However, I think this fix is slow and incomplete. Other languages, such as Japanese, will run into the same problem. Has the author considered this problem? The documentation does not describe any way to resolve it.
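
As a rough illustration of how the range check in the patch could be widened beyond 0x4e00..0x9fbb, here is a minimal standalone sketch. It does not touch MigraDoc at all: `CjkSegmenter`, `IsCjk`, and `Split` are hypothetical names invented for this example, and the block list (Hiragana, Katakana, CJK Unified Ideographs and Extension A) is an assumption about which scripts matter here. It ignores punctuation rules such as not starting a line with "。", so it is a starting point rather than a complete line-breaking algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CjkSegmenter
{
    // Hypothetical helper: true for characters in a few common Chinese/Japanese blocks.
    public static bool IsCjk(char ch)
    {
        return (ch >= 0x3040 && ch <= 0x30FF)   // Hiragana and Katakana
            || (ch >= 0x3400 && ch <= 0x4DBF)   // CJK Unified Ideographs Extension A
            || (ch >= 0x4E00 && ch <= 0x9FFF);  // CJK Unified Ideographs
    }

    // Splits text into pieces between which a line break would be allowed:
    // whitespace-delimited words, single blanks, and single CJK characters.
    public static List<string> Split(string text)
    {
        var pieces = new List<string>();
        var current = new StringBuilder();
        foreach (char ch in text)
        {
            bool isSpace = ch == ' ' || ch == '\r' || ch == '\n' || ch == '\t';
            if (isSpace || IsCjk(ch))
            {
                // Flush the word collected so far, then emit the break point itself.
                if (current.Length > 0)
                {
                    pieces.Add(current.ToString());
                    current.Clear();
                }
                pieces.Add(isSpace ? " " : ch.ToString());
            }
            else
            {
                current.Append(ch);
            }
        }
        if (current.Length > 0)
            pieces.Add(current.ToString());
        return pieces;
    }
}
```

For example, `CjkSegmenter.Split("Hello 世界")` yields the four pieces "Hello", " ", "世", and "界", which mirrors the sequence of Text objects the patched method would insert.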

@TH-Soft
Contributor

TH-Soft commented Dec 25, 2018

Has the author considered this problem?

The authors know very little about CJK and RTL, so they are not surprised about issues with non-Latin non-LTR languages. They even wrote that in the FAQ.

@iDreamCXK
Author

Thank you for the reminder. I hope the issue described above is helpful to the authors and that they can resolve it properly.

emazv72 added a commit to emazv72/MigraDoc that referenced this issue Aug 26, 2020
@TureeZhang

image

Thanks. I spent a lot of time looking for the cause; you are right, and your post finally gave me the answer.

I also found a Stack Overflow answer that says:

Text will break at spaces, at hyphens, at soft hyphens, and at zero-width non-joiners.

If you do not care where the linebreak occurs, just insert a zero-width non-joiner between each pair of characters or at suitable locations (e.g. for URLs after each slash or dot).

Soft hyphens between syllables will look better for human-readable text.

---User: I liked the old Stack Overflow

So I copied the zero-width non-joiner from here: https://unicode-explorer.com/c/200C

Then I wrote this simple code:

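// Requires: using System.Text;

// U+200C ZERO WIDTH NON-JOINER, written as an escape so the invisible character cannot get lost when copying.
private readonly char _zeroWidthNonJoiner = '\u200C';

// Appends a zero-width non-joiner after every character so the text can break anywhere.
public string ZeroWidthNonJoinerString(string chinese)
{
    if (string.IsNullOrEmpty(chinese))
        return chinese;

    StringBuilder builder = new StringBuilder(chinese.Length * 2);
    foreach (char item in chinese)
    {
        builder.Append(item);
        builder.Append(this._zeroWidthNonJoiner);
    }

    return builder.ToString();
}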

Pass the string returned by this method to MigraDoc, and the line breaks work correctly:

image
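
A possible refinement, sketched under the assumption (taken from the quoted answer) that MigraDoc breaks at zero-width non-joiners: insert the hint only next to Chinese or Japanese characters, so that English words in mixed text are not broken in the middle of a word. `BreakHints`, `AddBreakHints`, and `IsCjk` are hypothetical names for this illustration, and the character ranges are an assumption about what needs covering. Because surrogate code units fall outside those ranges, a surrogate pair is never split.

```csharp
using System;
using System.Text;

public static class BreakHints
{
    private const char ZeroWidthNonJoiner = '\u200C';

    // Hypothetical helper: true for characters in a few common Chinese/Japanese blocks.
    private static bool IsCjk(char ch)
    {
        return (ch >= 0x3040 && ch <= 0x30FF)   // Hiragana and Katakana
            || (ch >= 0x3400 && ch <= 0x4DBF)   // CJK Unified Ideographs Extension A
            || (ch >= 0x4E00 && ch <= 0x9FFF);  // CJK Unified Ideographs
    }

    // Inserts a zero-width non-joiner between two adjacent characters
    // whenever at least one of them is a CJK character.
    public static string AddBreakHints(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        var result = new StringBuilder(text.Length * 2);
        for (int i = 0; i < text.Length; i++)
        {
            result.Append(text[i]);
            bool hasNext = i + 1 < text.Length;
            if (hasNext && (IsCjk(text[i]) || IsCjk(text[i + 1])))
                result.Append(ZeroWidthNonJoiner);
        }
        return result.ToString();
    }
}
```

With this sketch, `BreakHints.AddBreakHints("ab世界")` returns "ab", a non-joiner, "世", a non-joiner, and "界": the Latin letters stay together and no hint is appended after the last character.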

@iDreamCXK
Author

You are so Handsome~

@Erhushenshou

Why can't I print Chinese text even though I use version 1.50.5147 and set the font to "Arial Unicode MS"?

@phoebusryan

@iDreamCXK Which version of MigraDoc and which font do you use to draw Chinese characters successfully? I use the font "Arial Unicode MS" on a Windows 10 machine and only get squares. :/

I use the NuGet package "PDFsharp-MigraDoc-wpf", version 1.50.5147.

@iDreamCXK
Author

You can use Microsoft YaHei. It supports Chinese.

@phoebusryan

@iDreamCXK When I use "Microsoft YaHei" my code throws an Exception in "RenderDocument()":
"TrueType collection fonts are not yet supported by PDFsharp"

@ThomasHoevel
Member

Find a TrueType font (.ttf), not a TrueType collection (.ttc).
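
When the file extension is not trustworthy, one way to tell the two formats apart is to look at the first four bytes of the font file: the OpenType specification defines the tag 'ttcf' as the start of a font collection. The following is a minimal sketch, not part of PDFsharp or MigraDoc; `FontFileCheck` and `IsTrueTypeCollection` are hypothetical names. It only identifies collections and says nothing about whether PDFsharp can otherwise load the font.

```csharp
using System;
using System.IO;

public static class FontFileCheck
{
    // Returns true when the file starts with the 'ttcf' tag that marks a font collection.
    public static bool IsTrueTypeCollection(string path)
    {
        byte[] tag = new byte[4];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until 4 bytes or end of file.
            while (total < tag.Length)
            {
                int n = stream.Read(tag, total, tag.Length - total);
                if (n == 0)
                    break;
                total += n;
            }
        }
        return total == 4
            && tag[0] == (byte)'t'
            && tag[1] == (byte)'t'
            && tag[2] == (byte)'c'
            && tag[3] == (byte)'f';
    }
}
```

Calling `FontFileCheck.IsTrueTypeCollection(path)` on a candidate file before installing it shows whether it is a collection that would trigger the exception quoted above.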

@phoebusryan

@ThomasHoevel That's tricky on a Windows 10 machine. I found Microsoft YaHei as a TTF, but I can't use it on Windows 10 because a TTC with the same name is already installed. I can't uninstall the preinstalled version because it's a system font. Bah.

So I need to rename this font (I don't know how) or find a different font with Chinese characters...

@phoebusryan

I used the font "Noto Sans CJK SC" and it worked... damn... thanks guys!
