Spaces should be replaced without the Unicode option when outputting Excel. #1284

Closed
tommykoko opened this issue Dec 11, 2019 · 2 comments · Fixed by #4106

Comments

@tommykoko

This is:

- [X] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

The processDomElement method does not replace IDEOGRAPHIC SPACE characters with a half-width space.

What is the current behavior?

The processDomElement method replaces IDEOGRAPHIC SPACE characters with a half-width space.

What are the steps to reproduce?

Output an Excel file from text that contains IDEOGRAPHIC SPACE characters, like the following.

 ★
  ★

The processDomElement method in PhpOffice\PhpSpreadsheet\Reader\Html matches whitespace in Unicode mode, so it treats IDEOGRAPHIC SPACE (full-width space) as whitespace and replaces it with a half-width space, as shown below.

CODE as is

$domText = preg_replace('/\s+/u', ' ', trim($child->nodeValue));

HTML itself does not replace IDEOGRAPHIC SPACE with a half-width space.

Many Japanese users also rely on IDEOGRAPHIC SPACE for formatting text.
So please replace whitespace without the Unicode option (the u modifier) when outputting Excel.

CODE to be

$domText = preg_replace('/\s+/', ' ', trim($child->nodeValue));
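
The difference between the two patterns can be seen in a small standalone script. This is only a sketch: the sample string and variable names are made up for illustration, and it assumes PHP 7.0 or later running under the default C locale, where `\s` without the `u` modifier covers ASCII whitespace only.

```php
<?php
// Two U+3000 IDEOGRAPHIC SPACE characters, a star, then two ASCII spaces.
$text = "\u{3000}\u{3000}★  end";

// With the u modifier, \s also matches Unicode whitespace such as U+3000,
// so the leading full-width spaces are collapsed into one half-width space.
$withUnicode = preg_replace('/\s+/u', ' ', trim($text));

// Without the u modifier the subject is matched byte by byte, and none of
// the bytes of U+3000 (E3 80 80) count as whitespace, so it is preserved.
$withoutUnicode = preg_replace('/\s+/', ' ', trim($text));

var_dump($withUnicode === " ★ end");
var_dump($withoutUnicode === "\u{3000}\u{3000}★ end");
```

Note that `trim()` with its default character list does not strip U+3000 either, so only the regular expression decides what happens to the full-width spaces.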

Which versions of PhpSpreadsheet and PHP are affected?

Ver 1.10.1

Best regards.

@stale

stale bot commented Feb 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If this is still an issue for you, please try to help by debugging it further and sharing your results.
Thank you for your contributions.

@stale stale bot added the stale label Feb 9, 2020
@stale stale bot closed this as completed Feb 17, 2020
@oleibman
Collaborator

This treatment dates back to PhpExcel. Like @tommykoko, I believe it is incorrect - Html's handling of whitespace corresponds to Php's when the u modifier is not specified, and it should be removed from Html Reader. This affects not only ideographic space but non-breaking space (and several other characters which I'm guessing aren't much used). I am reopening the issue. Expect a fix in a day or two.

@oleibman oleibman reopened this Jul 21, 2024
@stale stale bot removed the stale label Jul 21, 2024
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Jul 22, 2024
Fix PHPOffice#1284, which was opened in 2019 and closed as stale in 2020, but which I will now reopen. Html Reader converts *Unicode* whitespace characters in a DOM text node to space. However, Html treats only space, tab, CR, LF, vertical tab, and form-feed as whitespace. Using a regular expression with the `u` (Unicode) modifier causes a number of other characters to be converted to space inappropriately. The issue mentions "ideographic space" in particular, stating that it is used for formatting and should be preserved. "Non-breaking space" is also used in the same way and should also be preserved. An exception is made for a text node consisting of a single non-breaking space, since that is used as a placeholder by Html Writer; my own guess is that this is the reason why the Unicode modifier was used in the first place.
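
The behavior described in this commit message could look roughly like the sketch below. It is not the code from the actual fix: `normalizeTextNode` is a hypothetical helper, and mapping the lone non-breaking-space placeholder to a plain space is an assumption based on what the old `u`-modifier pattern would have produced.

```php
<?php
// Hypothetical helper for illustration; not part of PhpSpreadsheet.
function normalizeTextNode(string $nodeValue): string
{
    // Assumption: a node that is exactly one U+00A0 is the Html Writer
    // placeholder, so it is still turned into a plain space.
    if ($nodeValue === "\u{00A0}") {
        return ' ';
    }

    // Collapse only the six characters listed in the commit message:
    // space, tab, LF, vertical tab, form-feed, CR. There is no u modifier,
    // so multi-byte characters such as U+3000 are never rewritten.
    return (string) preg_replace('/[ \t\n\x0B\f\r]+/', ' ', trim($nodeValue));
}

var_dump(normalizeTextNode("\u{00A0}") === ' ');
var_dump(normalizeTextNode("a\u{00A0}b") === "a\u{00A0}b");
var_dump(normalizeTextNode(" \u{3000}★ \t\n x ") === "\u{3000}★ x");
```

Spelling out the character class rather than relying on `\s` keeps the result independent of the PCRE mode and of the process locale.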