UTF-8 Problems #212

Closed
Eternal-Infinity opened this issue Apr 1, 2015 · 26 comments

@Eternal-Infinity
Contributor

Hi there,

The company I work for is trying to use phpDox in Jenkins, as described here:
Template for Jenkins Jobs for PHP Projects

However, we get the following error:

[exec] [26.03.2015 - 21:35:50] - PATH_TO_FILE1.php (Encoding error - conversion to UTF-8 failed)
[exec] [26.03.2015 - 21:35:50] - PATH_TO_FILE2.php (Encoding error - conversion to UTF-8 failed)
[exec] [26.03.2015 - 21:35:50] Saving results to directory 'build/phpdox'
[exec]
[exec]
[exec] Oups... phpDox encountered a problem and has terminated!
[exec]
[exec] It most likely means you've found a bug, so please file a report for this
[exec] and paste the following details and the stacktrace (if given) along:
[exec]
[exec] PHP Version: 5.4.35 (WINNT)
[exec] PHPDox Version: 0.7.0
[exec] Exception: TheSeer\fDOM\fDOMException (Code: 3)
[exec] Location: phar://D:/phpdox.phar/fDOMDocument-1.5.0/TheSeer/fDOMDocument/fDOMDocument.php (Line 234)
[exec]
[exec] saving xml file failed
[exec]
[exec] [XML-STRING] [Line: 0 - Column: 0] Fatal Error 6003: output conversion failed due to conv error, bytes 0xFC 0x63 0x6B 0x74
[exec] [XML-STRING] [Line: 0 - Column: 0] Error 1544: encoder error
[exec]
[exec]
[exec] #0 phar://D:/phpdox.phar/phpdox/Application.php(138): TheSeer\phpDox\Collector\Project->save()
[exec] #1 phar://D:/phpdox.phar/phpdox/CLI.php(148): TheSeer\phpDox\Application->runCollector()
[exec] #2 D:\PHPtools\phpdox.phar(460): TheSeer\phpDox\CLI->run()
[exec]
[exec]
[exec]
[exec] Result: 1

I understand that this is caused by the umlauts we use in the code.
But is this a problem with phpDox, or is the problem in our code?

I have the same problem with another tool, so maybe the problem is with our code.
pdepend/pdepend#195
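The four bytes quoted in the libxml2 message can be checked by hand. The following Python sketch is only an illustration (phpDox itself is PHP, and `try_decode` is a hypothetical helper): it decodes the bytes under two encodings. 0xFC is "ü" in Windows-1252 and ISO-8859-1, but it never occurs in valid UTF-8, which fits the umlaut theory.

```python
from typing import Optional

# Bytes reported by libxml2: "conv error, bytes 0xFC 0x63 0x6B 0x74"
reported = bytes([0xFC, 0x63, 0x6B, 0x74])


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print("as utf-8:       ", try_decode(reported, "utf-8"))   # None (0xFC is not a UTF-8 byte)
print("as windows-1252:", try_decode(reported, "cp1252"))  # ückt
```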

@theseer
Owner

theseer commented Apr 2, 2015

Can you provide me with potentially stripped down versions of the files causing phpDox to choke?
If you do not want to publicly post them, feel free to drop me an email.

@theseer
Owner

theseer commented Apr 2, 2015

Also, can you verify, without too much effort, whether the problem still exists with the current master?

@Eternal-Infinity
Contributor Author

Sorry for the late reply, I was on holiday.

But I found out what the problem is.
Special characters like "€" are not converted into UTF-8.
We have such characters in our source-code (and not just "€", but also other characters).
So I don't know whether this is actually an issue for phpDox or not...

However, the output "(Encoding error - conversion to UTF-8 failed)" alone made it quite difficult to see WHERE the actual problem is.
You know which file, but not which line.
In my opinion it would be nice to see which line caused the problem.
Do you agree?
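As long as the raw bytes of a file are still available, the first undecodable byte can be mapped back to a line and column before any conversion happens. A minimal Python sketch, assuming the file is expected to be UTF-8; `find_invalid_utf8` is a hypothetical helper, not part of phpDox:

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (line, column, byte) of the first invalid UTF-8 byte, or None.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        offset = err.start                                 # index of the first bad byte
        line = data.count(b"\n", 0, offset) + 1            # newlines before it, plus one
        line_start = data.rfind(b"\n", 0, offset) + 1      # 0 if it is on the first line
        column = offset - line_start + 1
        return line, column, data[offset]


sample = "<?php\n// costs 0€\n".encode("cp1252")  # simulate a Windows-1252 file
print(find_invalid_utf8(sample))  # (2, 11, 128)
```

For a Windows-1252 file with `// costs 0€` on its second line, the helper points at line 2, column 11, byte 0x80.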

@theseer
Owner

theseer commented Apr 13, 2015

I do agree, but I don't have that information at crash time. The error message is produced by the DOM-to-XML-string serializer on save. Since that is PHP-internal code and DOM doesn't really have a concept of line numbers (before serializing it to a string), there is nothing useful to output.

That shouldn't be problematic, though. I'll verify and report back.

@theseer
Owner

theseer commented Apr 13, 2015

Okay, disregard the comment regarding DOM; I didn't see the error message above. I'll see what I can do regarding the CONV error.

@Eternal-Infinity
Contributor Author

Thank you for the fast reply.

Maybe it is only a local problem, but for me it really is the "€".
After I deleted the "€" character in that specific file, phpDox no longer crashed on that file (it still crashed on other files, because they contain other characters that seem to cause problems).
I am curious whether you can verify the problem, or whether it is local to my setup.

theseer added a commit that referenced this issue Apr 13, 2015
@theseer
Owner

theseer commented Apr 13, 2015

I can't reproduce this on my system - neither with 0.7.0 nor with my current master.

@theseer
Owner

theseer commented Apr 13, 2015

Can you mail me a test.php file that causes this crash?

@Eternal-Infinity
Contributor Author

Sure, here is a short file that causes the crash for me:

<?
class Test{
    //This function contains special characters like € or • 
    public function TestFunction(){
        $teststring = "phpdox costs 0€ • yay •";
    }
}
?>

Not only the comment but also the actual string causes the crash.
I use phpDox 0.7.0 (I haven't tried the current master yet) on Windows 7 Professional SP1 64-bit.

@theseer
Owner

theseer commented Apr 14, 2015

Can you please add the file via a PR to the test case folder issue212/src (without using the short opening tag, please)? Copying and pasting it from here doesn't reproduce the error for me (on Linux).

@Eternal-Infinity
Contributor Author

Finally added the test file in PR #217.

theseer added a commit that referenced this issue Apr 20, 2015
@theseer
Owner

theseer commented Apr 21, 2015

Merged. But it does not reproduce the crash on Linux. I'll see if I can get it to crash on Windows 7. Can you verify that the current master crashes on your machine with this?

theseer added a commit that referenced this issue Apr 21, 2015
@theseer
Owner

theseer commented Apr 21, 2015

Works for me:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\TheSeer\Desktop\phpdox\tests\data\issue212>c:\php\php.exe ..\..\..\phpdox.phar -f ./test.xml
phpDox 0.7.0-126-g7feacfa - Copyright (C) 2010 - 2015 by Arne Blankerts

[21.04.2015 - 03:54:56] Using config file './test.xml'
[21.04.2015 - 03:54:56] Registered collector backend 'parser'
[21.04.2015 - 03:54:56] Registered enricher 'build'
[21.04.2015 - 03:54:56] Registered enricher 'git'
[21.04.2015 - 03:54:56] Registered enricher 'checkstyle'
[21.04.2015 - 03:54:56] Registered enricher 'phpcs'
[21.04.2015 - 03:54:56] Registered enricher 'pmd'
[21.04.2015 - 03:54:56] Registered enricher 'phpunit'
[21.04.2015 - 03:54:56] Registered enricher 'phploc'
[21.04.2015 - 03:54:56] Registered output engine 'xml'
[21.04.2015 - 03:54:56] Registered output engine 'html'
[21.04.2015 - 03:54:56] Starting to process project 'phpDox-issue212'
[21.04.2015 - 03:54:56] Starting collector
[21.04.2015 - 03:54:56] Scanning directory 'C:/Users/TheSeer/Desktop/phpdox/tests/data/issue212/src' for files to process

...                                                     [3]

[21.04.2015 - 03:54:56] Saving results to directory 'C:/Users/TheSeer/Desktop/phpdox/tests/data/issue212/xml'
[21.04.2015 - 03:54:56] Resolving inheritance

...                                                     [3]

[21.04.2015 - 03:54:56] Collector process completed

[21.04.2015 - 03:54:56] Starting generator
[21.04.2015 - 03:54:56] Loading enrichers
[21.04.2015 - 03:54:56] Enricher Build Information initialized successfully
[21.04.2015 - 03:54:56] Starting event loop.

...........................                             [27]

[21.04.2015 - 03:54:57] Generator process completed
[21.04.2015 - 03:54:57] Processing project 'phpDox-issue212' completed.


Time: 297 ms, Memory: 5.00Mb

@Eternal-Infinity
Contributor Author

That is strange...
I still get the error with that file when I use the current .phar.
I'll check the current master next and report back here.

@theseer
Owner

theseer commented Apr 23, 2015

Any updates? This is one of the last tickets blocking the 0.8 release.

@Eternal-Infinity
Contributor Author

Sorry, I will look into this on the weekend, probably on Saturday.

@Eternal-Infinity
Contributor Author

Sorry for keeping you waiting so long.
I can confirm that the problem exists with the current master as well.
I also get the problem on two different desktop PCs: the one at the office and the one at home.
Both run Windows 7 SP1 64-bit, and both show the problem.
As you can see, I even have exactly the same Windows version as you:

C:\web\gitHub\phpdox>ver

Microsoft Windows [Version 6.1.7601]

C:\web\gitHub\phpdox>phpdox
phpDox 0.8.0-dev - Copyright (C) 2010 - 2015 by Arne Blankerts

[25.04.2015 - 19:35:23] Using config file './phpdox.xml'
[25.04.2015 - 19:35:23] Registered collector backend 'parser'
[25.04.2015 - 19:35:23] Registered enricher 'build'
[25.04.2015 - 19:35:23] Registered enricher 'git'
[25.04.2015 - 19:35:23] Registered enricher 'checkstyle'
[25.04.2015 - 19:35:23] Registered enricher 'phpcs'
[25.04.2015 - 19:35:23] Registered enricher 'pmd'
[25.04.2015 - 19:35:23] Registered enricher 'phpunit'
[25.04.2015 - 19:35:23] Registered enricher 'phploc'
[25.04.2015 - 19:35:23] Registered output engine 'xml'
[25.04.2015 - 19:35:23] Registered output engine 'html'
[25.04.2015 - 19:35:23] Starting to process project 'phpdox'
[25.04.2015 - 19:35:23] Starting collector
[25.04.2015 - 19:35:23] Scanning directory 'C:/web/gitHub/phpdox/tests/data/issue212' for files to process

fcc                                                     [3]

[25.04.2015 - 19:35:23] The following file(s) had errors during processing and were excluded:
[25.04.2015 - 19:35:23]  - C:/web/gitHub/phpdox/tests/data/issue212/Eternal_Infinity/special_chars.php (Encoding error - conversion to UTF-8 failed
[25.04.2015 - 19:35:23] Saving results to directory 'C:/web/gitHub/phpdox/build/phpdox/xml'
[25.04.2015 - 19:35:23] Collector process completed

[25.04.2015 - 19:35:23] Starting generator
[25.04.2015 - 19:35:23] Loading enrichers
[25.04.2015 - 19:35:23] Enricher Build Information initialized successfully
[25.04.2015 - 19:35:23] Starting event loop.

......................                                  [22]

[25.04.2015 - 19:35:23] Generator process completed
[25.04.2015 - 19:35:23] Processing project 'phpdox' completed.


Time: 200 ms, Memory: 3.25Mb

@Eternal-Infinity
Contributor Author

But it seems that at least there is no longer an error when saving the XML.
The phpdox.phar failed while saving the XML, whereas the current master doesn't seem to have that problem.
The conversion error on the special characters remains, though...

@redbeardcreator
Contributor

I don't know if it helps, but I had a similar error. It turns out I had characters from the standard Windows code page. When attempting to read the file as UTF-8 (or converting it), the process complained. I found those characters and replaced them. In my case I could use HTML entities. You might be able to do the same.
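The HTML-entity workaround can be automated once the text has been decoded with the right encoding. A Python sketch; `to_numeric_entities` is a hypothetical name, and Python's standard `xmlcharrefreplace` error handler does the actual replacement:

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML/XML numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Text from a Windows-1252 file must be decoded as such first.
raw = "phpdox costs 0€ • yay •".encode("cp1252")
print(to_numeric_entities(raw.decode("cp1252")))
# phpdox costs 0&#8364; &#8226; yay &#8226;
```

Entities only help where the text ends up in HTML or XML. Inside a PHP string literal, `&#8364;` is just seven literal characters, so the runtime value of the string changes; for comments this is harmless.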

@Eternal-Infinity
Contributor Author

The problematic characters for me are:
• (\u2022)
€ (\u20AC)
I will check whether I can replace those characters with HTML entities in our code.
Still, I wonder why phpDox can't convert at least the € sign.
It might really be a Windows problem, because theseer can't reproduce this error on Linux, as stated here.
On the other hand, theseer can't reproduce it on Windows 7 either... (see here)
Hmmm.......
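Both characters have different byte representations depending on the encoding a file is saved in. In Windows-1252 each is a single byte in the 0x80 to 0x9F range, which is not valid UTF-8 on its own; in UTF-8 each takes three bytes. A quick Python check (the helper names are hypothetical):

```python
def byte_forms(ch: str) -> dict:
    """Return the bytes of a single character in Windows-1252 and in UTF-8."""
    return {"windows-1252": ch.encode("cp1252"), "utf-8": ch.encode("utf-8")}


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for ch in ("\u20ac", "\u2022"):  # € and •
    forms = byte_forms(ch)
    print(f"U+{ord(ch):04X}",
          {name: raw.hex(" ") for name, raw in forms.items()},
          "| Windows-1252 byte valid as UTF-8:", is_valid_utf8(forms["windows-1252"]))
```

Text copied from a web page is re-encoded by whatever editor it is pasted into (often as UTF-8), so a pasted copy of a snippet does not necessarily contain the same bytes as the original file.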

@theseer
Owner

theseer commented Apr 27, 2015

Okay, I finally managed to reproduce the problem after manually converting the sample code file to Windows-1252 encoding (thanks to @redbeardcreator for that pointer ;) ):

phpDox 0.7.0-133-g83c76b6 - Copyright (C) 2010 - 2015 by Arne Blankerts

[27.04.2015 - 22:35:35] Using config file './test.xml'
[27.04.2015 - 22:35:35] Registered collector backend 'parser'
[27.04.2015 - 22:35:35] Registered enricher 'build'
[27.04.2015 - 22:35:35] Registered enricher 'git'
[27.04.2015 - 22:35:35] Registered enricher 'checkstyle'
[27.04.2015 - 22:35:35] Registered enricher 'phpcs'
[27.04.2015 - 22:35:35] Registered enricher 'pmd'
[27.04.2015 - 22:35:35] Registered enricher 'phpunit'
[27.04.2015 - 22:35:35] Registered enricher 'phploc'
[27.04.2015 - 22:35:35] Registered output engine 'xml'
[27.04.2015 - 22:35:35] Registered output engine 'html'
[27.04.2015 - 22:35:35] Starting to process project 'phpDox-issue212'
[27.04.2015 - 22:35:35] Starting collector
[27.04.2015 - 22:35:35] Scanning directory '/home/theseer/storage/php/phpdox/tests/data/issue212/src' for files to process

...ff                                               [5]

[27.04.2015 - 22:35:35] The following file(s) had errors during processing and were excluded:
[27.04.2015 - 22:35:35]  - /home/theseer/storage/php/phpdox/tests/data/issue212/src/win-euro.php (Encoding error - conversion to UTF-8 failed)
[27.04.2015 - 22:35:35]  - /home/theseer/storage/php/phpdox/tests/data/issue212/src/win-special.php (Encoding error - conversion to UTF-8 failed)
[27.04.2015 - 22:35:35] Saving results to directory '/home/theseer/storage/php/phpdox/tests/data/issue212/xml'
[27.04.2015 - 22:35:35] Resolving inheritance

...                                                 [3]

[27.04.2015 - 22:35:35] Collector process completed

[27.04.2015 - 22:35:35] Starting generator
[27.04.2015 - 22:35:35] Loading enrichers
[27.04.2015 - 22:35:35] Enricher Build Information initialized successfully
[27.04.2015 - 22:35:35] Starting event loop.

...........................                         [27]

[27.04.2015 - 22:35:35] Generator process completed
[27.04.2015 - 22:35:35] Processing project 'phpDox-issue212' completed.


Time: 125 ms, Memory: 5.50Mb

The manually converted file is detected as "unknown-8bit" by phpdox (using finfo->file(..., FILEINFO_MIME_ENCODING)), which is of course not a usable source encoding and thus the conversion fails.
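Why "unknown-8bit"? Roughly, libmagic only reports a concrete 8-bit charset when the bytes look like one it recognizes, and the Windows-1252 codes for € (0x80) and • (0x95) fall into the 0x80 to 0x9F range, which ISO-8859-1 does not use for printable characters. The Python function below is a deliberately simplified model of that decision; `guess_encoding` is hypothetical and this is an assumption for illustration, not libmagic's actual algorithm:

```python
def guess_encoding(data: bytes) -> str:
    """Very rough charset classification, loosely modelled on libmagic's MIME encodings.

    Simplified illustration only; the real libmagic checks far more than this.
    """
    if all(b < 0x80 for b in data):
        return "us-ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    if not any(0x80 <= b <= 0x9F for b in data):
        # High bytes present, but none in the C1 range: plausible ISO-8859-1 text.
        return "iso-8859-1"
    # 8-bit data that is neither UTF-8 nor ISO-8859-1: the code page cannot be named.
    return "unknown-8bit"


print(guess_encoding("0€ •".encode("cp1252")))  # unknown-8bit
```

Under this model, a Windows-1252 file that happens to contain only umlauts (0xC4 to 0xFC) would be labeled ISO-8859-1 and convert fine, while one containing € or • would not.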

I'll see what I can do to address this situation. Since there is no reliable way to detect a text encoding, I will probably have to provide a configuration option for it.
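With an explicit option, the conversion no longer depends on detection: the configured source encoding is used to decode, and the result is re-encoded as UTF-8. A Python sketch of that idea; `to_utf8` and its `configured` parameter are hypothetical and only mirror the purpose of such an option, not phpDox's implementation:

```python
from typing import Optional


def to_utf8(raw: bytes, configured: Optional[str] = None) -> bytes:
    """Return `raw` as UTF-8 bytes.

    Without a configured source encoding the input must already be UTF-8
    (UnicodeDecodeError is raised otherwise); no code page is guessed.
    With one, e.g. "windows-1252", it is used to decode the input first.
    """
    if configured is None:
        raw.decode("utf-8")  # validate only; raises on non-UTF-8 input
        return raw
    return raw.decode(configured).encode("utf-8")


win_source = '$s = "0€ •";'.encode("cp1252")
print(to_utf8(win_source, "windows-1252").decode("utf-8"))
```

Failing loudly when nothing is configured keeps the error visible instead of silently producing mojibake. Note that Python's cp1252 codec rejects the five byte values Windows-1252 leaves undefined, so a wrongly configured encoding can still raise.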

theseer added a commit that referenced this issue Apr 27, 2015
@Eternal-Infinity
Contributor Author

Hello again,

the fix for this one (the newly added parameter "encoding") is not in the latest .phar release, 0.8.0, right?
Will it be in the next release?

@theseer
Owner

theseer commented May 12, 2015

I merely forgot to add this ticket to the release notes but the modified code is included in the 0.8.0 release. Are you still having issues with the new option set?

@Eternal-Infinity
Contributor Author

Ah, I see.
Unfortunately, I still have the problem.
My phpdox.xml looks like this:

<?xml version="1.0" encoding="utf-8"?>
<phpdox xmlns="http://xml.phpdox.net/config">
 <project name="Documentation by phpdox" source="${basedir}/../web" workdir="build/phpdox">
  <collector publiconly="false" backend="parser" encoding="WINDOWS-1252">
   <include mask="*.php" />
  </collector>

  <generator output="build">
   <build engine="html" enabled="true" output="api">
    <file extension="html" />
   </build>
  </generator>
 </project>
</phpdox>

Is there something wrong with it?
I still get the errors, BUT phpDox no longer terminates with a failure because of them.

However, there is now a new, related problem:
One file in our application has an umlaut (ü) in its file name (yeah, I know...).
This causes phpDox to crash again.
(This didn't happen before... or maybe it did, but phpDox crashed earlier and never got to that file.)
I excluded the folder containing this file, so phpDox works fine for me for now.
And even though no one should ever use umlauts in file names or class names (it is code from my predecessor), maybe you could still look into this case, just in case.

The phpdox-output related to this:

[exec] Oups... phpDox encountered a problem and has terminated!
[exec]
[exec] It most likely means you've found a bug, so please file a report for this
[exec] and paste the following details and the stacktrace (if given) along:
[exec]
[exec] PHP Version: 5.4.5 (WINNT)
[exec] PHPDox Version: 0.8.0
[exec] Exception: TheSeer\phpDox\Collector\ProjectException (Code: 4)
[exec] Location: phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/collector/project/Project.php (Line 256)
[exec]
[exec] An error occured while saving the collected data: Internal Error: Unit 'R├╝cktritt' could not be saved (ns: , n: R├╝cktritt).
[exec]
[exec] #0 phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/CLI.php(161): TheSeer\phpDox\Application->runCollector()
[exec] #1 C:\Program Files\PHPTools\phpdox.phar(450): TheSeer\phpDox\CLI->run()
[exec]
[exec]
[exec] Exception: TheSeer\phpDox\Collector\ProjectException (Code: 2)
[exec] Location: phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/collector/project/Project.php (Line 310)
[exec]
[exec] Internal Error: Unit 'R├╝cktritt' could not be saved (ns: , n: R├╝cktritt).
[exec]
[exec] #0 phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/Application.php(141): TheSeer\phpDox\Collector\Project->save()
[exec] #1 phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/CLI.php(161): TheSeer\phpDox\Application->runCollector()
[exec] #2 C:\Program Files\PHPTools\phpdox.phar(450): TheSeer\phpDox\CLI->run()
[exec]
[exec]
[exec] Exception: TheSeer\fDOM\fDOMException (Code: 3)
[exec] Location: phar://C:/Program Files/PHPTools/phpdox.phar/vendor/theseer/fdomdocument/src/fDOMDocument.php (Line 234)
[exec]
[exec] Saving XML to file 'build/phpdox/classes/R├╝cktritt.xml' failed
[exec]
[exec] [XML-STRING] [Line: 0 - Column: 0] Fatal Error 6003: output conversion failed due to conv error, bytes 0xFC 0x63 0x6B 0x74
[exec] [XML-STRING] [Line: 0 - Column: 0] Error 1544: encoder error
[exec]
[exec]
[exec] #0 phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/collector/project/Project.php(243): TheSeer\phpDox\Collector\Project->saveUnit()
[exec] #1 phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/Application.php(141): TheSeer\phpDox\Collector\Project->save()
[exec] #2 phar://C:/Program Files/PHPTools/phpdox.phar/phpdox/CLI.php(161): TheSeer\phpDox\Application->runCollector()
[exec] #3 C:\Program Files\PHPTools\phpdox.phar(450): TheSeer\phpDox\CLI->run()
[exec]
[exec]
[exec]
[exec] Result: 1

@theseer
Owner

theseer commented May 12, 2015

The file itself was processed without a problem; it's the saving that now fails, because of the file name and the obvious encoding issues there.
This is a different problem from the one this ticket is about, though.

Would you mind opening a new issue for the filename problem?
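The garbled unit name in the log can be reproduced exactly. Encoding "Rücktritt" as UTF-8 and viewing those bytes in OEM code page 850 (typical for a German Windows console) or 437 yields `R├╝cktritt`, while the bytes quoted by libxml2, 0xFC 0x63 0x6B 0x74, are "ückt" in Windows-1252. So the same name appears in this output in two different byte forms; which component produced which is not established in this thread. A Python check, with hypothetical helper names:

```python
NAME = "Rücktritt"


def console_view(text: str, oem_codepage: str = "cp850") -> str:
    """How the UTF-8 bytes of `text` look when a console interprets them in an OEM code page."""
    return text.encode("utf-8").decode(oem_codepage)


def legacy_bytes(text: str) -> bytes:
    """Bytes of `text` in Windows-1252 (identical to ISO-8859-1 for these letters)."""
    return text.encode("cp1252")


print(console_view(NAME))                # R├╝cktritt
print(legacy_bytes(NAME)[1:5].hex(" "))  # fc 63 6b 74
```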

@Eternal-Infinity
Contributor Author

Thank you for the answer.
No problem, I will open a new ticket ;)

3 participants