Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listWorksheetInfo() not working without load() for Xls reader with BIFF7 file #3671

Closed
ohaag opened this issue Aug 9, 2023 · 1 comment · Fixed by #3672
Closed

listWorksheetInfo() not working without load() for Xls reader with BIFF7 file #3671

ohaag opened this issue Aug 9, 2023 · 1 comment · Fixed by #3672

Comments

@ohaag
Copy link

ohaag commented Aug 9, 2023

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

When using Reader\Xls->listWorksheetInfo($path), the reader should initialize itself enough to succeed this feature.

What is the current behavior?

Argument 3 passed to PhpOffice\PhpSpreadsheet\Shared\StringHelper::convertEncoding() must be of the type string, null given,
called in /.../phpspreadsheet/src/PhpSpreadsheet/Reader/Xls.php on line 7796 {"exception":"[object] (Symfony\\Component\\Debug\\Exception\\FatalThrowableError(code: 0): Type error: Argument 3 passed to PhpOffice\\PhpSpreadsheet\\Shared\\StringHelper::convertEncoding() must be of the type string, null given, 
called in /.../phpspreadsheet/src/PhpSpreadsheet/Reader/Xls.php on line 7796 at /.../phpspreadsheet/src/PhpSpreadsheet/Shared/StringHelper.php:437)"} []

With a BIFF7 file, when using Reader\Xls->listWorksheetInfo($path), codepage is not initialized to its default value, so StringHelper::convertEncoding crash because 3rd argument is null.

What are the steps to reproduce?

I don't know how you would reproduce that without external file but I'll detail what I know about the files.
I noticed that the bug wasn't occuring if I reopen the .xls file and save it with LibreOffice (I don't have excel at reach).
So let's call File A the original one, and File B the A one but opened and saved with LibreOffice.

$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xls();
$reader->listWorksheetInfo($fileA); // crashes
$reader->listWorksheetInfo($fileB); // works

What I noticed is that $this->codepage is only initialized in the loadSpreadsheetFromFile method.

But this method needs the BaseReader::load method to be called.
This should not be the case like with any other format.

By calling the load method before listWorksheetInfo, it fixes the issue and I then could see the differences between file A and file B.

File A was BIFF7 and File B was BIFF8 (which does not need the codepage attribute).
Therefore, File A code page is "MAC" and File B code page is "UTF-16LE".
I don't know how to generate a file A alike, but it probably have been generated from excel on Mac.

Instead of calling the whole load method, my current workaround is just setting the codepage value, but of course we should not have to do that.

if ($objReader instanceof Xls) {
    $codepageProperty = (new \ReflectionClass($objReader))->getProperty('codepage');
    $codepageProperty->setAccessible(true);
    if ($codepageProperty->getValue($objReader) === null) {
        $codepageProperty->setValue($objReader, CodePage::DEFAULT_CODE_PAGE);
    }
}

What features do you think are causing the issue

  • Reader

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Xls BIFF7 (at least), I don't know if other format could be impacted

Which versions of PhpSpreadsheet and PHP are affected?

1.25.2 and 1.29.0

@oleibman
Copy link
Collaborator

oleibman commented Aug 9, 2023

Confirmed; I can duplicate this problem with samples/templates/30templatebiff5.xls.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Aug 10, 2023
Fix PHPOffice#3671. Xls reader was not processing Code Page as part of functions ListWorksheetInfo/Names, which was causing them to fail for for BIFF5 (and BIFF7); this was not a problem for BIFF8. There were no unit tests for these functions for either BIFF5 or BIFF8. There are now.
oleibman added a commit that referenced this issue Aug 17, 2023
* Read Code Page for Xls ListWorksheetInfo/Names for BIFF5

Fix #3671. Xls reader was not processing Code Page as part of functions ListWorksheetInfo/Names, which was causing them to fail for for BIFF5 (and BIFF7); this was not a problem for BIFF8. There were no unit tests for these functions for either BIFF5 or BIFF8. There are now.

* Add getVersion and getCodePage Methods

These came about because test file for non-standard codepage was supposed to be BIFF5, but turned out to be BIFF8 using UTF-16 with some string data otherwise encoded. Add a BIFF5 equivalent (some hex editing was required), and the means to distinguish one from the other.

* Found MACCENTRALEUROPE Text in BIFF8

It was used for 'Last Modified By' property, even though bulk of spreadsheet uses UTF-16LE. Add a test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants