WIP Refactor Xls Reader
I have been having some time-out problems with php-cs-fixer in my environment (no problems *yet* in GitHub). When I raised this with the php-cs-fixer maintainers, the first thing they suggested was that it might be due to very large modules. The largest module they use in testing is a bit over 1,000 lines, and we have about 16 that exceed that. The biggest of these is Xls Reader; at 7,647 lines, it is more than 2,000 lines longer than its nearest competitor (Calculation), and at least 5 times larger than fixer's max.

It's not clear to me that breaking it up will actually solve my problem. On the other hand, perhaps it is time to do some refactoring anyhow; as an example, changing the parsing of tfunc/tfuncv values from enormous switch statements to indexing constant arrays, possibly external, is something that ought to have happened long ago. The refactoring turned out to be easier than I had thought. Breaking the reader into sub-modules, each of which can access the others' protected properties and methods, was fairly straightforward. There's a bit of overhead in having to instantiate new classes, but Xlsx Reader has been doing that all along (without the protected access part).
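To make the table-lookup idea above concrete, here is a minimal sketch of replacing a large branching statement with a constant array indexed by token id. The ids, names, and argument counts are placeholders invented for illustration; they are not the actual tfunc/tfuncv values, and the function names are hypothetical rather than part of the Xls reader.

```php
<?php

declare(strict_types=1);

// Hypothetical token table: placeholder ids and values, NOT the real
// tfunc/tfuncv mappings used by the Xls reader.
const FUNCTION_TABLE = [
    0x01 => ['name' => 'FIRST', 'args' => 1],
    0x02 => ['name' => 'SECOND', 'args' => 2],
    0x03 => ['name' => 'THIRD', 'args' => 0],
];

// Before: one branch per token id, which grows by several lines per entry.
function lookupWithSwitch(int $id): array
{
    switch ($id) {
        case 0x01:
            return ['name' => 'FIRST', 'args' => 1];
        case 0x02:
            return ['name' => 'SECOND', 'args' => 2];
        case 0x03:
            return ['name' => 'THIRD', 'args' => 0];
        default:
            throw new RuntimeException("Unrecognized function id: $id");
    }
}

// After: a single array lookup; adding an entry costs one line of data.
function lookupWithTable(int $id): array
{
    if (!isset(FUNCTION_TABLE[$id])) {
        throw new RuntimeException("Unrecognized function id: $id");
    }

    return FUNCTION_TABLE[$id];
}
```

Both functions return the same result for a known id and throw for an unknown one, so the table version can be swapped in without changing callers. The table could also live in a separate file, which is presumably what "possibly external" refers to.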

I've managed to remove about 2,900 lines from Reader/Xls, distributing them among 2 existing and 6 new source modules. I'm not sure I've chosen the best possible, or most maintainable, approach. I'm also not sure that I'm done (there may be opportunities to move the parsing to its own module). But I do want something in place as a contingency. There's no need to rush it into production; I plan to leave this in draft status for a while, at least until after release 3.0.0.
oleibman committed Jul 29, 2024
1 parent b406367 commit 3dfb92e
Showing 9 changed files with 5,642 additions and 6,155 deletions.
9,431 changes: 3,278 additions & 6,153 deletions src/PhpSpreadsheet/Reader/Xls.php

Large diffs are not rendered by default.

69 changes: 69 additions & 0 deletions src/PhpSpreadsheet/Reader/Xls/Biff5.php
@@ -0,0 +1,69 @@
<?php

namespace PhpOffice\PhpSpreadsheet\Reader\Xls;

use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
use PhpOffice\PhpSpreadsheet\Reader\Xls;

class Biff5 extends Xls
{
    /**
     * Reads a cell range address in BIFF5 e.g. 'A2:B6' or 'A1'
     * always fixed range
     * section 2.5.14.
     */
    public static function readBIFF5CellRangeAddressFixed(string $subData): string
    {
        // offset: 0; size: 2; index to first row
        $fr = self::getUInt2d($subData, 0) + 1;

        // offset: 2; size: 2; index to last row
        $lr = self::getUInt2d($subData, 2) + 1;

        // offset: 4; size: 1; index to first column
        $fc = ord($subData[4]);

        // offset: 5; size: 1; index to last column
        $lc = ord($subData[5]);

        // check values
        if ($fr > $lr || $fc > $lc) {
            throw new ReaderException('Not a cell range address');
        }

        // column index to letter
        $fc = Coordinate::stringFromColumnIndex($fc + 1);
        $lc = Coordinate::stringFromColumnIndex($lc + 1);

        if ($fr == $lr && $fc == $lc) {
            return "$fc$fr";
        }

        return "$fc$fr:$lc$lr";
    }

    /**
     * Read BIFF5 cell range address list
     * section 2.5.15.
     */
    public static function readBIFF5CellRangeAddressList(string $subData): array
    {
        $cellRangeAddresses = [];

        // offset: 0; size: 2; number of the following cell range addresses
        $nm = self::getUInt2d($subData, 0);

        $offset = 2;
        // offset: 2; size: 6 * $nm; list of $nm (fixed) cell range addresses
        for ($i = 0; $i < $nm; ++$i) {
            $cellRangeAddresses[] = self::readBIFF5CellRangeAddressFixed(substr($subData, $offset, 6));
            $offset += 6;
        }

        return [
            'size' => 2 + 6 * $nm,
            'cellRangeAddresses' => $cellRangeAddresses,
        ];
    }
}
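
The 6-byte layout that `readBIFF5CellRangeAddressFixed()` decodes can be tried out in isolation with the standalone sketch below. It is not part of the commit: `columnLetters()` and `decodeBiff5Range()` are hypothetical stand-ins for `Coordinate::stringFromColumnIndex()` and the method above, and it assumes that `getUInt2d()` reads an unsigned 16-bit little-endian integer, which is what `unpack('v', ...)` does here.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Coordinate::stringFromColumnIndex():
// converts a 1-based column index to letters (1 => 'A', 27 => 'AA').
function columnLetters(int $index): string
{
    $letters = '';
    while ($index > 0) {
        $letters = chr(65 + ($index - 1) % 26) . $letters;
        $index = intdiv($index - 1, 26);
    }

    return $letters;
}

// Hypothetical stand-in for readBIFF5CellRangeAddressFixed().
// Layout: first row (2 bytes), last row (2 bytes), first column (1 byte),
// last column (1 byte); rows and columns are stored 0-based.
function decodeBiff5Range(string $subData): string
{
    if (strlen($subData) < 6) {
        throw new InvalidArgumentException('Expected at least 6 bytes');
    }

    // 'v' = unsigned 16-bit little-endian (assumed to match getUInt2d),
    // 'C' = unsigned 8-bit.
    $f = unpack('vfirstRow/vlastRow/CfirstCol/ClastCol', $subData);

    if ($f['firstRow'] > $f['lastRow'] || $f['firstCol'] > $f['lastCol']) {
        throw new InvalidArgumentException('Not a cell range address');
    }

    $first = columnLetters($f['firstCol'] + 1) . ($f['firstRow'] + 1);
    $last = columnLetters($f['lastCol'] + 1) . ($f['lastRow'] + 1);

    // A range that covers a single cell collapses to that cell's address.
    return $first === $last ? $first : "$first:$last";
}

// Rows 1..5 and columns 0..1 (0-based) correspond to 'A2:B6'.
echo decodeBiff5Range(pack('vvCC', 1, 5, 0, 1)), PHP_EOL; // prints A2:B6
echo decodeBiff5Range(pack('vvCC', 0, 0, 0, 0)), PHP_EOL; // prints A1
```

The list variant in the commit repeats this decoding `$nm` times, advancing 6 bytes per entry after the 2-byte count, which is why it reports a consumed size of `2 + 6 * $nm`.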