insertNewRowBefore causes Relative Named Range miscalculation #3661

Closed
3 of 11 tasks
jackordman opened this issue Jul 31, 2023 · 5 comments · Fixed by #3673

Comments

@jackordman

jackordman commented Jul 31, 2023

This is:

What is the expected behavior?

A relative named range such as '=$A1' should always point to column A, in the row where it is used.

What is the current behavior?

Inserting x rows makes existing relative ranges point to x rows below where they are being used.

What are the steps to reproduce?

<?php
require __DIR__ . '/vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\NamedRange;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();

$spreadsheet->addNamedRange(new NamedRange('FIRST', $sheet, '=$A1'));
$spreadsheet->addNamedRange(new NamedRange('SECOND', $sheet, '=$B1'));
$spreadsheet->addNamedRange(new NamedRange('THIRD', $sheet, '=$C1'));

$data = [1,2,3,'=FIRST', '=SECOND', '=THIRD'];
$sheet->fromArray($data, null, 'A1');
$sheet->fromArray($data, null, 'A2');
$sheet->fromArray($data, null, 'A3');

$sheet->insertNewRowBefore(1, 4);

$writer = new Xlsx($spreadsheet);
$writer->save('example.xlsx');

What features do you think are causing the issue?

  • Reader
  • Writer
  • Styles
  • Data Validations
  • Formula Calculations
  • Charts
  • AutoFilter
  • Form Elements

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

At least xlsx

Which versions of PhpSpreadsheet and PHP are affected?

PHP 8.2.8
PhpSpreadsheet 1.29.0

@jackordman
Author

To elaborate on the issue in case I'm being unclear, inserting x rows should change row references by x rows, but it appears the number of rows is being double-counted.

@oleibman
Collaborator

oleibman commented Aug 4, 2023

Confirmed. After the insert, PhpSpreadsheet adjusts the definition for FIRST to $A5; Excel leaves it as $A1. I will need to study the code in question; it might be tricky.
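
The double-counting can be sketched with a small model. This is an illustration only, not PhpSpreadsheet's code: `resolveRelativeRow` is a hypothetical helper, and it assumes a relative row in a defined name is read as an offset from row 1 (the thread later notes that PhpSpreadsheet treats relative names as relative to cell A1).

```php
<?php
// Illustrative model only, not PhpSpreadsheet's actual code.
// A relative row in a defined name is read as an offset from row 1.
function resolveRelativeRow(int $definedRow, int $usingRow): int
{
    return $usingRow + ($definedRow - 1);
}

$inserted = 4;
$formulaRow = 1 + $inserted; // the formula that was in row 1 now sits in row 5

// Definition left alone as $A1 (what Excel does): points at the formula's own row.
$expected = resolveRelativeRow(1, $formulaRow);

// Definition rewritten to $A5 (the reported behavior): shifted a second time.
$actual = resolveRelativeRow(1 + $inserted, $formulaRow);

echo "definition \$A1 -> row $expected\n"; // row 5
echo "definition \$A5 -> row $actual\n";   // row 9
```

In this model the formula has already moved down 4 rows with its cell, so shifting the definition as well lands it 4 rows below the cell that uses it, which matches the behavior reported above.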

@oleibman
Collaborator

oleibman commented Aug 4, 2023

Oh, this will get messy. Not necessarily for your case, where only a single cell is used in the definition. But what of a case where there is a range and the "absoluteness" of the two parts of the range doesn't match? You don't even need to insert rows to get in trouble here.

Based loosely on ReferenceHelperTest::testInsertRowsWithDefinedNames (which probably does not match Excel's behavior), create a spreadsheet with 'Column1' in A1, then 2, 4, 6, 8, 10, 12, 14, 16 in the subsequent rows of column A. Then define the name FIRSTCOLUMN as $A$2:$A6, and set each of B2, B3, B4, and B5 to =SUM(FIRSTCOLUMN).

Excel evaluates B2 as 20, which is the sum of A2:A5. B3 is 30 (A2:A6). B4 is 42 (A2:A7). B5 is 56 (A2:A8). PhpSpreadsheet calculates B2 as 42 (A2:A7), B3 as 56 (A2:A8), and B4 as 72 (A2:A9).

I understand what PhpSpreadsheet is doing (it goes from absolute address A2 to absolute column A, row = current row + 6 - 1). I don't understand what Excel is doing. @MarkBaker can you explain it? I can upload code and/or a spreadsheet if my description isn't clear.

@jackordman
Author

Thank you! I had a suspicion it could get messy, from taking a cursory glance at the code for re-calculating ranges.

For anyone else running into this, a simple workaround for now is to add the named ranges after inserting rows, as seen below, although this might not be feasible for every use case, especially more complex ones.

<?php
require __DIR__ . '/vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\NamedRange;
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

$spreadsheet = new Spreadsheet();
$sheet = $spreadsheet->getActiveSheet();

$data = [1,2,3,'=FIRST', '=SECOND', '=THIRD'];
$sheet->fromArray($data, null, 'A1');
$sheet->fromArray($data, null, 'A2');
$sheet->fromArray($data, null, 'A3');

$sheet->insertNewRowBefore(1, 4);

$spreadsheet->addNamedRange(new NamedRange('FIRST', $sheet, '=$A1'));
$spreadsheet->addNamedRange(new NamedRange('SECOND', $sheet, '=$B1'));
$spreadsheet->addNamedRange(new NamedRange('THIRD', $sheet, '=$C1'));

$writer = new Xlsx($spreadsheet);
$writer->save('example.xlsx');

@oleibman
Collaborator

oleibman commented Aug 5, 2023

Aha, a semblance of sense. See https://www.ablebits.com/office-addins-blog/excel-named-range/#absolute-relative-names. For relative addressing, the name is defined as relative to the "position of the active cell at the time the name is defined". For my spreadsheet, it is stored as $A$2:$A4; now the results I've seen make sense (perhaps my active cell was A3 rather than A1 when I defined the name), and I can take a stab at fixing the original problem. My working theory, at least to start, is that, for defined names, parts of the address which are absolute should be adjusted, but parts which are relative should not.
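
The arithmetic behind this can be checked with a short sketch. It is a model of the FIRSTCOLUMN example from the earlier comment, not library code: `sumFirstColumn` is a hypothetical helper, and the relative end row is assumed to be resolved as an offset from row 1.

```php
<?php
// Column A values from the example: A1 holds the header, A2..A9 hold 2, 4, ..., 16.
$columnA = [2 => 2, 3 => 4, 4 => 6, 5 => 8, 6 => 10, 7 => 12, 8 => 14, 9 => 16];

// Sum A2:A<end>, where <end> is the relative end row of the name, resolved
// from the row of the cell using it (offset measured from row 1).
function sumFirstColumn(array $columnA, int $definedEndRow, int $usingRow): int
{
    $endRow = $usingRow + ($definedEndRow - 1);
    $sum = 0;
    for ($row = 2; $row <= $endRow; $row++) {
        $sum += $columnA[$row] ?? 0; // empty cells count as 0
    }
    return $sum;
}

foreach ([2, 3, 4, 5] as $row) {
    $stored = sumFirstColumn($columnA, 4, $row);  // stored as $A$2:$A4
    $literal = sumFirstColumn($columnA, 6, $row); // read literally as $A$2:$A6
    echo "B$row: stored \$A4 -> $stored, literal \$A6 -> $literal\n";
}
```

With the stored definition `$A$2:$A4`, the model gives 20, 30, 42, and 56 for B2 through B5, which are Excel's results. Reading the definition literally as `$A$2:$A6` gives 42, 56, and 72 for B2 through B4, which are the values PhpSpreadsheet produced.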

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Aug 10, 2023
Fix PHPOffice#3661. Insertion or deletion of rows or columns can cause changes to the ranges for Defined Names. In fact, only the absolute parts of such ranges should be adjusted, while the relative parts should be left alone. Otherwise, as the original issue documents, the adjustment to the relative portion winds up being double-counted when the Defined Name is referenced in a formula. The major part of this change is to ReferenceHelper and CellReferenceHelper to not adjust relative addresses for Defined Names. An additional small change is needed in the Calculation engine to `recursiveCalculationCell` when a Defined Formula is being calculated.

In a sense, this is a breaking change, but for an obscure use case which (a) was wrong, and (b) is unlikely to be of importance. Some of the tests in ReferenceHelperTest were wrong and are now corrected, with the results being cross-checked against Excel.

When a Defined Name using relative addressing is defined in Excel, the result is treated as relative to the active cell on the sheet in which the name is defined. PhpSpreadsheet treats it as relative to cell A1. I think that is a reasonable treatment, and will not change its behavior to match Excel's - that would definitely be a breaking change of some consequence.

An interesting use of relative address in a defined name is demonstrated at https://excelguru.ca/always-refer-to-the-cell-above/. Note that the steps there involve setting the selected cell to A2 before defining the name. When that spreadsheet is stored, the actual definition of the range is `A1048576`. Likewise, adding a defined name for the cell to the left would be stored as `XFD1`. This seems a little fragile, but Ods, which I believe does not have the same row and column limits as Excel, certainly treats these values the same as Excel. This particular construction is formally unit-tested. Note, however, that although using these Defined Names as a formula on their own works just fine, a construction like `=SUM(A1:CellAbove)`, as suggested in the article, seems to put PhpSpreadsheet calculation engine in a loop. In the likely event that I can't solve that before I merge this change, I will open a new issue to that effect when I do merge it. Note that this can be handled without defined names as `=SUM(A$1:INDIRECT(ADDRESS(ROW()-1,COLUMN())))`. PhpSpreadsheet will handle this as a cell formula, but not yet as a Named Formula.
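
One way to read the stored values `A1048576` and `XFD1` is as offsets from A1 that wrap around the sheet edge. The sketch below is only that reading expressed as modular arithmetic; it is not taken from PhpSpreadsheet or Excel. `resolveWrapped` and the name `CellLeft` are hypothetical, and the limits are Excel's worksheet limits (1,048,576 rows; 16,384 columns, the last being XFD).

```php
<?php
// Excel's sheet limits (rows 1..1048576, columns A..XFD = 1..16384).
const MAX_ROWS = 1048576;
const MAX_COLS = 16384;

// Illustrative model only: resolve a relative coordinate stored relative to
// A1, wrapping past the sheet edge. Works for rows and columns alike.
function resolveWrapped(int $defined, int $using, int $limit): int
{
    return (($using - 1) + ($defined - 1)) % $limit + 1;
}

// "CellAbove" stored as A1048576, used from row 5 -> row 4.
echo resolveWrapped(1048576, 5, MAX_ROWS), "\n";
// Hypothetical "CellLeft" stored as XFD1 (column 16384), used from column C (3) -> column 2 (B).
echo resolveWrapped(16384, 3, MAX_COLS), "\n";
```

Under this reading, an offset of "last row" is the same as "one row up", which would explain why the construction depends on the sheet's row and column limits.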

The tests show a breakdown evaluating `=ProductTotal` (product of 2 formulas using defined names with relative addresses) on the sheet on which it is defined, but it works from a different sheet. The usual debugging techniques show me why this is happening, but I can't see how to overcome it. As above, if I can't solve it before I merge, I will open a new issue.

For those situations where I intend to open a new issue, tests are added but are marked Incomplete. Because of those, I will leave this PR in draft status for 2 weeks before moving forward with it.
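
The rule described in this commit message (adjust only the absolute parts of a defined name's address) can be sketched for the row of a single-cell reference. `adjustRowForDefinedName` is a hypothetical helper written for illustration; the actual change lives in ReferenceHelper and CellReferenceHelper and is not reproduced here.

```php
<?php
// Hypothetical helper, not part of PhpSpreadsheet: shift the row of a single
// cell reference after rows are inserted, but only if the row is absolute.
function adjustRowForDefinedName(string $ref, int $beforeRow, int $numRows): string
{
    // Optional "$", column letters, optional "$", row digits.
    if (!preg_match('/^(\$?)([A-Z]{1,3})(\$?)(\d+)$/', $ref, $m)) {
        throw new InvalidArgumentException("Not a single-cell reference: $ref");
    }
    [, $colAbs, $col, $rowAbs, $row] = $m;
    $row = (int) $row;
    // A relative row is an offset, so it must not move with the insertion.
    if ($rowAbs === '$' && $row >= $beforeRow) {
        $row += $numRows;
    }
    return $colAbs . $col . $rowAbs . $row;
}

echo adjustRowForDefinedName('$A1', 1, 4), "\n";  // $A1
echo adjustRowForDefinedName('$A$1', 1, 4), "\n"; // $A$5
```

For the names in the original report, `$A1` stays `$A1` after `insertNewRowBefore(1, 4)`, while an absolute `$A$1` would move to `$A$5`. Column insertion would apply the same test to the column's `$` marker.
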
oleibman added a commit that referenced this issue Aug 30, 2023
* Correct Re-computation of Relative Addresses in Defined Names

* Fix productTotal Problem

Need to restore current cell after evaluating defined name.