Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BC break/RFC] Fix default cell type treatment as numeric for XLSX #1321

Closed
wants to merge 1 commit into from
Closed

[BC break/RFC] Fix default cell type treatment as numeric for XLSX #1321

wants to merge 1 commit into from

Conversation

rtek
Copy link
Contributor

@rtek rtek commented Jan 14, 2020

SpreadsheetML defines cell type attribute t to indicate the cell type.
The attribute is optional and defaults to "n" for numeric. The XLSX
Reader incorrectly defaults the cell type to "stringy" for both
undefined t attribute and t="n".

This is:

- [X] a bugfix
- [] a new feature

Checklist:

Why this change is needed?

SpreadsheetML defines the default cell type as numeric, but the XLSX reader treats a missing cell type or cell type = numeric as a "stringy" value. This means that the reader is discarding the numeric type information from the XML. In most cases this is masked by the DefaultBinder which explicitly preg's for numeric strings and correct the reader's behavior.

The bugs / inconsistencies are:

  1. Strings that pass the numeric preg are incorrectly casted to numeric
  2. TheIValueBinder is always presented with string, even when the XML defines the type as numeric
  3. A numeric cell that contains non-numeric information is not treated as an exception

This is probably a BC break as it seems like a fundamental issue with the Reader - I would expect there is lots of code relying on numeric strings being casted to a number despite the XML declaring them as strings.

SpreadsheetML defines cell type attribute t to indicate the cell type.
The attribute is optional and defaults to "n" for numeric. The XLSX
Reader incorrectly defaults the cell type to "stringy" for both
undefined t attribute and t="n".
@stale
Copy link

stale bot commented Mar 14, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If this is still an issue for you, please try to help by debugging it further and sharing your results.
Thank you for your contributions.

@stale stale bot added the stale label Mar 14, 2020
@stale stale bot closed this Mar 21, 2020
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Aug 13, 2024
Suggested by PR PHPOffice#1321 from @rtek, which was marked stale in March 2020. If the xml for a cell does not specify a type, PhpSpreadsheet has been using string; however, Excel uses numeric. PhpSpreadsheet will be changed to do the same as Excel. In theory, this is a BC break; in practice, it is doubtful that anyone will be adversely affected. No changes in the existing unit tests were required. The new test from rtek does demonstrate the change in behavior; however, the new behavior is certainly what a user would expect to happen. Unlike the original PR, this one is restricted to Xlsx Reader; I do not believe that any logic change is needed for Writer. (N.B. - changing Writer to use class constants rather than literals is a good idea, but, with no other logic change in this version, it seems risky - I may do that as separate PR.)
@oleibman oleibman mentioned this pull request Aug 13, 2024
11 tasks
@oleibman
Copy link
Collaborator

Superseded by PR #4139.

@oleibman oleibman removed the stale label Aug 13, 2024
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Aug 24, 2024
Replace PR PHPOffice#4139, which is in an awkward state due to a github problem.

Suggested by PR PHPOffice#1321 from @rtek, which was marked stale in March 2020. If the xml for a cell does not specify a type, PhpSpreadsheet has been using string; however, Excel uses numeric. PhpSpreadsheet will be changed to do the same as Excel. In theory, this is a BC break; in practice, it is doubtful that anyone will be adversely affected. No changes in the existing unit tests were required. The new test from rtek does demonstrate the change in behavior; however, the new behavior is certainly what a user would expect to happen. Unlike the original PR, this one is restricted to Xlsx Reader; I do not believe that any logic change is needed for Writer. (N.B. - changing Writer to use class constants rather than literals is a good idea, but, with no other logic change in this version, it seems risky - I may do that as separate PR.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants