-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
100% Coverage for Calculation/DateTime #1870
Conversation
The code in DateTime is now completely covered. Along the way, some errors were discovered and corrected. - The tests which have had to be changed at the start of every year are replaced by more robust equivalents which do not require annual changes. - Several places in the code where Gnumeric and OpenOffice were thought to differ from Excel do not appear to have had any justification. I have left a comment where such code has been removed. - Use DateTime when possible rather than date, time, or strftime functions to avoid potential Y2038 problems. - Some impossible code has been removed, replaced by an explanatory comment. - NETWORKDAYS had a bug when the start date was Sunday. There had been no tests of this condition. - Some functions allow boolean and null arguments where a number is expected. This is more complicated than the equivalent situations in MathTrig because the initial date for these calculations can be Day 1 rather than Day 0. - More testing for dates from 1900-01-01 through the fictitious everywhere-but-Excel 1900-01-29. - This showed that there is an additional Excel bug - Excel evaluates WEEKNUM(emptycell) as 0, which is not a valid result for WEEKNUM without a second argument. PhpSpreadsheet now duplicates this bug. - There is a similar and even worse bug for 1904-01-01 in 1904 calculations. Weeknum returns 0 for this, but returns the correct value for arguments of 0 or null. - DATEVALUE should accept 1900-02-29 (sigh) and relatives. PhpSpreadsheet now duplicates this bug. - Testing bootstrap sets default timezone. This appears to be a relic from the releases of PHP where the unwise decision, subsequenly reversed, was made to issue messages for "no default timezone is set" rather than just use a sensible default. This was a disruptive setting for some of the tests I added. There is only one test in the entire suite which is default-timezone-dependent. Setting and resetting of default timezone is moved to that test (Reader/ODS/ODSTest), and out of bootstrap. - There had been no testing of NOW() function. - DATEVALUE test had no tests for 1904 calendar and needs some. - DATE test changed 1900/1904 calendar in use without restoring it. - WEEKDAY test had no tests for 1904 calendar and needs some. - Which revealed a bug in Shared/Date (excelToDateTimeObject was not recognizing 1904-01-01 as valid when 1904 calendar is in use). - And an additional bug in that legal 1904-calendar values in the 0.0-1.0 range yielded the same "wrong" answers as 1900-calendar (see "One note" below). Also the comment for one of the calendar-1904 tests was wrong in attempting to identify what time of day the fraction represented. I had wanted to break this up into a set of smaller modules, a process already started for Engineering and MathTrig. However the number of source code changes was sufficient that I wanted a clean delta for this request. If it is merged, I will work on breaking it up afterwards. One note - Shared/Date/excelToDateTimeObject, when calendar-1900 is in use, returns an unexpected result if its argument is between 0 and 1, which is nominally invalid for that calendar. It uses a base-1970 calendar in that instance. That check is not justifiable for calendar-1904, where values in that range are legal, so I made the check specific to calendar-1900, and adjusted 3 1904 unit test results accordingly. However, I have to admit that I don't understand why that check should be made even for calendar-1900. It certainly doesn't match anything that Excel does. I would recommend scrapping that code altogether. If agreed, I would do this as part of the break-up into smaller modules. Another note - more controversially, it is clear that PhpSpreadsheet needs to support the Excel and PHP date formats. Although it requires further study, I am not convinced that it needs to support Unix timestamp format. Since that is a potential source of Y2038 problems on 32-bit systems, I would like to open a PR to deprecate the use of that format. Please let me know if you are aware of a valid reason to continue to support it.
The usual set of stuff.
If at first you don't succeed ...
I'll take a more detailed look later. Just as a piece of MS Excel trivia, the fictitious 1900 leap day was in Lotus 1-2-3; and back in 19 MS were trying to steal the 1-2-3 customer base for their brand new Excel version, so they implemented all the functions that existed in 1-2-3 so that users could run their existing 1-2-3 spreadsheets and they would return identical results... including retaining the erroneous 1900 leap day |
Thanks - it is always helpful to know the history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note - Shared/Date/excelToDateTimeObject, when calendar-1900 is in use, returns an unexpected result if its argument is between 0 and 1, which is nominally invalid for that calendar.
It uses a base-1970 calendar in that instance. That check is not justifiable for calendar-1904, where values in that range are legal, so I made the check specific to calendar-1900, and adjusted 3 1904 unit test results accordingly. However, I have to admit that
I don't understand why that check should be made even for calendar-1900.
It certainly doesn't match anything that Excel does.
I would recommend scrapping that code altogether.
If agreed, I would do this as part of the break-up into smaller modules.
Dredging from memory, I believe that this is how PHSpreadsheet recognises time-only values internally, where MS Excel uses a value in the range 0-1 for handling any calculations (e.g. durations) rather than actual dates/times... e.g. adding a 4:30 hour duration a task that's started at 22:30 on 31st December 2021.
Something (using the advanced value binder to convert those string values to dates/times) like:
$worksheet->
$worksheet->setCellValue('A1', 'Start Task');
$worksheet->setCellValue('B1', 'Duration');
$worksheet->setCellValue('A1', 'End Task');
$worksheet->getCell('A2')->setValue('31-Dec-2021 22:30')
->getStyle->getNumberFormat()->setFormatCode('dd-mmm-yyyy hh:mm:ss');
$worksheet->getCell('B2')->setValue('4:30')
->getStyle->getNumberFormat()->setFormatCode('[h]:mm');
$worksheet->getCell('C2')->setValue('=A2+B2')
->getStyle->getNumberFormat()->setFormatCode('dd-mmm-yyyy hh:mm:ss');
Another note -
more controversially, it is clear that PhpSpreadsheet needs to support the Excel and PHP date formats. Although it requires further study,
I am not convinced that it needs to support Unix timestamp format.
Since that is a potential source of Y2038 problems on 32-bit systems,
I would like to open a PR to deprecate the use of that format.
Please let me know if you are aware of a valid reason to continue to support it.
This largely goes back to legacy PHPExcel. PHPExcel was first released a month or two after PHP 5.2.0, which was when DateTime objects were introduced for the very first time, so almost all the userbase was still using unix timestamps for anything date-related. So unix timestamps were still "first class citizens" while DateTime objects took a while to establish themselves.
I believe that DateTime objects should be our main consideration, and used consistently internally, but that we still need to allow users to enter/retrieve unix timestamps for legacy (especially if they're on 64-bit systems where Y2038 isn't a major problem).
Just as a reference, I started a new job 2-weeks ago where the current codebase is PHP 5.5, and still uses unix timestamps internally.... and the business is more insistent that I provide support for the Arabic language before the end of March than it is for allowing a language upgrade to PHP 7.... proof that unix timestamps aren't yet dead. :-(
Several places in the code where Gnumeric and OpenOffice were thought to differ
from Excel do not appear to have had any justification.
I have left a comment where such code has been removed.
Again, I believe that this is historic, and these spreadsheet programs have adjusted their codebase over time to bring it more compatible with MS Excel.
…1891) I ran the test suite using 32-bit PHP. There were 2 places where changes were needed due to 32-bit timestamps. Reader\\Xml.php was using strtotime as an intermediate step in converting a string timestamp to an Excel timestamp. The XML file type stores pure timestamps (i.e. no date portion) as, e.g., 1899-12-31T02:30:00.000, and that value causes an error using strtotime on a 32-bit system. However, it is sufficient to use that value in a DateTime constructor, and that will work for 32- and 64-bit. There was no test for that particular cell, so I added one to the XML read test. And that's when I discovered the getFormattedValue bug. The cell's format is `hh":"mm":"ss`. The quotes around the colons are disrupting the formatting. PhpSpreadsheet formats the cell by converting the Excel format to a Php Date format, in this case `H\:m\:s`. That's a problem, since Excel thinks 'm' means *minutes*, but PHP thinks it means *months*. This is not a problem when the colon is not quoted; there are ample tests for that. I added my best guess as to how to recognize this situation, changing `\:m` to `:i`. The XML read test now succeeds, and no other tests were broken by this change. Test Shared\\DateTest had one test where the expected result of converting to a Unix timestamp exceeds 2**32. Since a Unix timestamp is strictly an int, that test fails on a 32-bit system. In the discussion regarding recently merged PR #1870, it was felt that the user base might still be using the functions that convert to and from a timestamp. So, we should not drop this test, but, since it cannot succeed on a 32-bit system, I changed it to be skipped whenever the expected result exceeded PHP_INT_MAX. There are 3 "toTimestamp" functions within that test. Only one of these had been affected, but I thought it was a good idea to add additional tests to the others to demonstrate this condition. In the course of testing, I also discovered some 32-bit problems with bitwise and base-conversion functions. I am preparing separate PRs to deal with those.
The code in DateTime is now completely covered.
Along the way, some errors were discovered and corrected.
replaced by more robust equivalents which do not require annual changes.
from Excel do not appear to have had any justification.
I have left a comment where such code has been removed.
potential Y2038 problems.
of this condition.
This is more complicated than the equivalent situations in MathTrig because
the initial date for these calculations can be Day 1 rather than Day 0.
everywhere-but-Excel 1900-01-29.
WEEKNUM(emptycell) as 0, which is not a valid result for
WEEKNUM without a second argument.
PhpSpreadsheet now duplicates this bug.
Weeknum returns 0 for this,
but returns the correct value for arguments of 0 or null.
PhpSpreadsheet now duplicates this bug.
the releases of PHP where the unwise decision, subsequenly reversed,
was made to issue messages for
"no default timezone is set" rather than just use a sensible default.
This was a disruptive setting for some of the tests I added.
There is only one test in the entire suite which is default-timezone-dependent.
Setting and resetting of default timezone is moved to that test
(Reader/ODS/ODSTest), and out of bootstrap.
recognizing 1904-01-01 as valid when 1904 calendar is in use).
range yielded the same "wrong" answers as 1900-calendar (see "One note" below).
Also the comment for one of the calendar-1904 tests was wrong in attempting
to identify what time of day the fraction represented.
I had wanted to break this up into a set of smaller modules, a process already
started for Engineering and MathTrig.
However the number of source code changes was sufficient that I wanted
a clean delta for this request.
If it is merged, I will work on breaking it up afterwards.
One note - Shared/Date/excelToDateTimeObject, when calendar-1900 is in use,
returns an unexpected result if its argument is between 0 and 1,
which is nominally invalid for that calendar.
It uses a base-1970 calendar in that instance. That check is not justifiable
for calendar-1904, where values in that range are legal,
so I made the check specific to calendar-1900,
and adjusted 3 1904 unit test results accordingly. However, I have to admit that
I don't understand why that check should be made even for calendar-1900.
It certainly doesn't match anything that Excel does.
I would recommend scrapping that code altogether.
If agreed, I would do this as part of the break-up into smaller modules.
Another note -
more controversially, it is clear that PhpSpreadsheet needs to support
the Excel and PHP date formats. Although it requires further study,
I am not convinced that it needs to support Unix timestamp format.
Since that is a potential source of Y2038 problems on 32-bit systems,
I would like to open a PR to deprecate the use of that format.
Please let me know if you are aware of a valid reason to continue to support it.
This is:
Checklist:
Why this change is needed?
Test coverage, minor bug fixes.