Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bug #50224 where float without decimals were converted to integer when encode to JSON #642

Closed
wants to merge 14 commits into from

Conversation

jrbasso
Copy link
Contributor

@jrbasso jrbasso commented Apr 14, 2014

Refs #635

@bukka
Copy link
Member

bukka commented Apr 16, 2014

I think that this patch could introduce some regression as there is extra reallocation (spprintf is called twice). Maybe introducing another flag for spprintf would be a better solution. I think that this shouldn't be merged until you test encoding big arrays with many *.0 values. Maybe I'm wrong and the regression is minimal but it should be definitely tested IMHO...

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 17, 2014

@bukka Do you think replacing spprintf by the regular sprintf or creating a char with the extra 2 chars and setting them will solve your concerns?

@bukka
Copy link
Member

bukka commented Apr 19, 2014

I think that it would be great to do some perf testing to find out if there is any regression. The best way would be to create a big array (e.g. array_fill(0, 1000, 1.0)) and test encoding in the loop wrapped by microtime... Then you can compare both results (build without and with your patch) and you will see if there is a regression. If you do that, don't forget to compile with -O2 (without debug compile option) ;)

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 20, 2014

I did the performance test and it increased 23% the encoding time of floats with no decimal point. So, I did a refactor and this time decreased to less than 10%. The used memory increased 1%.

You can see the execution here: https://gist.github.com/jrbasso/11101696

PS: I rebased to the latest version of master to get the tests working on travis.

@@ -630,6 +638,14 @@ PHP_JSON_API void php_json_encode(smart_str *buf, zval *val, int options TSRMLS_

if (!zend_isinf(dbl) && !zend_isnan(dbl)) {
len = spprintf(&d, 0, "%.*k", (int) EG(precision), dbl);
if (strchr(d, '.') == NULL) {
char *nd = (char *)emalloc(len + 3);
strcpy(nd, d);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcpy(nd, d, len);

@bukka
Copy link
Member

bukka commented Apr 20, 2014

Think that memcpy should be a bit faster than strcpy and strcat but double check that if it's correct (I wrote it quickly without thinking... :) ). Just wondering if it helps or if it's optimized by compiler anyway...

@bukka
Copy link
Member

bukka commented Apr 20, 2014

Could you also try this variant?

len = spprintf(&d, 0, "%.*k", (int) EG(precision), dbl);
smart_str_appendl(buf, d, len);
if (strchr(d, '.') == NULL) {
    smart_str_appendl(buf, ".0", 2);
}
efree(d);

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 20, 2014

I tried using the memcpy option and it optimized a little bit more. Using smart_str_appendl got a little bit worse (but the code was cleaner). I also tried ecalloc and erealloc, but both performed badly, so I discarded.

Summary:

Tested version Time Memory (not real) Memory (real)
Without the patch 2.7151489257812 324912 524288
Using spprintf 3.3423249721527 328936 524288
Using strcpy 2.9965569972992 328840 524288
Using memcpy 2.9513249397278 328840 524288
Using smart_str_appendl 3.0528519153595 328832 524288

I also run a test with float value = 1.1 (only the extra if will be performed here) to compare the performance:

Tested version Time Memory (not real) Memory (real)
Without the patch 2.7151489257812 324912 524288
Using memcpy 2.7831361293793 328840 524288

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 20, 2014

PS: I tried to use memchr instead of strchr and the performance is about the same.

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 20, 2014

I was trying to optimize even more and I found a way to optimize replacing the original spprintf call to use php_gcvt. Here is the suggestion: https://gist.github.com/jrbasso/11123652

I didn't commit it because it change the original implementation and go beyond the bug fix. Also, I would like your approval before change it. All tests are passing.

In terms of performance, with this suggested change it results in 2.0594308376312, compared with the 2.7151489257812 (without the bug fix) and 2.9513249397278 (with the bug fix).

Thoughts?

@smalyshev
Copy link
Contributor

@jrbasso I'd add comment there stating where 2048 comes from. Otherwise, it looks ok to me.

The new function is faster and makes the decimal point easier to be added
@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 21, 2014

I pushed the change. I made the 2048 a constant with a reference for the source.

@bukka
Copy link
Member

bukka commented Apr 21, 2014

Nice! Looks good to me too. ;)

Think that 2048 is a bit too much for double conversion. It's taken from apache impl when they chose that for all possible numeric conversion but it won't be never filled for this case IMHO...

I was actually thinking about similar implementation for jsond yesterday. I'm thinking to go a bit further and re-implement php_gcvt to prevent generating INF and NAN. Such cases would result in generating incorrect json. There also is space for a bit more optimization. I'll see. Maybe just checking if it's the result INF, -INF or NAN will be sufficient. But it's a bit off topic to this patch... Sorry :)

One last note. The patch can change the generated json string check sum so I'm not sure if it should go bellow 5.6. It's up to the RM to decide. I'm sure that Stas will decide wisely. ;)

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 21, 2014

I can reduce the variable size if you guys feel comfortable. I also think 2048 is too much, but I followed that number as it is used on the original implementation and there is a comment saying it can't be smaller.

@bukka About the invalid JSON, it is not fully true. Before the call to php_gcvt there is a if checking if the double is nan or inf. If it is true, the value 0 is added to the json string and it sets a JSON error. It means the JSON will be valid, but the value for that field will be "incorrect".
Have you considered to use some external library (such as jansson, rapidjson, etc) to handle the json encode/decode and just make json extension be a "translator" of zval to native variables? Maybe I am going off topic here as well.

@bukka
Copy link
Member

bukka commented Apr 21, 2014

@jrbasso Yeah you right. Missed that part. :) In that case using php_gcvt is probably the best solution. I have got already extension on PECL that has a new decoding part that's much faster than ext\json. Using external libs adds dependency and extra overhead which would be too expensive. I want to improve encoding as well but need to do a proper testing covering lots of json samples before that. I plan to start working on the new json generator. But that's definitely off topic... :)

Btw. the patch also fixes the annoying warning about k formatting flag. It's the only warning that I have in jsond so thanks for fixing that! ;)

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 21, 2014

You are welcome. :) It adds another warning of using char (*)[2048] when the method expects a char (*), but I think tolerable, otherwise you have to dynamically allocate the memory.

Nice to hear about the optimized version on PECL, maybe it can be a part of the core in the future. :)

@bukka
Copy link
Member

bukka commented Apr 21, 2014

That looks like a bug. You are passing pointer to array but it should be a pointer to char (pointer to first element in the array in this case - &num[0])

@bukka
Copy link
Member

bukka commented Apr 21, 2014

It's actually not a bug for gcc because the resulted address is the same but it suppresses the warning ;)

@jrbasso
Copy link
Contributor Author

jrbasso commented Apr 21, 2014

Good point. I updated the code to fix it. Thanks.

@bukka
Copy link
Member

bukka commented May 11, 2014

Hey, finally got time to merge it to jsond in bukka/php-jsond@118b0ab .

I changed it slightly and set different length for the buffer. The optimal size should be 3 + DBL_MANT_DIG - DBL_MIN_EXP (constants are from float.h). It's 1089 characters on my platform.

@jrbasso
Copy link
Contributor Author

jrbasso commented May 11, 2014

@bukka That's cool. Do you think I should update this PR too?

@bukka
Copy link
Member

bukka commented May 12, 2014

Even they are standard part of float.h, I'm not 100% sure that these constants are available on all supported platforms. Maybe just decreasing it to 1090 would be safer...

@datibbaw
Copy link
Contributor

@bukka True, I didn't take that into consideration ;-)

@smalyshev
Copy link
Contributor

Please confirm with 5.6 RM as 5.6 is in freeze now and pretty close to the release point.

@jrbasso
Copy link
Contributor Author

jrbasso commented Jul 22, 2014

@bukka and @smalyshev So what needs to be done with this PR? Change the code to use the DBL_* with ifndef? Should I rebase to latest master? Change the PR to another branch? Let me know what needs to be done and I can finish it quickly.

@bukka
Copy link
Member

bukka commented Jul 27, 2014

@jrbasso I just committed bukka/php-jsond@19e14ee to jsond and added such ifdefs. All main platforms support DBL_* macros so it's just in case... :) If you change this PR in the same way, I think that the patch is finished. For merging to 5.6 you need to confirm with 5.6 RM @Tyrael .

@bukka
Copy link
Member

bukka commented Jul 27, 2014

You might notice that the default value is 1080. Not sure where I got 1089 - I just tested the value and it's 1077 on my platform and max value that I googled was 1079 so it should never be higher than 1080. I'm almost sure that if it does, the constants will be defined float.h so it won't be a problem...

@jrbasso
Copy link
Contributor Author

jrbasso commented Jul 28, 2014

I added the change the code to use float.h. I am also getting 1077 on Mac OS and Ubuntu 12.04.4.

@Tyrael
Copy link
Contributor

Tyrael commented Jul 30, 2014

After giving this some though I think the best course of action would be introducing a new option constant for json_encode.
Even thought we have precedenc on doing that in a micro version (JSON_NUMERIC_CHECK was added in 5.3.3), I would still think that adding it in a minor version would be the best.
Having it in 5.6.0 is out of the question, as we are almost at the release.

@jrbasso
Copy link
Contributor Author

jrbasso commented Jul 30, 2014

@Tyrael Suggestions for the option name? I like the suggestion of JSON_PRESERVE_FRACTIONAL_PART from @plstand .

@Tyrael
Copy link
Contributor

Tyrael commented Jul 30, 2014

sounds fine.
2014.07.30. 19:15, "Juan Basso" [email protected] ezt írta:

@Tyrael https://github.com/Tyrael Suggestions for the option name? I
like the suggestion of JSON_PRESERVE_FRACTIONAL_PART from @plstand
https://github.com/plstand .


Reply to this email directly or view it on GitHub
#642 (comment).

@nikic
Copy link
Member

nikic commented Jul 30, 2014

@Tyrael I disagree with adding an option for this. Imho it's pretty clear that the old behavior is a bug and as such I don't see reason to preserve it. If such an option is added it should be the default (at which point the option won't even help with BC concerns for tests comparing json_encode output.)

jrbasso added 2 commits July 30, 2014 13:53
I put the original code back when the option is not being used.
@jrbasso
Copy link
Contributor Author

jrbasso commented Jul 30, 2014

I added the option on the code. Is up to you guys to decide if we keep it or not. Revert the commits is easy.

@Tyrael
Copy link
Contributor

Tyrael commented Jul 30, 2014

@nikic I don't think that we can really call it a bug. Javascript doesn't have a separate type for integers and floats, nor does the JSON spec, they only talk about numbers, which can have optional fraction parts.
So theoretically both implementations are correct, and even though that the current implementation seems to be the less common, it also has some advantages (truncating the trailing zeros can make the encoded string shorter) and this is what we do at the moment, so I think it would make sense to follow a more gradual approach, and first introduce an option to use it, and we can later make it the default, or even consider removing the old one in the future.

@bukka
Copy link
Member

bukka commented Jul 31, 2014

As I said I don't think that it's a bug exactly for the reason that Ferenc noted - JSON spec does not specify float type and as such the value is correctly converted to the number. However I think that it's useful (mainly for symetrical encryption/decryption). Also JS engines internally store numbers either as double or int so I understand why someone could consider it as a bug.

The additional constant seems reasonable due to the BC issue for minor version. However I think that it should be discussed on @Internals and if there are still objections from Nikita or others, then we should have RFC.

@jrbasso
Copy link
Contributor Author

jrbasso commented Aug 16, 2014

@bukka @Tyrael @nikic @smalyshev Do we have a consensus here or should I bring this discussion to internals' list?

@jrbasso
Copy link
Contributor Author

jrbasso commented Oct 26, 2014

@bukka @Tyrael @nikic @smalyshev Any news? Almost 3 months since the last comment. What is the directions to take from here?

@smalyshev
Copy link
Contributor

I think as an option we can have it in 5.6. Please drop a note on internals@, if there would be no objections then I'd merge it into 5.6.

@Tyrael
Copy link
Contributor

Tyrael commented Nov 3, 2014

For the record I'm fine with having it in 5.6.

@bukka
Copy link
Member

bukka commented Nov 3, 2014

I think that it would be a good idea to have it as default for PHP 7. If it becomes default, then this constant will be useful only for disabling it. However it means to do something like flags & ~JSON_PRESERVE_FRACTIONAL_PART which is not really user friendly. Or we could introduce another constant that will disable it. In that case this constant becomes useless and will be valid only >5.6.3 and <7.0 . Does it really make sense to introduce it if we are considering changing it as default for PHP 7...?

@bukka
Copy link
Member

bukka commented Nov 3, 2014

P.S. That doesn't mean that I'm for merging as default now. It's a BC break when it's default. I just think that there is not such a big need that we have to merge it now. If it could wait so long I think it can wait a bit longer (till PHP 7). Cheers.

@jrbasso
Copy link
Contributor Author

jrbasso commented Nov 4, 2014

Thread created on internals list. http://marc.info/?l=php-internals&m=141507087629656&w=2

bukka added a commit to bukka/php-jsond that referenced this pull request Dec 25, 2014
This is based on bug (feature request) PHP#50224 implemented
in php/php-src#642
#if defined(DBL_MANT_DIG) && defined(DBL_MIN_EXP)
#define NUM_BUF_SIZE (3 + DBL_MANT_DIG - DBL_MIN_EXP)
#else
#define NUM_BUF_SIZE 1080
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that not a tad, er, excessive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TazeTSchnitzel according with http://tigcc.ticalc.org/doc/float.html the value if the constants are defined is 3 + 16 - -999 = 1018.
There is research from @bukka and he explains on this comment:

You might notice that the default value is 1080. Not sure where I got 1089 - I just tested the value and it's 1077 on my platform and max value that I googled was 1079 so it should never be higher than 1080. I'm almost sure that if it does, the constants will be defined float.h so it won't be a problem...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually allocate less space on the stack than it was before this patch - The spprintf (xbuf_format_converter) allocates 2048 + additional space for other variables and function stack . See http://lxr.php.net/xref/PHP_TRUNK/main/spprintf.c#xbuf_format_converter for more details. As I said 1080 won't be probably used in any case as DBL_MANT_DIG, DBL_MIN_EXP are almost always defined so it will be mostly 1079 :). You might think that it's not necessary as the EG(precision) will be always smaller. Unfortunately we don't know its value at the C compile time (the dynamic allocation leads to the worse perf so we need to allocate on the stack). The only better way might be using alloca. I plan to experiment with that later to see if there no perf penalty but we need to have fallback anyway for non-alloca platforms. That requires the max space for double value otherwise there would be chance of the stack overflow...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was C99 we could do dynamic stack allocations. Alas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, the constants are defined on windows, linux and mac. So just in rare cases it will fallback to the hardcoded value.

@smalyshev
Copy link
Contributor

merged

@smalyshev smalyshev closed this Jan 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants