Fixed bug #50224 where float without decimals were converted to integer when encode to JSON #642

jrbasso · 2014-04-14T00:30:38Z

bukka · 2014-04-16T17:24:05Z

I think that this patch could introduce some regression as there is extra reallocation (spprintf is called twice). Maybe introducing another flag for spprintf would be a better solution. I think that this shouldn't be merged until you test encoding big arrays with many *.0 values. Maybe I'm wrong and the regression is minimal but it should be definitely tested IMHO...

jrbasso · 2014-04-17T04:19:56Z

@bukka Do you think replacing spprintf by the regular sprintf or creating a char with the extra 2 chars and setting them will solve your concerns?

bukka · 2014-04-19T19:13:50Z

I think that it would be great to do some perf testing to find out if there is any regression. The best way would be to create a big array (e.g. array_fill(0, 1000, 1.0)) and test encoding in the loop wrapped by microtime... Then you can compare both results (build without and with your patch) and you will see if there is a regression. If you do that, don't forget to compile with -O2 (without debug compile option) ;)

jrbasso · 2014-04-20T01:27:10Z

I ran the performance test: the patch increased the encoding time of floats with no decimal part by 23%. After a refactor, the increase dropped to less than 10%. Memory usage increased by 1%.

You can see the execution here: https://gist.github.com/jrbasso/11101696

PS: I rebased to the latest version of master to get the tests working on travis.

bukka · 2014-04-20T10:41:13Z

ext/json/json.c

@@ -630,6 +638,14 @@ PHP_JSON_API void php_json_encode(smart_str *buf, zval *val, int options TSRMLS_

 				if (!zend_isinf(dbl) && !zend_isnan(dbl)) {
 					len = spprintf(&d, 0, "%.*k", (int) EG(precision), dbl);
+					if (strchr(d, '.') == NULL) {
+						char *nd = (char *)emalloc(len + 3);
+						strcpy(nd, d);


memcpy(nd, d, len);

bukka · 2014-04-20T10:51:02Z

I think memcpy should be a bit faster than strcpy and strcat, but double-check that it's correct (I wrote it quickly without thinking... :) ). Just wondering if it helps or if it's optimized by the compiler anyway...
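A minimal standalone sketch of the memcpy idea discussed above, for readers who want to see the whole operation. It is not the patch itself: `append_point_zero` is a hypothetical helper name, and plain `malloc` stands in for PHP's `emalloc` so the snippet compiles outside the engine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns a newly allocated copy
 * of the first `len` bytes of `d` with ".0" appended, or NULL on failure.
 * malloc() is an assumption standing in for emalloc(). */
static char *append_point_zero(const char *d, size_t len)
{
    char *nd;

    assert(d != NULL);
    nd = malloc(len + 3);          /* digits + '.' + '0' + '\0' */
    if (nd == NULL) {
        return NULL;
    }
    memcpy(nd, d, len);            /* length is already known, no scan needed */
    memcpy(nd + len, ".0", 3);     /* copies '.', '0' and the terminating NUL */
    return nd;
}
```

Because the length is already returned by the formatting call, both copies avoid the extra scan for the terminator that strcpy followed by strcat would perform.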

bukka · 2014-04-20T10:56:32Z

Could you also try this variant?

len = spprintf(&d, 0, "%.*k", (int) EG(precision), dbl);
smart_str_appendl(buf, d, len);
if (strchr(d, '.') == NULL) {
    smart_str_appendl(buf, ".0", 2);
}
efree(d);

jrbasso · 2014-04-20T14:20:59Z

I tried the memcpy option and it was a little faster. Using smart_str_appendl was a little slower (but the code was cleaner). I also tried ecalloc and erealloc, but both performed badly, so I discarded them.

Summary:

| Tested version | Time | Memory (not real) | Memory (real) |
| --- | --- | --- | --- |
| Without the patch | 2.7151489257812 | 324912 | 524288 |
| Using `spprintf` | 3.3423249721527 | 328936 | 524288 |
| Using `strcpy` | 2.9965569972992 | 328840 | 524288 |
| Using `memcpy` | 2.9513249397278 | 328840 | 524288 |
| Using `smart_str_appendl` | 3.0528519153595 | 328832 | 524288 |

I also ran a test with the float value 1.1 (only the extra `if` is executed here) to compare the performance:

| Tested version | Time | Memory (not real) | Memory (real) |
| --- | --- | --- | --- |
| Without the patch | 2.7151489257812 | 324912 | 524288 |
| Using `memcpy` | 2.7831361293793 | 328840 | 524288 |

jrbasso · 2014-04-20T14:29:01Z

PS: I tried to use memchr instead of strchr and the performance is about the same.

jrbasso · 2014-04-20T20:03:33Z

While trying to optimize further, I found that replacing the original spprintf call with php_gcvt is faster. Here is the suggestion: https://gist.github.com/jrbasso/11123652

I didn't commit it because it changes the original implementation and goes beyond the bug fix. Also, I would like your approval before changing it. All tests are passing.

In terms of performance, the suggested change results in 2.0594308376312, compared with 2.7151489257812 (without the bug fix) and 2.9513249397278 (with the bug fix).

Thoughts?
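The shape of that proposal (format into a fixed stack buffer, then add the suffix in place) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the gist: `snprintf` with `%.*g` stands in for `php_gcvt`, which is internal to PHP, and `format_json_double` and `NUM_BUF_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

#define NUM_BUF_SIZE 2048   /* the size mentioned in the review; illustrative here */

/* Hypothetical stand-in for the php_gcvt-based path. Formats `dbl` into the
 * caller's buffer and appends ".0" when the text has no fraction or exponent.
 * Returns the string length, or -1 if `dbl` is not finite or does not fit. */
static int format_json_double(double dbl, int precision, char *num, size_t size)
{
    int len;

    assert(num != NULL && size > 0);
    if (!isfinite(dbl)) {
        return -1;                 /* callers handle INF/NAN separately */
    }
    len = snprintf(num, size, "%.*g", precision, dbl);
    if (len < 0 || (size_t)len >= size) {
        return -1;
    }
    /* Only plain integers such as "10" get the suffix; "1e+25" is left alone. */
    if (strpbrk(num, ".eE") == NULL) {
        if ((size_t)len + 2 >= size) {
            return -1;             /* no room for ".0" plus the terminator */
        }
        memcpy(num + len, ".0", 3);
        len += 2;
    }
    return len;
}
```

With a stack buffer there is no allocation to pay for, and the suffix is written directly after the digits instead of into a second string. The sketch assumes the default "C" locale, where the decimal separator is `.`.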

smalyshev · 2014-04-20T22:42:03Z

@jrbasso I'd add a comment there stating where 2048 comes from. Otherwise, it looks ok to me.

jrbasso · 2014-04-21T00:10:00Z

I pushed the change. I made the 2048 a constant with a reference for the source.

bukka · 2014-04-21T11:11:02Z

Nice! Looks good to me too. ;)

I think that 2048 is a bit too much for double conversion. It's taken from the Apache implementation, where it was chosen to cover all possible numeric conversions, but it will never be filled in this case IMHO...

I was actually thinking about a similar implementation for jsond yesterday. I'm thinking of going a bit further and re-implementing php_gcvt to prevent generating INF and NAN. Such cases would result in generating incorrect JSON. There is also room for a bit more optimization. I'll see. Maybe just checking whether the result is INF, -INF or NAN will be sufficient. But it's a bit off topic for this patch... Sorry :)

One last note. The patch can change the checksum of the generated JSON string, so I'm not sure if it should go below 5.6. It's up to the RM to decide. I'm sure that Stas will decide wisely. ;)

jrbasso · 2014-04-21T13:16:10Z

I can reduce the variable size if you guys feel comfortable. I also think 2048 is too much, but I followed that number as it is used on the original implementation and there is a comment saying it can't be smaller.

@bukka About the invalid JSON, that is not entirely true. Before the call to php_gcvt there is an if checking whether the double is NaN or INF. If it is, the value 0 is added to the JSON string and a JSON error is set. This means the JSON will be valid, but the value for that field will be "incorrect".
Have you considered using an external library (such as jansson, rapidjson, etc.) to handle the JSON encoding/decoding and making the json extension just a "translator" of zvals to native variables? Maybe I am going off topic here as well.
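The NaN/INF guard described in this comment can be pictured with a small sketch. The error codes, the function name, and the fixed `%.14g` format are assumptions made for illustration; they are not the identifiers or settings used in ext/json.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical error codes, for illustration only. */
enum json_err { JSON_ERR_NONE = 0, JSON_ERR_INF_OR_NAN = 1 };

/* Writes a JSON number for `dbl` into `out`. Non-finite input is encoded as
 * "0" so the document stays syntactically valid, and an error is reported. */
static enum json_err encode_double(double dbl, char *out, size_t size)
{
    assert(out != NULL && size >= 2);
    if (isinf(dbl) || isnan(dbl)) {
        snprintf(out, size, "0");
        return JSON_ERR_INF_OR_NAN;
    }
    snprintf(out, size, "%.14g", dbl);   /* precision chosen arbitrarily here */
    return JSON_ERR_NONE;
}
```

The output therefore always parses as JSON; the caller learns from the returned code that the encoded value does not reflect the input.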

bukka · 2014-04-21T13:59:51Z

@jrbasso Yeah, you're right. I missed that part. :) In that case using php_gcvt is probably the best solution. I already have an extension on PECL with a new decoder that's much faster than ext/json. Using external libs adds a dependency and extra overhead, which would be too expensive. I want to improve encoding as well but need to do proper testing covering lots of JSON samples before that. I plan to start working on the new JSON generator. But that's definitely off topic... :)

Btw. the patch also fixes the annoying warning about k formatting flag. It's the only warning that I have in jsond so thanks for fixing that! ;)

jrbasso · 2014-04-21T16:02:51Z

You are welcome. :) It adds another warning about passing a char (*)[2048] when the function expects a char *, but I think that's tolerable; otherwise you have to allocate the memory dynamically.

Nice to hear about the optimized version on PECL, maybe it can be a part of the core in the future. :)

bukka · 2014-04-21T17:03:29Z

That looks like a bug. You are passing pointer to array but it should be a pointer to char (pointer to first element in the array in this case - &num[0])

bukka · 2014-04-21T17:10:26Z

It's actually not a bug for gcc because the resulting address is the same, but passing &num[0] suppresses the warning ;)
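The difference between the two pointer types can be seen in a short sketch. `fill` and `same_address` are hypothetical names; `fill` merely has the same parameter shape as a formatter that takes a `char *`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callee that expects a pointer to char, like the formatter. */
static size_t fill(char *buf, size_t size)
{
    snprintf(buf, size, "%s", "1.0");
    return strlen(buf);
}

/* Returns 1 when both pointer forms refer to the same address. */
static int same_address(void)
{
    char num[2048];
    char (*whole)[2048] = &num;   /* pointer to the whole array */
    char *first = &num[0];        /* pointer to the first element; `num` decays to this */
    size_t n;

    n = fill(first, sizeof num);  /* matches the parameter type, no warning */
    assert(n == 3);
    /* fill(&num, sizeof num) would compile on gcc with an
     * incompatible-pointer-type warning, since &num is a char (*)[2048]. */
    return (void *)whole == (void *)first;
}
```

Both expressions hold the same address, which is why the original call worked; only the static type differs, and that is what the compiler warns about.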

jrbasso · 2014-04-21T19:29:33Z

Good point. I updated the code to fix it. Thanks.

bukka · 2014-05-11T16:43:45Z

Hey, finally got time to merge it to jsond in bukka/php-jsond@118b0ab .

I changed it slightly and set different length for the buffer. The optimal size should be 3 + DBL_MANT_DIG - DBL_MIN_EXP (constants are from float.h). It's 1089 characters on my platform.

jrbasso · 2014-05-11T18:38:07Z

@bukka That's cool. Do you think I should update this PR too?

bukka · 2014-05-12T18:54:36Z

Even though they are a standard part of float.h, I'm not 100% sure that these constants are available on all supported platforms. Maybe just decreasing it to 1090 would be safer...

datibbaw · 2014-07-21T16:14:59Z

@bukka True, I didn't take that into consideration ;-)

smalyshev · 2014-07-22T00:46:11Z

Please confirm with 5.6 RM as 5.6 is in freeze now and pretty close to the release point.

jrbasso · 2014-07-22T02:17:32Z

@bukka and @smalyshev So what needs to be done with this PR? Change the code to use the DBL_* with ifndef? Should I rebase to latest master? Change the PR to another branch? Let me know what needs to be done and I can finish it quickly.

bukka · 2014-07-27T19:23:33Z

@jrbasso I just committed bukka/php-jsond@19e14ee to jsond and added such ifdefs. All main platforms support DBL_* macros so it's just in case... :) If you change this PR in the same way, I think that the patch is finished. For merging to 5.6 you need to confirm with 5.6 RM @Tyrael .

bukka · 2014-07-27T19:29:24Z

You might notice that the default value is 1080. Not sure where I got 1089 - I just tested the value and it's 1077 on my platform and the max value that I googled was 1079, so it should never be higher than 1080. I'm almost sure that if it is, the constants will be defined in float.h so it won't be a problem...

jrbasso · 2014-07-28T02:49:45Z

I changed the code to use float.h. I am also getting 1077 on Mac OS and Ubuntu 12.04.4.
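For readers who want to reproduce the number, the arithmetic behind the buffer size can be checked with a few lines of C. The macro mirrors the formula quoted in the thread; `num_buf_size` is a hypothetical wrapper added only so the value can be inspected.

```c
#include <float.h>
#include <stddef.h>

/* Formula from the discussion, with a hardcoded fallback for platforms
 * whose float.h lacks the macros. */
#if defined(DBL_MANT_DIG) && defined(DBL_MIN_EXP)
#define NUM_BUF_SIZE (3 + DBL_MANT_DIG - DBL_MIN_EXP)
#else
#define NUM_BUF_SIZE 1080
#endif

/* Hypothetical accessor, for inspection only. */
static size_t num_buf_size(void)
{
    return (size_t)NUM_BUF_SIZE;
}
```

On platforms where double is IEEE 754 binary64, DBL_MANT_DIG is 53 and DBL_MIN_EXP is -1021, so the expression evaluates to 3 + 53 + 1021 = 1077, which matches the value reported above.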

Tyrael · 2014-07-30T16:02:05Z

After giving this some thought, I think the best course of action would be introducing a new option constant for json_encode.
Even though we have a precedent for doing that in a micro version (JSON_NUMERIC_CHECK was added in 5.3.3), I still think that adding it in a minor version would be best.
Having it in 5.6.0 is out of the question, as we are almost at the release.

jrbasso · 2014-07-30T17:13:07Z

@Tyrael Suggestions for the option name? I like the suggestion of JSON_PRESERVE_FRACTIONAL_PART from @plstand .

Tyrael · 2014-07-30T17:32:55Z

sounds fine.

nikic · 2014-07-30T17:39:34Z

@Tyrael I disagree with adding an option for this. Imho it's pretty clear that the old behavior is a bug and as such I don't see reason to preserve it. If such an option is added it should be the default (at which point the option won't even help with BC concerns for tests comparing json_encode output.)

jrbasso · 2014-07-30T18:02:42Z

I added the option to the code. It's up to you guys to decide whether we keep it or not. Reverting the commits is easy.

Tyrael · 2014-07-30T18:38:05Z

@nikic I don't think that we can really call it a bug. JavaScript doesn't have separate types for integers and floats, nor does the JSON spec; they only talk about numbers, which can have optional fraction parts.
So theoretically both implementations are correct, and even though the current implementation seems to be the less common one, it also has some advantages (dropping the trailing zeros can make the encoded string shorter) and it is what we do at the moment. So I think it would make sense to follow a more gradual approach: first introduce an option to enable it, and later we can make it the default, or even consider removing the old behavior in the future.

bukka · 2014-07-31T13:07:48Z

As I said, I don't think that it's a bug, for exactly the reason that Ferenc noted: the JSON spec does not specify a float type, so the value is correctly converted to a number. However, I think that it's useful (mainly for symmetrical encryption/decryption). Also, JS engines internally store numbers either as double or int, so I understand why someone could consider it a bug.

The additional constant seems reasonable due to the BC issue for minor version. However I think that it should be discussed on @Internals and if there are still objections from Nikita or others, then we should have RFC.

jrbasso · 2014-08-16T18:48:48Z

@bukka @Tyrael @nikic @smalyshev Do we have a consensus here or should I bring this discussion to internals' list?

jrbasso · 2014-10-26T00:54:19Z

@bukka @Tyrael @nikic @smalyshev Any news? It's been almost 3 months since the last comment. What direction should we take from here?

smalyshev · 2014-11-02T21:43:40Z

I think as an option we can have it in 5.6. Please drop a note on internals@, if there would be no objections then I'd merge it into 5.6.

Tyrael · 2014-11-03T09:42:39Z

For the record I'm fine with having it in 5.6.

bukka · 2014-11-03T10:24:11Z

I think that it would be a good idea to have it as default for PHP 7. If it becomes default, then this constant will be useful only for disabling it. However it means to do something like flags & ~JSON_PRESERVE_FRACTIONAL_PART which is not really user friendly. Or we could introduce another constant that will disable it. In that case this constant becomes useless and will be valid only >5.6.3 and <7.0 . Does it really make sense to introduce it if we are considering changing it as default for PHP 7...?
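The usability concern about `flags & ~JSON_PRESERVE_FRACTIONAL_PART` is easier to see with a concrete bitmask. The sketch below uses made-up option names and bit values (they are assumptions, not the real PHP constants) and imagines a future where the option is part of the defaults.

```c
/* Hypothetical option bits; the real constants and values may differ. */
#define OPT_PRETTY_PRINT             (1 << 7)
#define OPT_PRESERVE_FRACTIONAL_PART (1 << 10)

/* Imagined default set in which the option is already switched on. */
#define OPT_DEFAULTS (OPT_PRESERVE_FRACTIONAL_PART)

/* Clears the option while leaving every other bit untouched. */
static int without_fractional_part(int flags)
{
    return flags & ~OPT_PRESERVE_FRACTIONAL_PART;
}
```

A caller who wants the old output would have to write `without_fractional_part(OPT_DEFAULTS | OPT_PRETTY_PRINT)` or the equivalent mask expression by hand, which is the awkwardness the comment points out.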

bukka · 2014-11-03T10:28:33Z

P.S. That doesn't mean that I'm for merging as default now. It's a BC break when it's default. I just think that there is not such a big need that we have to merge it now. If it could wait so long I think it can wait a bit longer (till PHP 7). Cheers.

jrbasso · 2014-11-04T14:18:19Z

Thread created on internals list. http://marc.info/?l=php-internals&m=141507087629656&w=2

hikari-no-yume · 2015-01-11T02:33:13Z

ext/json/json.c

+#if defined(DBL_MANT_DIG) && defined(DBL_MIN_EXP)
+#define NUM_BUF_SIZE (3 + DBL_MANT_DIG - DBL_MIN_EXP)
+#else
+#define NUM_BUF_SIZE 1080


Is that not a tad, er, excessive?

@TazeTSchnitzel according to http://tigcc.ticalc.org/doc/float.html the value, if the constants are defined, is 3 + 16 - -999 = 1018.
@bukka researched this and explains it in this comment:

> You might notice that the default value is 1080. Not sure where I got 1089 - I just tested the value and it's 1077 on my platform and the max value that I googled was 1079, so it should never be higher than 1080. I'm almost sure that if it is, the constants will be defined in float.h so it won't be a problem...

We actually allocate less space on the stack than before this patch: spprintf (xbuf_format_converter) allocates 2048 plus additional space for other variables and the function stack. See http://lxr.php.net/xref/PHP_TRUNK/main/spprintf.c#xbuf_format_converter for more details. As I said, 1080 probably won't be used in any case, as DBL_MANT_DIG and DBL_MIN_EXP are almost always defined, so it will be mostly 1079 :). You might think that it's not necessary as EG(precision) will always be smaller. Unfortunately we don't know its value at C compile time (dynamic allocation leads to worse perf, so we need to allocate on the stack). The only better way might be using alloca. I plan to experiment with that later to see if there is no perf penalty, but we need to have a fallback anyway for non-alloca platforms. That requires the max space for a double value, otherwise there would be a chance of a stack overflow...

If this was C99 we could do dynamic stack allocations. Alas.

Btw, the constants are defined on Windows, Linux and Mac. So only in rare cases will it fall back to the hardcoded value.

smalyshev · 2015-01-19T18:11:13Z

merged

jrbasso added 3 commits April 20, 2014 00:14

Fixed bug #50224 where float without decimals were converted to integer

4c263b6

Replacing use of strstr to strchr for performance reasons

56b4ba5

Thanks to @smalyshev for the tip.

Performance updates

9bb2248

bukka reviewed Apr 20, 2014

Performance optimizations

cbe1360

Thanks @bukka

Changed how float numbers are converted to JSON strings

52a8fb0

The new function is faster and makes the decimal point easier to be added

Fixing pointer address to avoid compiler warnings

bb6697b

Reducing the size of string for float to string conversion

408b7f7

Using values from float.h and fallbacking to hardcoded value

dd52b7b

jrbasso added 2 commits July 30, 2014 13:53

Added JSON_PRESERVE_FRACTIONAL_PART option

8c41b03

I put the original code back when the option is not being used.

Optimized the original code when the option JSON_PRESERVE_FRACTIONAL_PART is disabled.

4a53df1

Added a condition to avoid overflow

8ff879d

bukka added a commit to bukka/php-jsond that referenced this pull request Dec 25, 2014

Add JSON_PRESERVE_FRACTIONAL_PART to optionally keep float type

be35307

This is based on bug (feature request) PHP#50224 implemented in php/php-src#642

hikari-no-yume reviewed Jan 11, 2015

Renamed constant name after discussion on internals list

6c8d3dd

jrbasso mentioned this pull request Jan 13, 2015

Porting implementation of RFC json_preserve_fractional_part bukka/php-src#1

Merged

smalyshev closed this Jan 19, 2015
