Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: tpch query 6 mismatched results #38320

Closed
asubiotto opened this issue Jun 20, 2019 · 2 comments
Closed

exec: tpch query 6 mismatched results #38320

asubiotto opened this issue Jun 20, 2019 · 2 comments
Labels
A-sql-vec SQL vectorized engine C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@asubiotto
Copy link
Contributor

Running sf10 on a 6 node cluster.

root@localhost:26257/tpch> SELECT sum(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= DATE '1994-01-01' AND l_shipdate < DATE '1994-01-01' + INTERVAL '1' YEAR AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01 AND l_quantity < 24;
         revenue
+------------------------+
  1.2301136360101066e+09
(1 row)

Time: 32.490876986s

root@localhost:26257/tpch> SET experimental_vectorize=always;
SET

Time: 496.18µs

warning: error retrieving the database name: pq: unsupported processor core localPlanNode:<RowSourceIdx:0 NumInputs:0 Name:"virtual table" >
root@localhost:26257/?> SELECT sum(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= DATE '1994-01-01' AND l_shipdate < DATE '1994-01-01' + INTERVAL '1' YEAR AND l_discount BETWEEN 0.06 - 0.01 AND 0.06 + 0.01 AND l_quantity < 24;
         revenue
+------------------------+
  1.2301136360101078e+09
(1 row)

Time: 13.140193121s
@asubiotto asubiotto added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-vec SQL vectorized engine labels Jun 20, 2019
@yuzefovich
Copy link
Member

I've encountered somewhat similar issue when working on #34891. On TPCH query 7 back in February we were returning

        {`FRANCE`, `GERMANY`, `1995`, `5.463973273360001e+07`},
        {`FRANCE`, `GERMANY`, `1996`, `5.463308330760002e+07`},
        {`GERMANY`, `FRANCE`, `1995`, `5.253174666969999e+07`},
        {`GERMANY`, `FRANCE`, `1996`, `5.252054902239986e+07`},

whereas right now on master we're returning

  FRANCE      | GERMANY     |   1995 | 5.463973273359995e+07  
  FRANCE      | GERMANY     |   1996 | 5.463308330759997e+07  
  GERMANY     | FRANCE      |   1995 | 5.253174666969997e+07  
  GERMANY     | FRANCE      |   1996 | 5.252054902239985e+07

These both results were obtained with the default cluster settings, so it's not (or it shouldnot be) related to vectorized execution.

Postgres returns

 FRANCE                    | GERMANY                   |   1995 | 54639732.7336
 FRANCE                    | GERMANY                   |   1996 | 54633083.3076
 GERMANY                   | FRANCE                    |   1995 | 52531746.6697
 GERMANY                   | FRANCE                    |   1996 | 52520549.0224

which confirms that both of our answers are, in fact, correct (according to TPCH spec) since the spec requires the precision of 0.01.

A similar issue was with query 8 as well:
CRDB in February:

{`1995`, `0.034435890406654825`},
`1996`, `0.04148552129353028`},

CRDB now:

{`1995`, `0.03443589040665483`},
{`1996`, `0.04148552129353034`},

Postgres:

    1995 | 0.03443589040665483  
    1996 | 0.04148552129353034  

And on query 22:
used to be:

{`13`, `888`, `6.737713989999999e+06`},
{`17`, `861`, `6.46057372e+06`},
{`18`, `964`, `7.236687400000014e+06`},
{`23`, `892`, `6.701457950000009e+06`},
{`29`, `948`, `7.158866629999997e+06`},
{`30`, `909`, `6.808436130000002e+06`},
{`31`, `922`, `6.806670179999986e+06`},

now:

  13        |     888 | 6.737713990000005e+06  
  17        |     861 | 6.460573720000007e+06  
  18        |     964 | 7.236687400000006e+06  
  23        |     892 | 6.701457950000001e+06  
  29        |     948 | 7.158866629999988e+06  
  30        |     909 | 6.80843613e+06         
  31        |     922 | 6.80667018e+06  

@asubiotto
Copy link
Contributor Author

Thanks for looking into this @yuzefovich, closing because it seems like these are just precision differences allowed by the TPCH spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-vec SQL vectorized engine C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

2 participants