Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Sort by column with NULL in next_page URL #216

Closed
earthboundkid opened this issue Apr 16, 2018 · 15 comments
Closed

Bug: Sort by column with NULL in next_page URL #216

earthboundkid opened this issue Apr 16, 2018 · 15 comments
Labels

Comments

@earthboundkid
Copy link

Copy-pasting from #189 (comment), since that issue is closed:

I think I found a bug. I tried to sort by middle initial in my salaries set, and many middle initials are null. The next_url gets set by Datasette to:

http://localhost:8001/salaries-d3a5631/2017+Maryland+state+salaries?_next=None%2C391&_sort=middle_initial

But then None is interpreted literally and it tries to find a name with the middle initial "None" and ends up skipping ahead to O on page 2.

@simonw
Copy link
Owner

simonw commented Apr 16, 2018

Yikes, definitely a bug.

@simonw simonw added the bug label Apr 16, 2018
@simonw
Copy link
Owner

simonw commented Apr 16, 2018

So there are two tricky problems to solve here:

  • I need a way of encoding null into that _next= that is unambiguous from the string None or null. This means introducing some kind of escaping mechanism in those strings. I already use URL encoding as part of the construction of those components here, maybe that can help here?
  • I need to figure out what the SQL should be for the "next" set of results if the previous value was null. Thankfully we use the primary key as a tie-breaker so this shouldn't be impossible.

@simonw
Copy link
Owner

simonw commented Apr 16, 2018

Relevant code:

datasette/datasette/app.py

Lines 828 to 832 in 904f1c7

if (sort or sort_desc) and not is_view:
prefix = str(rows[-2][sort or sort_desc])
next_value = '{},{}'.format(
urllib.parse.quote_plus(prefix), next_value
)

@simonw
Copy link
Owner

simonw commented Apr 16, 2018

I could use $null as a magic value that means None. Since I'm applying quote_plus() to actual values, any legit strings that look like this will be encoded as %24null:

>>> urllib.parse.quote_plus('$null')
'%24null'

@simonw
Copy link
Owner

simonw commented Apr 16, 2018

I think the correct SQL is this: https://datasette-issue-189-demo-3.now.sh/salaries-7859114-7859114?sql=select+rowid%2C+*+from+%5B2017+Maryland+state+salaries%5D%0D%0Awhere+%28middle_initial+is+not+null+or+%28middle_initial+is+null+and+rowid+%3E+%3Ap0%29%29%0D%0Aorder+by+middle_initial+limit+101&p0=391

select rowid, * from [2017 Maryland state salaries]
where (middle_initial is not null or (middle_initial is null and rowid > :p0))
order by middle_initial limit 101

Though this will also need to be taken into account for #198

@simonw
Copy link
Owner

simonw commented Apr 16, 2018

But what would that SQL look like for _sort_desc?

@simonw
Copy link
Owner

simonw commented Apr 16, 2018

Here's where that SQL gets constructed at the moment:

datasette/datasette/app.py

Lines 761 to 771 in 10a34f9

# Now add the sort SQL, which may incorporate next_by_pk_clauses
if sort or sort_desc:
where_clauses.append(
'({column} {op} :p{p} or ({column} = :p{p} and {next_clauses}))'.format(
column=escape_sqlite(sort or sort_desc),
op='>' if sort else '<',
p=len(params),
next_clauses=' and '.join(next_by_pk_clauses),
)
)
params['p{}'.format(len(params))] = sort_value

@simonw simonw closed this as completed in 2abe539 Apr 16, 2018
@simonw
Copy link
Owner

simonw commented Apr 16, 2018

Weird... tests are failing in Travis, despite passing on my local machine. https://travis-ci.org/simonw/datasette/builds/367423706

@simonw
Copy link
Owner

simonw commented Apr 17, 2018

Still failing. This is very odd.

@simonw
Copy link
Owner

simonw commented Apr 17, 2018

I'm reverting this out of master until I can figure out why the tests are failing.

@simonw simonw closed this as completed in 5364fa7 Apr 17, 2018
@simonw simonw reopened this Apr 17, 2018
@simonw
Copy link
Owner

simonw commented Apr 17, 2018

Here's the test that's failing:

datasette/tests/test_api.py

Lines 437 to 470 in 59a3aa8

(
'_sort_desc=sortable_with_nulls',
lambda row: (
1 if row['sortable_with_nulls'] is None else 0,
-row['sortable_with_nulls'] if row['sortable_with_nulls'] is not None else 0,
row['content']
),
'sorted by sortable_with_nulls descending'
),
# text column contains '$null' - ensure it doesn't confuse pagination:
('_sort=text', lambda row: row['text'], 'sorted by text'),
])
def test_sortable(app_client, query_string, sort_key, human_description_en):
path = '/test_tables/sortable.json?_shape=objects&{}'.format(query_string)
fetched = []
page = 0
while path:
page += 1
assert page < 100
response = app_client.get(path, gather_request=False)
assert human_description_en == response.json['human_description_en']
fetched.extend(response.json['rows'])
path = response.json['next_url']
assert 5 == page
expected = list(generate_sortable_rows(201))
expected.sort(key=sort_key)
import json
print('expected = {}'.format(json.dumps(expected)))
print('fetched = {}'.format(json.dumps(fetched)))
assert [
r['content'] for r in expected
] == [
r['content'] for r in fetched
]

I got Travis to spit out the fetched and expected variables.

expected has 201 items in it and is identical to what I get on my local laptop.

fetched has 250 items in it, so it's clearly different from my local environment.

I've managed to replicate the bug in production! I created a test database like this:

python tests/fixtures.py sortable.db

Then deployed that database like so:

datasette publish now sortable.db \
    --extra-options="--page_size=50" --branch=debug-travis-issue-216

And... if you click "next" on this page https://datasette-issue-216-pagination.now.sh/sortable-5679797/sortable?_sort_desc=sortable_with_nulls five times you get back 250 results, when you should only get back 201.

@simonw
Copy link
Owner

simonw commented Apr 17, 2018

The version that I deployed which exhibits the bug is running SQLite 3.8.7.1 - https://datasette-issue-216-pagination.now.sh/sortable-5679797?sql=select+sqlite_version%28%29

The version that I have running locally which does NOT exhibit the bug is running SQLite 3.23.0

@simonw
Copy link
Owner

simonw commented Apr 17, 2018

... which is VERY surprising, because 3.23.0 only came out on 2nd April this year: https://www.sqlite.org/changes.html - I have no idea how I came to be running that version on my laptop.

@simonw
Copy link
Owner

simonw commented Apr 17, 2018

This is the SQL that returns differing results in production and on my laptop: https://datasette-issue-216-pagination.now.sh/sortable-5679797?sql=select+%2A+from+sortable+where+%28sortable_with_nulls+is+null+and+%28%28pk1+%3E+%3Ap0%29%0A++or%0A%28pk1+%3D+%3Ap0+and+pk2+%3E+%3Ap1%29%29%29+order+by+sortable_with_nulls+desc+limit+51&p0=b&p1=t

select * from sortable where (sortable_with_nulls is null and ((pk1 > :p0)
  or
(pk1 = :p0 and pk2 > :p1))) order by sortable_with_nulls desc limit 51

I think that order by sortable_with_nulls desc bit is at fault - the primary keys should be included in that order by as well.

Sure enough, changing the query to this one returns the same results across both environments:

select * from sortable where (sortable_with_nulls is null and ((pk1 > :p0)
  or
(pk1 = :p0 and pk2 > :p1))) order by sortable_with_nulls desc, pk1, pk2 limit 51

simonw added a commit that referenced this issue Apr 17, 2018
Reverted commit 5364fa7 (where I removed the
code that didn't work).

Added primary keys to order-by clause for sorting to get tests to pass
@simonw
Copy link
Owner

simonw commented Apr 17, 2018

Fixed!

@simonw simonw closed this as completed Apr 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants