Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merger should only apply limits under certain conditions #354

Closed
julgus opened this issue Jul 6, 2023 · 2 comments
Closed

Merger should only apply limits under certain conditions #354

julgus opened this issue Jul 6, 2023 · 2 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@julgus
Copy link
Member

julgus commented Jul 6, 2023

Describe the bug
JPAStreamer cannot optimize anonymous lambdas, i.e. f -> f.getLength() > 120. Hence the need for a JPAStreamer metamodel. This means if such an operation occurs early in the pipeline, the rest of the operations, regardless of whether they use the JPAStreamer metamodel or not, may also be executed in the JVM instead of the database.

Below is an example of a query that cannot be optimized at all since the length filter must be applied before the limit, otherwise the results will be different. Thus all JPAStreamer can do is issue a query that returns all Films in the DB and execute the anonymous lambda in the JVM. If the limit() did not exist, JPAStreamer could in fact reorder the operations:

            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .limit(10)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .collect(Collectors.toList());

As of now, JPAStreamer seems to overlook the lambda and reorders the operations in a way that distorts the results.

Expected behavior
This should generate the following query as the lambda cannot be moved according to reordering rules:

Hibernate: 
    /* <criteria> */ select
        f1_0.film_id,
        f1_0.description,
        f1_0.language_id,
        f1_0.last_update,
        f1_0.length,
        f1_0.rating,
        f1_0.rental_duration,
        f1_0.rental_rate,
        f1_0.replacement_cost,
        f1_0.special_features,
        f1_0.title 
    from
        film f1_0 limit ?

Actual behavior

The generated query includes the second filter and the sorting operation. This means the first filter is applied in the JVM after the limit has been applied. Such reordering changes the result set.

Hibernate: 
    /* <criteria> */ select
        f1_0.film_id,
        f1_0.description,
        f1_0.language_id,
        f1_0.last_update,
        f1_0.length,
        f1_0.rating,
        f1_0.rental_duration,
        f1_0.rental_rate,
        f1_0.replacement_cost,
        f1_0.special_features,
        f1_0.title 
    from
        film f1_0 
    where
        f1_0.title like ? 
    order by
        f1_0.length asc limit ?

How To Reproduce
The following integration test fails:

try (StreamSupplier<Film> supplier = jpaStreamer.createStreamSupplier(Film.class)) {

            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .limit(10)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .collect(Collectors.toList());

            final List<String> expected = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .limit(10)
                    .sorted(Comparator.comparing(Film::getLength))
                    .filter(f -> f.getTitle().startsWith("A"))
                    .map(Film::getTitle)
                    .collect(Collectors.toList());

            assertEquals(expected, actual);
}

Build tool
e.g. Maven 3.9.0

JPAStreamer version
At least JPAStreamer 3.0.0 and later

JPA Provider
e.g. Hibernate 6.0.2.Final

Java Version
e.g. Java 11.0.19

** Context around operation ordering **

            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .collect(Collectors.toList());

Can safely be reordered as the filters are commutative and it doesn't matter if the sorting is done prior to filtering:

            final List<String> actual = supplier.stream()
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .filter(f -> f.getLength() > 120)
                    .map(Film$.title)
                    .collect(Collectors.toList());

In this case, the first sort and title predicate can be applied on the DB side, before applying the length filter in the JVM.

@julgus julgus added the bug Something isn't working label Jul 6, 2023
@julgus julgus self-assigned this Jul 6, 2023
@julgus julgus added this to the 3.0.3 milestone Jul 6, 2023
@julgus
Copy link
Member Author

julgus commented Jul 7, 2023

This problem seems to stem from several places. Firstly, the criteria merger should not continue merging operations after encountering an anonymous lambda. At its current state, it merges any operations that contain a JPAStreamer predicate, regardless of the ordering in the pipeline.

After fixing that, a similar test still fails:

        try (StreamSupplier<Film> supplier = jpaStreamer.createStreamSupplier(Film.class)) {
            final List<String> actual = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .map(Film$.title)
                    .limit(10)
                    .collect(Collectors.toList());

            final List<String> expected = supplier.stream()
                    .filter(f -> f.getLength() > 120)
                    .sorted(Comparator.comparing(Film::getLength))
                    .map(Film::getTitle)
                    .filter(title -> title.startsWith("A"))
                    .limit(10)
                    .collect(Collectors.toList());

            assertEquals(expected, actual);
        }

In this case, reordering rules allow the first Stream to be optimized as:

            final List<String> actual = supplier.stream()
                    .sorted(Film$.length)
                    .filter(Film$.title.startsWith("A"))
                    .filter(f -> f.getLength() > 120)
                    .map(Film$.title)
                    .limit(10)
                    .collect(Collectors.toList());

This means we can merge the sort and the filter on the title into the query without any issues. However, the limit is also applied on the DB side, limiting the results to 10 before the filter length > 120 is executed. Due to the sorting operation, the 10 shortest films are returned and none of them will match the criteria length > 120.

@julgus
Copy link
Member Author

julgus commented Jul 7, 2023

Another example is this test:

            final List<String> actual = supplier.stream()
                    .filter(Film$.title.startsWith("A"))
                    .skip(10)
                    .filter(f -> f.getLength() > 120)
                    .sorted(Film$.length)
                    .limit(10)
                    .map(Film$.title)
                    .collect(Collectors.toList());

Here the SORTED is merged even though the SKIP precedes it. As SQL always applies SKIPS and LIMITS to the end of all other operations, we cannot merge this sort as it will change what entities are skipped.

@julgus julgus changed the title Incorrect reordering of anonymous lambdas Merger should only apply limits under certain conditions Jul 10, 2023
@julgus julgus closed this as completed Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant