Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI against Rails 6-0-stable broken #1081

Closed
deivid-rodriguez opened this issue Oct 21, 2019 · 26 comments · Fixed by #1113
Closed

CI against Rails 6-0-stable broken #1081

deivid-rodriguez opened this issue Oct 21, 2019 · 26 comments · Fixed by #1113

Comments

@deivid-rodriguez
Copy link
Contributor

CI is broken against 6-0-stable since rails/rails@dd46d18.

Needs to be investigated and fixed.

@scarroll32
Copy link
Member

I'll try to look at this soon.

@scarroll32 scarroll32 self-assigned this Nov 11, 2019
@macfanatic
Copy link

macfanatic commented Dec 12, 2019

Anything we can do to help mitigate or debug this issue @seanfcarroll?

We have confirmed the issue remains with rails releases 6.0.1, 6.0.2.rc1 and 6.0.2.rc2.

We have some of our more complex search queries generating invalid SQL when attempting to update to rails 6.0.1.

@scarroll32
Copy link
Member

Thanks for the ping @macfanatic

Sorry I haven't had time to work on it. I will be able to look at it during the Christmas week. If you are interested in making a PR with any code investigations or failing tests for this specific problem, that could be helpful.

All the best 👍

@macfanatic
Copy link

Here is a rails bug script showing that with rails 6.0.0 the test passes, but when changing to activerecord 6.0.1 the test fails with an uncaught invalid SQL exception.

# frozen_string_literal: true
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "activerecord", "6.0.0", require: "active_record"
  gem "ransack", "2.3.0"
  gem "sqlite3"
end

require "minitest/autorun"
require "logger"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :organizations, force: true do |t|
    t.string :name, null: false
    t.string :type, null: false
    t.timestamps
  end

  create_table :products, force: true do |t|
    t.string :name, null: false
    t.references :seller, foreign_key: { to_table: :organizations }, null: false
    t.references :manufacturer, foreign_key: { to_table: :organizations }, null: false
    t.timestamps
  end
end

class Organization < ActiveRecord::Base
  validates :name, presence: true
end

class Seller < Organization
  has_many :products
end

class Manufacturer < Organization
  has_many :products
end

class Product < ActiveRecord::Base
  belongs_to :seller
  belongs_to :manufacturer

  validates :name, presence: true
end

class BugTest < Minitest::Test
  def test_generated_sql_join

    manufacturer = Manufacturer.create!(name: 'ABC Toys, LLC')
    seller = Seller.create!(name: 'Amazon, Inc')
    product = Product.create!(name: 'Toy Ball', manufacturer: manufacturer, seller: seller)

    term = 'Toy'

    query = Product.includes(:manufacturer, :seller).ransack(name_or_seller_name_or_manufacturer_name_cont: term)
    assert_equal query.result.count, 1
  end
end

Screen Shot 2019-12-13 at 11 18 31 AM

@macfanatic
Copy link

Hi @seanfcarroll is there anything specific I could look at to help get this resolved? I'm new to the project so diving in and making a meaningful fix will be difficult, but I have multiple projects locked to rails 6.0.0 due to this issue (all ActiveAdmin related) so I'd love to get something figured out.

@scarroll32
Copy link
Member

Hi @macfanatic I don't have time to work on this, however I am looking for a developer to do it. There is some budget from open collective sponsorship to pay for it.

If Ransack is useful to you, please consider contributing some small amounts and then noting which issues you would like fixed.

@scarroll32
Copy link
Member

@ozzyaaron
Copy link

ozzyaaron commented Jan 13, 2020

I tried to look into this today - hopefully I will get to continue.

I'm only my findings here as other developers more experienced on Ransack might be able to point me in the right direction.

It seems that by the time you get to

relation = @object.where(viz.accept(search.base))
in Rails 6.0.0 it would leave the last JOIN with the wrong table alias (according to Rails code which appends _join to the aliases for join dependencies) and then the conditions generated by viz.accept(search.base) would still work.

To my thinking as eventually both should be resolved by Arel Nodes the table alias names should be the same, but they're obviously not.

I'm only so far as to be able to say that it appears that the result from @object.to_sql is different between Rails versions though the newer version generates more correct output as it appears to add the expected _join to the alias. What I plan to try to figure out now is why the table alias names are different between @object.to_sql and viz.accept(search.base). viz.accept(search.base) still generates Arel::Nodes::TableAlias nodes with @right set to a value without _join appended. @object.joins_values generates Arel::Nodes::TableAlias with @right values that do append _join.

The below are partial queries

In our App Rails 6.0.0 looks like

LEFT OUTER JOIN "wt_participant_applications" "wt_participant_applications_matches" ON "wt_participant_applications_matches"."id" = "participant_group_memberships_matches_join"."wt_participant_application_id"
WHERE ((("users"."name" ILIKE '%test%' OR users.email::text ILIKE '%test%') OR "wt_participant_applications_matches"."fmid" ILIKE '%test%') OR "wt_participant_applications_matches"."uuid" ILIKE '%test%') LIMIT 1

In Rails 6.0.2 we get

LEFT OUTER JOIN "wt_participant_applications" "wt_participant_applications_matches_join" ON "wt_participant_applications_matches_join"."id" = "participant_group_memberships_matches_join"."wt_participant_application_id" 
WHERE ((("users"."name" ILIKE '%test%' OR users.email::text ILIKE '%test%') OR "wt_participant_applications_matches"."fmid" ILIKE '%test%') OR "wt_participant_applications_matches"."uuid" ILIKE '%test%') LIMIT 1

@scarroll32
Copy link
Member

One place to look is the Rails 6 release notes. These in particular look like they could be a possible cause:

Ransack also has code added for Rails 6, inside Polyamorous.

I read this code mean that the Rails 6 Polyamorous code is specific to minor versions, so 6.0.0 should be the same for Ransack as 6.0.1.

 ar_version = ::ActiveRecord::VERSION::STRING[0,3]
  %w(join_association join_dependency reflection).each do |file|
    require "polyamorous/activerecord_#{ar_version}_ruby_2/#{file}"
  end
irb(main):003:0> ::ActiveRecord::VERSION::STRING
=> "5.2.3"
irb(main):004:0> ::ActiveRecord::VERSION::STRING[0,3]
=> "5.2"

Looking in the 6.0.1 release notes, there are quite a lot of changes for AM/AR

One final thing, when Polyamorous was absorbed into Ransack, the tests we left behind. These definitely need to be brought over and expanded for any 6.0 changes.

CC @mohdsameer

@deivid-rodriguez
Copy link
Contributor Author

deivid-rodriguez commented Jan 15, 2020

Not sure if you noticed, but I did bisect this, and included the specific change in Rails that caused this in the issue description.

@scarroll32
Copy link
Member

scarroll32 commented Jan 15, 2020

Thanks @deivid-rodriguez do you mean rails/rails@dd46d18 ?

@deivid-rodriguez
Copy link
Contributor Author

Yes.

@ozzyaaron
Copy link

@seanfcarroll @deivid-rodriguez FYI I have approval from my work to spend time about 3 days worth of my time over the next couple of weeks on this issue and just getting a better idea about how Ransack works in general. I'll quite likely get back to working on this issue tomorrow.

This will be our first trial at this type of thing for our team, but I hope we'll actually be dedicating more of our team's time to Ransack and other gems we use in 2020. Items like you discussed about Polyamorous and also I think some good developer documentation would be the types of plumbing and grunt work we'd like to help out with.

@scarroll32
Copy link
Member

scarroll32 commented Jan 16, 2020

@ozzyaaron wow that is completely awesome! Thank you to you and your employer!

I'm perfectly happy for you to work on any issue you want. Good developer documentation on Ransack internals would be really great as we need to find a way to lower to bar for the community to contribute.

Please ping me at any time if you need help or direction.

Welcome!

P.S. I've got some big ideas for the future of Ransack, and will put them on the table for community discussion. But first I think we need to get it stabilized and close out all the old issues (either fix or close).

@ozzyaaron
Copy link

Confirming removal of rails/rails@dd46d18#diff-bf6dd6226db3aab589916f09236881c7R1111-R1114 fixes my own test case anyway.

@seanfcarroll I'm just putting breakpoints at locations I think might be promising, I tried using something smarter like tapping_device but the shuffling of objects meant it was more useful just to see which methods were being called on things like search.context.object, search.context, search.base and so on where search = ARClass.ransack(...).

So far it seems that loosely there are two sides to building the query, building the necessary joins and then building the conditions and putting them together via a where call. Ransack would be responsible for setting up the joins which then seem to end up basically on search.context.object and I can get the joins_values there. Then I think it plays a part in setting up the where class to then pass to that relation. I see that when there is a call @object.where(viz.accept(search.base)) where @object is what I was referring to with search.context.object.

If I call search.context.object.joins_values there is a difference introduced by the linked Rails code that means that now the return is actually what I would probably expect. I think normally a JoinDependency should have _join added to the end of the table name. If anything I think that the Rails code actually produces joins_values that are more correct now.

The issue I'm trying to figure out is how does viz.accept(search.base) get access to the table alias information on search.context.object so that the two sides can be tied together? It seems to me like at some point when the conditions are being generated it should be accessing the table aliases to drive the appropriate table names? I'm not sure that there is enough state being passed to viz.accept?

At this stage, if you have time, are you able to give a quick run down on how viz.accept (and the various Node types in Arel and in Ransack) are meant to end up resolving to part of a query? If I have condition with attribute participants_name and value test where the table alias has been setup by ActiveRecord/Arel to be participants_matches_join how does that attribute work out that it should generate participants_matches_join.name = 'test'?

@malachewhiz
Copy link

@ozzyaaron did you manage to find a solution or know what needs to be done to fix this issue? We too are affected by this issue and are investigating a solution. Maybe we could share our resources to come up with a solution together?

@ozzyaaron
Copy link

ozzyaaron commented Feb 13, 2020

@malachewhiz TLDR; I am making slow progress on this over my weekends but have no idea when I would understand enough Ransack, ActiveRecord and AREL to properly fix the issue.

I know what Rails change caused the issue (detailed above by initial post) and I know where in Ransack the issue is coming from (I kind of outlined above) but I don't know how to fix it yet and I certainly don't understand any of the code or its intentions enough to feel confident in any fix I would make. As of last weekend I can say that for some reason when the AST (I guess) or AREL objects is built up for some reason there seems to be a difference between the object shared between what builds the joins or data collection part of the query and what builds the conditions/clauses part of the query. I'm still trying to track down why but there is a little piece of code that is given arguments one of which is join and if it is told that the Arel::Table is a join then it appends _join to it. For some reason this is sent through as false and thus the clauses use the non-join table name.

I have been spending a few hours over the last couple of weekends in my own time just trying to document what actually happens when you call Model.ransack(...).result so that I could get enough understanding to fix the issue confidently. I have a google doc that I've been building up and to be honest if I get some time this weekend and feel comfortable I might just share it here as it could be a good place for core devs to give guidance, tell me where I'm wrong/right and maybe even do some fleshing out of their own.

It has given me a lot of respect for people that work in these areas. I'm sure once you get familiar it becomes easier, like with any domain, but I really was hoping to just jump in and fix it pretty quickly. That has not happened :P

@malachewhiz
Copy link

@ozzyaaron thanks for your in-depth reply! We came to the same conclusion, that it would need a lot of effort to understand the inner-workings of all the gems in play here to make up a general fix. We ultimately patched it by adding this in an initializer (rails 6.0.2.1, ransack 2.3.2):

module Polyamorous
  module JoinDependencyPatch
    def base_klass; end
  end
end

Polyamorous::JoinDependency.prepend(Polyamorous::JoinDependencyPatch)

I don't know if that's the recommended approach, but it seems to work for us. Maybe it will help someone else with this issue (or to come up with a better solution).

@ozzyaaron
Copy link

ozzyaaron commented Feb 18, 2020

@malachewhiz as stated, I'm no pro here but I would probably steer clear of that type of patch as it is may have other unintended circumstances. Obviously up to you based on your own automated tests and whatnot.

Internally Polyamorous::JoinDependency is actually a reference to ActiveRecord::Associations::JoinDependency so you're modifying that class too. I think the patch could modify base_klass for ActiveRecord::Associations::JoinDependency in unknown ways going forward.

Just to confirm (for me, too)

irb(main):005:0> Polyamorous::JoinDependency.object_id
70275975348240
irb(main):006:0> ::ActiveRecord::Associations::JoinDependency.object_id
70275975348240
irb(main):007:0> ::ActiveRecord::Associations::JoinDependency == Polyamorous::JoinDependency
true

My understanding is that you have haven't exactly removed Polyamorous but you have effectively disabled some of its reflection abilities because it relies on base_klass to do this. It also I think would negate the Rails patch that caused this issue. I guess if the bug that caused the Rails patch doesn't effect you (it doesn't effect us) you're effectively trading one bug for another.

My understanding of the code so far is that if this were the patch to be made, we could patch the gem with a new version of Polyamorous::JoinDependencyExtensions for ActiveRecord::VERSION >= 6.0.1 with this code.

I decided to try your patch anyway and just as an FYI it doesn't fix the issue I am having with Rails 6.0.2.1 and Ransack 2.3.2.

Just as some further test data, if you don't mind, are you able to post an example of the relevant model structure and params to the ransack call that your patch fixed? Thanks!

@ozzyaaron
Copy link

ozzyaaron commented Feb 19, 2020

@seanfcarroll @deivid-rodriguez the more I have read the code the more I think that the change introduced into Rails that caused this issue in Ransack probably has unintended side-effects like we're seeing.

It changes the collection part of the query by adding _join to a table that didn't have it before and the PR asserts that it was meant to effect queries using string joins and eager loading with a use of references but doesn't check for either one before modifying joins destructively.

I have a patch for Ransack that duplicates the popped join in circumstances where the query has no includes or eager loads. I think it would fix this particular issue but raises its own obvious issues such as using eager loading/includes with Ransack.

module Polyamorous
  module RelationExtensions
    def build_joins(manager, joins, aliases)
      # Rails introduced a change in https://github.com/rails/rails/pull/37235/commits/783cafee4f3e1b1c1f8edf4ad915f66cf84e0552
      # that was intended to retain join order when developers were also using an eager load
      # with only string joins.
      #
      # I believe the change made to Rails core is overly broad.
      #
      # Given the intention was to fix a bug related to eager loading and string joins but the
      # change didn't check for either before modifying the joins array I believe we should re-push
      # the join if there isn't any eager load or includes calls.
      #
      if joins.last.is_a?(ActiveRecord::Associations::JoinDependency)
        if joins.last.base_klass == klass && references_values.empty?
          joins.push(joins.last)
        end
      end

      super
    end
  end
end

ActiveRecord::Relation.send(:prepend, Polyamorous::RelationExtensions)

I think being told why this approach is wrong might help me out in figuring out how to really fix it.

Are there any good resources to look into wrt Arel & ActiveRecord? I personally couldn't find anything past the very basics and quite a few of the ActiveRecord PRs seem really hastily written and merged like the one I @deivid-rodriguez referenced.

I'm still trying to find out how the attributes in Condition might be able to reliably discover the aliased table anyway, as I figure that even if there were a screw up in Rails, we should be able to discover which Arel::TableAlias to use via the Polyamorous::JoinDependency object that is setup by find_or_build_association.

Thanks again for any help!

EDIT: the above patch fixed some issues but not all. I still think the root fix will be to find a way for the attribute to find its proper table alias.

I changed to this. By swapping the joins the offending code never runs in Rails own build_joins but it still does not work.

      if joins.last.is_a?(ActiveRecord::Associations::JoinDependency)
        if joins.last.base_klass == klass && references_values.empty?
          joins[-1], joins[-2] = joins[-2], joins[-1]
        end
      end

@deivid-rodriguez
Copy link
Contributor Author

@ozzyaaron I'm sorry I currently have no time to dig into this, and I'm not even sure I would even succeed or make any progress.

But I want to THANK YOU so much for putting so much effort into fixing this. I really appreciate it 💜.

@ozzyaaron
Copy link

ozzyaaron commented Feb 21, 2020

TLDR, see below for a fix that works for us an I think is a little bit understandable (hopefully). 🤞

I'm not going to claim this is the nicest fix but I think it actually gets to the heart of the issue (more below).

I think in general there needs to be a better connection between Attribute and the actual join dependency or join source that was created for it. I have attempted to sniff the join sources table alias name and add a reference to Attribute. Then when the Condition finally evaluates out to the Arel::Attributes::Attribute object it will swizzle in the table's alias that arel.join_sources calculated when the join was setup by binding the attribute to the context.

As part of this work I actually found that the Rails commit created a JOIN order issue for us regardless of the use of Ransack. I think there are two issues here:

  • The fix to Rails was probably done without enough examples and wasn't tightly bound enough to the cases that were trying to be fixed
  • That being said Ransack (I believe) should still maintain a reference between the Attribute and the Join created to get access to that attribute/column as JOINS are the likely culprit when there is table aliasing. If you think about it, each attribute adds a join but it never tracks which join as far as I can see that it was responsible for. Strictly speaking the where clause at the end should be setup such that the attribute queries the table of the join that it created. If you have a ransack param associated_model_attr_a_or_associated_model_attr_b_or_associated_model_attr_c_eq: "test" which generates joins like OUTER JOIN associated_models ... OUTER JOIN associated_models AS associated_model_join ... OUTER JOIN associated_models AS associated_models_2 ... WHERE ... then the WHERE clause should be something like associated_models.attr_1 = 'test' OR associated_models_join.attr_b = 'test' OR associated_models_2.attr_c = 'test'

The 'fix' below works for us, and I think works for me conceptually as far as my understanding of the tree structure that Arel uses for a JOIN.

I've tried to comment in the code rather than go into further explanation here. As I've said numerous times, I am not a pro in any of these areas. I'd actually love to get some time with anybody that has deep ActiveRecord & Arel knowledge as I think there must be a better way to say use the actual JoinDependency object and do a search of the Arel AST later. I would think that there is a way for Attribute to later find the objects in the tree rather than setup a static reference when Attribute is created. That being said I think you'd still need to have a reference in Attribute for that anyway such as the JoinDependency or the association's reflection chain.

The code below passes the Ransack tests bar one that I'm not even sure is a valid failure. I think the failure is due to the JOIN order change introduced by the commit to Rails at question as I believe it re-orders how JOIN and default scope are applied. I also added some tests of my own and additions to the schema so that I could first reproduce the issue. It passed all of our Application's tests which are fairly extensive (~20k specs, ~1400 scenarios).

Just to be clear the below is part of an initializer in our App currently and passes all of our automated tests. I'd like to perhaps get feedback on this before I go about creating a PR for Ransack.

module Ransack
  module Nodes
    class Condition
      def build_attribute(name = nil, ransacker_args = [])
        Attribute.new(@context, name, ransacker_args).tap do |attribute|
          # Binding creates the join for the attribute if required
          @context.bind(attribute, attribute.name)

          # Ransack uses OUTER JOIN
          # The bind call above sets up the necessary JOINS to access the attribute above
          # The last JOIN will be to the table containing the column
          # We now have to dive into the AST for this OuterJoin object to see if it uses a
          #   table alias. If it does then we link the table alias to the attribute so that
          #   it can be used later when generating the condition clauses
          #
          if (join_source = @context.object.arel.join_sources.last).is_a?(Arel::Nodes::OuterJoin)
            # children.reverse to get the last table alias which should be to the table that we want the attribute for comparison
            # We only use nodes that inherit from Binary because these are all of the nodes
            #   that allow for comparison and could form part of the JOIN clause
            # Without being specific there may be a Grouping node to contain a default scope or
            #   other issues
            #
            # OuterJoin.right => Arel::Nodes::On
            # OuterJoin.right.expr => Arel::Nodes::And/Or
            # OuterJoin.right.expr.children => clauses for the JOIN (typically Arel::Nodes::Equality but could also be Arel::Nodes::In, etc)
            #
            # Just because of the way the JOIN AST works where the table you're joining to forms
            #   the right side of the graph (e.g. FROM tbl1 JOIN tbl2 ON tbl1.column = tbl2.column)
            #   it seems reasonable to get the table alias from the right side of the
            #   Arel::Nodes::Binary which in most cases is going to be a Arel::Nodes::Equality
            #
            # If the right side is a TableAlias then add that reference to the Attribute so it
            #   can be used when generating the clause later
            #
            join_to_table_clause = join_source.right.expr.children.detect { |node| node.is_a?(Arel::Nodes::Binary) }
            if (table_alias = join_to_table_clause.right.relation).is_a?(Arel::Nodes::TableAlias)
              attribute.source_table_alias = table_alias
            end
          end

          self.attributes << attribute if name.nil? || attribute.valid?
          if predicate && !negative?
            @context.lock_association(attribute.parent)
          end
        end
      end

      def attr_value_for_attribute(attr)
        if ActiveRecord::Base.connection.adapter_name == "PostgreSQL"
          if attr.source_table_alias.present? && attr.attr.relation.is_a?(Arel::Nodes::TableAlias)
            # Modify the relation for the attribute so that it will use the correct
            #   table alias
            attr.attr.relation.right = attr.source_table_alias.right
          end

          return attr.attr
        end

        predicate.case_insensitive ? attr.attr.lower : attr.attr
      rescue StandardError
        attr.attr
      end
    end

    class Attribute
      attr_accessor :source_table_alias
    end
  end
end

@varyonic
Copy link
Contributor

FWIW I tried this against our production code and found that join_to_table_clause.right could be an Arel::Nodes::BindParam. Will patch around that and see if it's our only issue.

@ozzyaaron
Copy link

ozzyaaron commented May 8, 2020

@varyonic same here now with Rails 6.0.3 wrt your finding about Arel::Nodes::BindParam. We've added a small additional check for this. Waiting on tests but I think they'll be fine.

The patch that closed this issue doesn't seem to fix the original issue @seanfcarroll. We can still reproduce the original error I gave above using Ransack master.

@deivid-rodriguez
Copy link
Contributor Author

Agreed.

As I commented in #1113 (comment), I believe the patch only adapted specs to expect the wrong thing, but didn't seem to fix the issue.

@nikajukic
Copy link

@deivid-rodriguez indeed. I've added two test examples here #1119 (comment) - one passing and one failing that prove your point and address the error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants