Intermittent "invalid association" warnings at compile time #4293
Comments
Thank you for the report. We can leave it open for a while to see if anyone has run into something similar. |
I noticed the exact same thing! 👋 Asked about it on Slack a few weeks ago, but the discussion unfortunately didn't lead to any conclusion other than "wat": https://elixir-lang.slack.com/archives/C087B66BY/p1695369972353739 I'm on Elixir 1.15.4 using Ecto 3.10.3. It started a few months ago, I would say - hard to pinpoint since it doesn't always happen. I noticed that if the warning occurs, it's always in the file for the schema with the At one point, I was able to reproduce the issue at will - according to my notes, simply running When firing up IEx, I noticed:
I.e. for some obscure reason, the function didn't seem to be exported - yet, looking at the schema, it exposes plenty of fields and indeed, checking for whether it's exported now suddenly returns true. |
The function_exported? behaviour is expected and documented: https://hexdocs.pm/elixir/Kernel.html#function_exported?/3 I don't think it is the root cause here, because Ecto.Association.ensure_compiled makes sure the module is loaded upfront. |
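The documented behaviour mentioned above is that function_exported?/3 does not load a module; it only inspects modules already loaded into the VM. A minimal sketch, assuming URI as an arbitrary standard-library module picked purely for illustration (whether it is already loaded at that point depends on the environment, so no output is claimed for the first check):

```elixir
# URI is only an illustrative choice; any module not yet loaded shows the same effect.
mod = URI

# May print false if the module has not been loaded yet,
# because function_exported?/3 does not load modules on its own.
IO.inspect(function_exported?(mod, :parse, 1), label: "before ensure_loaded")

# Load the module explicitly, then ask again.
{:module, ^mod} = Code.ensure_loaded(mod)
IO.inspect(function_exported?(mod, :parse, 1), label: "after ensure_loaded")
```

Once the module is loaded, the second check returns true, which matches the "suddenly returns true" observation in the earlier comment.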
Ah, pity -- is there anything else I could try? I can reproduce it fairly easily by compiling repeatedly until I get a warning, via
#!/bin/sh
while true; do
  echo "[`date`] Compiling..."
  mix compile --force 2>&1 | grep warning && exit 1
done
...which usually takes about 10 runs or so. Once the warning is triggered, I can just run |
Before returning |
I tried using https://github.com/hrzndhrn/beam_file to compare the decompiled code in both the good and the bad (i.e. when a warning is printed) states; alas, the decompiled Elixir code is mostly the same except for some One other bit which maybe helps: once I get to the faulty state by running |
I debugged this a bit further and noticed that I can reproduce this with Elixir 1.15 only, i.e. Elixir 1.15.0 occasionally triggers the warning but Elixir 1.14.5 never does. |
does it happen on v1.15.6? |
Yes, v1.15.6 triggers it as well (I tried it locally, this is also what the OP used). I just bisected https://github.com/elixir-lang/elixir (very pleased to see how easy it is to hack the compiler!), it appears elixir-lang/elixir@58b45e9 is the commit which triggers this. |
Are those modules coming from a dependency or something? Are you using an umbrella? |
The modules are part of the main application, not coming from a dependency. I'm not using an umbrella. |
Can you please run this inside
You can see a list of paths. In my case, it is Livebook, so you see "livebook/_build/dev/lib/livebook/ebin". But you see |
And can you debug the Ecto code and add |
Yes, my application's ebin appears to have
In case the warning is printed, the output is slightly different: the
|
Yeah, so we should be finding all modules defined within your application. :S |
Any chance you can isolate it in a regular app or our example app in this repo? |
I think I might have trimmed down the real application far enough to exhibit the issue. In fact, now it always triggers the problem for me -- but only with Elixir 1.15. See @bkowalk's reduced example in #4293 (comment) for a test case to reproduce the issue. Just
(The |
What puzzles me a little bit is that the warning now occurs always with Elixir 1.15, it's no longer intermittent. I hope I didn't reduce the real application too much, introducing a stupid mistake along the way... at least it's still the case that Elixir 1.14.5 compiles this without warnings. |
Wow, just catching up - thanks for running with this, @frerich! Confirmed that I'm able to see the issue on that tarball as well--amazing work reproducing! Compiling on an M2 macbook pro, if that makes a difference. Interestingly, I see the warning immediately every time if I kill the |
I think I've trimmed down the necessary schemas for the error in your example, @frerich. Looks like we're failing just with this attached set. It doesn't seem like any more can be removed and still reliably create the error. Seems like there may just be a complicated compilation chain. There are circular references in the schemas, but that's sort of expected in a However, it also weirdly disappears if you remove the link from Medic to Workplace, or (for who knows what reason) if you remove the |
Awesome job there @bkowalk - I can confirm that your last simplified test case triggers the issue for me, too:
|
Noticed one more interesting thing - if I go into
|
In our codebase, btw, I'll add that our most common error is I'm assuming here, btw, that these aren't real deadlocks - seems like they work themselves out if you ignore the warning and just go on your way. But rather just a temporary deadlocked state where the compiler is still working out its course of action or something. |
Awesome job @bkowalk in isolating it. Keep in mind that, in your example, you do have a warning that is accurate, but another one that indeed is a false positive. I have good news and bad news: I have fixed this in Elixir main but, given it changes the compiler, I am not yet comfortable with backporting it to v1.15. While I have confirmed it indeed no longer reproduces the issue for your example, I am not sure if that's generally true. If you could try elixir main, it would be fantastic. Thank you! ❤️ |
Using elixir main, I can no longer reproduce the issue on my real application. Excellent! 🥳 |
Same here. I see the issue after 5-10 recompiles on 1.15.4, but haven't seen it after 50 on 1.16.0-dev. Thanks for the fix, @josevalim! And thanks for finding a way to reproduce, @frerich! |
Not sure if this will help anyone else, but we noticed this issue when we had multiple Ecto schemas defined in the same file that were all assocs on another schema.
defmodule MD.Scheduling.AppointmentStatusHistory do
  use Ecto.Schema
  schema "appointment_status_history" do
    field(:status, :string)
    field(:changed_at, :utc_datetime_usec)
    belongs_to(:changed_by, MD.Accounts.User)
    belongs_to(:appointment, MD.Scheduling.Appointment)
  end
end
defmodule MD.Scheduling.AppointmentTimeHistory do
  use Ecto.Schema
  schema "appointment_time_history" do
    field(:start, :utc_datetime)
    field(:end, :utc_datetime)
    field(:changed_at, :utc_datetime_usec)
    belongs_to(:changed_by, MD.Accounts.User)
    belongs_to(:appointment, MD.Scheduling.Appointment)
  end
end
defmodule MD.Scheduling.AppointmentVisitTypeHistory do
  use Ecto.Schema
  schema "appointment_visit_type_history" do
    field(:visit_type_id, :string)
    field(:changed_at, :utc_datetime_usec)
    belongs_to(:changed_by, MD.Accounts.User)
    belongs_to(:appointment, MD.Scheduling.Appointment)
  end
end
You can see the warning in the log below. Once I moved these out into their own individual files, the warnings were all resolved! Hoping this helps someone else still on 1.15! |
We're still seeing this on Elixir 1.16.0 :( Did the fix not end up making it in @josevalim ? |
I don't believe it was reverted. You can try running the reproducer above and see if it errors or not on Elixir v1.16.0. If it does not error, you have a separate case and we need a way to reproduce it. |
Sorry, been slow getting back to this. Yes it does error on Elixir 1.16.0. Interestingly, I can't get it to fail normally, but it consistently fails in Docker and our pipelines, even with a |
I'm also experiencing this issue in elixir 1.16.0, cc @josevalim @Billzabob |
I am also still seeing this error somewhat sporadically with Elixir 1.16.0 and Ecto 3.11.2. It seems to come and go depending on the state of the code - once it's happening, it is consistent (e.g., repeated The most recent case I've been able to isolate is that I'm introducing an enum where the type values of the enum are pulled from a different module, e.g.,
Adding the indicated line is giving me "invalid association I could define |
There is definitely a bug, we just need a mechanism to reproduce it. Btw, you can try |
I've been experiencing this issue repeatedly - but only on CircleCI. We're on 1.16.2 and OTP 26. I just fixed it, going off Jose's So by limiting the number of schedulers ( I tried locally to see if I could reproduce the inverse locally, by telling Erlang I have like 48 cores ( For reference on our specific situation: OpenFn/lightning#2028 |
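Since the comment above ties the warning to the number of schedulers the VM runs with, it can help to compare what a local machine and a CI container actually report. A minimal sketch using standard-library calls only; the exact emulator flag used in that comment is not preserved in this thread, so none is shown here:

```elixir
# Schedulers the VM is currently running code on.
online = System.schedulers_online()

# Logical processors the VM believes it may use; can be :unknown on some platforms.
available = :erlang.system_info(:logical_processors_available)

IO.puts("schedulers online: #{online}")
IO.puts("logical processors available: #{inspect(available)}")
```

Running this in both environments (for example via mix run -e or in IEx) shows whether the CI container sees a very different core count than a developer machine.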
I faced the same issue. I was able to work around it by using require on the associated Ecto schema, which forces that schema to be compiled before the schema that triggers the warning:
|
@himangshuj if I try that we get a TON of errors about deadlocking, which maybe points to the actual issue going on? This is the original warning we've been getting:
So I added Which leads to dozens of errors that look like this:
|
Could be that, but for me it was the alphabetical order of the assocs; I did not get any deadlock. |
Curious if any of you have |
I ran into this on Elixir 1.16.1. This was a weird one. It failed on CI but was difficult to reproduce locally. Eventually, I was able to reproduce it using:
Even still, it would only fail once every half dozen times or so. Anyway, what we had looked essentially like this:
defmodule MyApp.Blog.Post do
  use Ecto.Schema
  alias MyApp.Blog.Comment
  @statuses [:draft, :live]
  schema "posts" do
    field :status, Ecto.Enum, values: @statuses
    has_many :comments, Comment
  end
  def statuses, do: @statuses
end
defmodule MyApp.Blog.Comment do
  use Ecto.Schema
  alias MyApp.Blog.Post
  schema "comments" do
    # Calling Post.statuses() here makes Comment depend on Post at compile time.
    field :status, Ecto.Enum, values: Post.statuses()
    belongs_to :post, Post
  end
end
I'm guessing this created a compile loop. Moving the |
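One way to avoid a compile-time cycle of this shape is to keep the shared values in a third module that neither schema depends on through an association. The sketch below uses plain modules so it runs without Ecto; Demo.Blog.Statuses and the other Demo.* names are hypothetical, and this is not necessarily what the commenter above ended up doing:

```elixir
defmodule Demo.Blog.Statuses do
  # Single home for the shared values; it references no other module.
  @statuses [:draft, :live]
  def all, do: @statuses
end

defmodule Demo.Blog.Post do
  # Compile-time dependency points at Statuses only, not at Comment.
  @statuses Demo.Blog.Statuses.all()
  def statuses, do: @statuses
end

defmodule Demo.Blog.Comment do
  # Same here: Comment no longer needs Post to be compiled first.
  @statuses Demo.Blog.Statuses.all()
  def statuses, do: @statuses
end

IO.inspect(Demo.Blog.Post.statuses())
IO.inspect(Demo.Blog.Comment.statuses())
```

In a real schema the module attribute would be passed to the field, for example as values: @statuses on an Ecto.Enum field.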
I'm on Elixir |
We can see this issue every
Before the sleep, function_exported?(queryable, :__schema__, 2) returns false. Follow up: |
We have the same issue when upgrading from Elixir Our problematic schemas are very similar to what's described here #4293 (comment)
defmodule ProductTypeLocationTax do
  defmodule City do
    use Ecto.Schema
    import Ecto.Changeset
    @type t :: %__MODULE__{}
    @primary_key false
    schema "product_types_city_taxed_locations" do
      belongs_to(:product_type, ProductType)
      belongs_to(:location, Location)
      timestamps()
    end
    def changeset(%__MODULE__{} = pl, attrs \\ %{}) do
      pl
      |> cast(attrs, [:product_type_id, :location_id])
      |> unique_constraint([:product_type_id, :location_id])
    end
  end
  defmodule State do
    use Ecto.Schema
    import Ecto.Changeset
    @type t :: %__MODULE__{}
    @primary_key false
    schema "product_types_state_taxed_locations" do
      belongs_to(:product_type, ProductType)
      belongs_to(:location, Location)
      timestamps()
    end
    def changeset(%__MODULE__{} = pl, attrs \\ %{}) do
      pl
      |> cast(attrs, [:product_type_id, :location_id])
      |> unique_constraint([:product_type_id, :location_id])
    end
  end
  defmodule County do
    use Ecto.Schema
    import Ecto.Changeset
    @type t :: %__MODULE__{}
    @primary_key false
    schema "product_types_county_taxed_locations" do
      belongs_to(:product_type, ProductType)
      belongs_to(:location, Location)
      timestamps()
    end
    def changeset(%__MODULE__{} = pl, attrs \\ %{}) do
      pl
      |> cast(attrs, [:product_type_id, :location_id])
      |> unique_constraint([:product_type_id, :location_id])
    end
  end
end
It's happening consistently in our project, but trying to reproduce this issue in a standalone project yields no warnings. I've managed to work around the issue by extracting them to their own files. The other thing that worked as workaround is adding loads of
Code.ensure_compiled!(ProductCategoryLocationTax.City)
Code.ensure_compiled!(ProductCategoryLocationTax.County)
Code.ensure_compiled!(ProductCategoryLocationTax.State) |
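For readers unfamiliar with the call used in the workaround above, here is a small sketch of how Code.ensure_compiled!/1 and its non-raising variant behave on their own. Demo.Location and Demo.DoesNotExist are hypothetical names used only for illustration:

```elixir
defmodule Demo.Location do
  def name, do: "location"
end

# Returns the module itself when it is compiled and loadable.
Demo.Location = Code.ensure_compiled!(Demo.Location)

# The non-raising variant returns a tagged tuple instead.
{:module, Demo.Location} = Code.ensure_compiled(Demo.Location)

# For a module that cannot be found, the bang variant raises ArgumentError.
result =
  try do
    Code.ensure_compiled!(Demo.DoesNotExist)
  rescue
    e in ArgumentError -> {:raised, Exception.message(e)}
  end

IO.inspect(result)
```

During a Mix compilation the same call additionally waits for a module that is still being compiled, which is presumably why it helps in the workaround above.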
I ran into this issue as well. @stuartc's solution of using the flag |
This warning suddenly appeared in our project as well. Elixir 1.17.2 |
I pushed a PR that should fix this error, by using the relevant Elixir callback for this kind of check: #4552 - please give it a try! |
That PR fixed it for us. Thanks so much! |
Elixir version
1.15.6
Database and Version
PostgreSQL 15.4
Ecto Versions
3.10.3
Database Adapter and Versions (postgrex, myxql, etc)
Postgrex
Current behavior
For a few months now, we've been seeing the following warning show up in our logs intermittently when compiling or running mix test:
warning: invalid association `family` in schema Core.UserDevices.UserDevice: associated module Core.Families.Family is not an Ecto schema
  lib/core/user_devices/user_device.ex:1: Core.UserDevices.UserDevice (module)
Sometimes there are multiple warnings for a few schemas, sometimes none. The subject of their complaint is often this Family module, though it sometimes varies and can apply to any one of our schemas. I've traced the source of the message to the after_compile_validation functions in association.ex in Ecto, where it seems that for some reason, Elixir doesn't believe the Family module has __schema__ exposed. It's definitely working as expected, though, and the app is functioning fine. So this warning seems to be a false alarm.
I assume that something strange is going on here with compilation order, but am unsure. Since it's intermittent, I don't have a simple code example that can reproduce the issue, but I saw this past issue that seems very similar, as well as this recent change one call upstream from our issue and am crossing my fingers that others may have reported something similar recently, resulting in this change? Here's a simplified version of our Family schema that is showing the issue (though as I mentioned, any of our Ecto schemas at varying times will sometimes show up, seemingly depending on compile order).
The other schemas referencing (in the error above, UserDevice) simply have an alias to Core.Families.Family and then reference it with belongs_to :family, Family. Only things of note are that our schemas often use the LetMe library for authorization, and TypedEctoSchema to generate types for specs. Neither seem related since removing them from the schemas in the warning doesn’t change the warning, but worth mentioning just in case since I've never tried entirely pulling them from our codebase!
If this isn't enough information to confirm a bug since I'm unable to reproduce, feel free to close--I definitely recognize this isn't a support forum and I've come with some vague details! But if this is ringing any bells we'd love to help track down what's making Ecto mad! (And to get our logs happy again)!
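The kind of check described above (asking whether the associated module exposes __schema__) can be sketched with plain modules. This is not Ecto's actual implementation; the Demo.* modules and the stand-in __schema__/2 clause are hypothetical and only illustrate why a module that is not yet available would be reported as "not an Ecto schema":

```elixir
defmodule Demo.Family do
  # Hypothetical stand-in for the reflection function that `use Ecto.Schema` defines.
  def __schema__(:association, _name), do: nil
end

defmodule Demo.Plain do
  def hello, do: :world
end

defmodule Demo.Check do
  # Load the module first, then ask whether it exports __schema__/2.
  def schema?(module) do
    Code.ensure_loaded?(module) and function_exported?(module, :__schema__, 2)
  end
end

IO.inspect(Demo.Check.schema?(Demo.Family), label: "Demo.Family")
IO.inspect(Demo.Check.schema?(Demo.Plain), label: "Demo.Plain")
IO.inspect(Demo.Check.schema?(Demo.Missing), label: "Demo.Missing")
```

A module that exists but cannot be loaded at the moment of the check is indistinguishable here from one that is not a schema at all, which is consistent with the false alarm reported in this issue.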
Expected behavior
Not seeing a warning for a thing that's a functioning Ecto schema!