Immutable search spaces #371
Comments
Please mention your suggested solution briefly. In particular, I'm curious whether it would solve the computational issue mentioned in the second-to-last point. In my view this is a real problem, the test depends on interpretation and intent. After having given several people the metadata approach for workarounds, no one has described the approach as |
Alright, can do. Before, quick comment on what you said:
It is inelegant and error-prone for several reasons:
Wherever there is room for error, people WILL run into the problems. We have seen this countless times with less severe things like outdated docs etc. I think enough said 🙃 |
Bad Solution
Hypothetically speaking, there is a quick fix to the timing issues associated with adding campaign data. But I absolutely discourage going down that path: We could simply cache the added data temporarily in the campaign and only flush the cache (i.e. do the actual search space marking) once it is needed, that is, when you trigger the next recommendation. Lazy data adding, if you will. However, apart from having to maintain another cache and the additional logic involved, this makes the other problems I've described even worse, because you effectively shatter the metadata information into pieces and distribute them across two objects: part of it sits in the campaign until the cache is flushed, the rest sits in the search space. So what used to be stored in a single location could then even spread across several campaigns, making the coupling problem much worse (i.e. what I showed in the example). So let's better quickly forget about this 😬
Ideal Solution
Here is the potential solution I could imagine. For me it solves all the problems and has only one potential downside, namely computational burden. But the latter is only a potential danger and we'd need to test. In fact, I think that it might not be a problem at all (and that it might overall run even faster than what we have now). So let's forget about computational cost for the moment and only focus on layout; we'll talk about the other part later. The idea would be to just kick the
The consequence is that the campaign fully controls its metadata, and this additionally enables more flexibility like "undoing" operations like The "drawback" is now that basically we need to merge this metadata with the search space before every recommendation. But I don't expect this to become a real bottleneck since:
|
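To make the layout idea above concrete, here is a minimal sketch, assuming a frozen search space and a campaign that owns its exclusion metadata and applies it right before each recommendation. `FrozenSpace`, `OwningCampaign`, `excluded`, and `undo` are hypothetical names invented for illustration, not the project's actual API, and a plain set lookup stands in for the real metadata merge.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenSpace:
    """Hypothetical immutable search space: candidates only, no metadata."""
    candidates: tuple


@dataclass
class OwningCampaign:
    """Hypothetical campaign that keeps all metadata on its own side."""
    space: FrozenSpace
    excluded: set = field(default_factory=set)  # campaign-owned metadata

    def recommend(self, batch_size):
        # "Merge" step: filter the shared candidates with this campaign's metadata.
        available = [c for c in self.space.candidates if c not in self.excluded]
        batch = available[:batch_size]
        self.excluded.update(batch)  # only the campaign's own state changes
        return batch

    def undo(self, points):
        # Undoing is a plain metadata edit; the space is never touched.
        self.excluded.difference_update(points)


shared = FrozenSpace(candidates=(1, 2, 3, 4))
campaign_a = OwningCampaign(shared)
campaign_b = OwningCampaign(shared)

rec_a = campaign_a.recommend(3)
rec_b = campaign_b.recommend(3)  # unaffected by campaign_a
campaign_a.undo([2])
rec_a_again = campaign_a.recommend(2)
print(rec_a, rec_b, rec_a_again)
```

Whether the per-recommendation filtering is cheap enough on a large space is exactly the open question in this thread; the sketch says nothing about that.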
Your rant has valid and invalid points (e.g. the attribute is de facto not private, nor should it be according to the current logic that explicitly allows modification, hence one shouldn't argue based on it being private). But I'm not interested in solving theoretical problems of pandas dfs. After all it does not address my initial argument: It also misses the important point: Also remember, that once Furthermore, we already have evidence that the current method might be computationally demanding (#344 (comment)). Based on your proposal, your solution seems equally or more demanding. Solving one problem while worsening another is not the way to go. Consider my proposal below; maybe additionally we can optionally move the involved dfs to polars and also improve merge speed. Beyond that, I see the problem of the current searchspace mutability, and it should be addressed. Did you consider the following much simpler solution that should also solve the searchspace mutability: have a |
Let me add my opinion as well.
|
I currently see a larger set of problems that can all be traced back to the fact that our current SubspaceDiscrete class is mutable and we should consider making it immutable in the future. Here, I would only like to highlight a few of the problems so that they are not forgotten. We can then deal with them in autumn when I'm back (I already have a potential solution in mind but more on that later).
One of the biggest problems – since it can easily lead to unintended behavior / silent bugs – is that people might not properly take into account the statefulness of the problem, perhaps because they are simply unaware. Even if they are aware, they can easily forget. I've seen it already many times, in fact.
Simple example
Suppose someone wants to run several studies on the same basic setup, e.g. testing two recommenders. A common thing that people might do:
At first glance, seems legit, but is wrong. Gives:
because the second campaign uses the metadata that was modified during the first run. The problem can be easily avoided by making the space immutable, just like all other ingredients we offer for specifying campaigns. Of course, this requires a different approach to metadata handling.
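The effect can be reproduced with a toy model. This is a minimal sketch under stated assumptions: `MutableSpace` and `ToyCampaign` are hypothetical stand-ins invented here, not the project's classes, and the `measured` set plays the role of the metadata stored inside the space.

```python
from dataclasses import dataclass, field


@dataclass
class MutableSpace:
    """Hypothetical stand-in for a search space that stores metadata itself."""
    candidates: tuple
    measured: set = field(default_factory=set)  # metadata lives in the space


@dataclass
class ToyCampaign:
    """Hypothetical campaign that marks recommended points in the space."""
    space: MutableSpace

    def recommend(self, batch_size):
        # Only offer candidates that have not been marked yet.
        available = [c for c in self.space.candidates if c not in self.space.measured]
        batch = available[:batch_size]
        self.space.measured.update(batch)  # side effect on the shared object
        return batch


space = MutableSpace(candidates=(1, 2, 3, 4))
first = ToyCampaign(space)
second = ToyCampaign(space)  # reuses the same space object

rec_first = first.recommend(3)
rec_second = second.recommend(3)
print(rec_first)   # [1, 2, 3]
print(rec_second)  # [4]
```

The second campaign only sees what the first one left over, although nothing in the code that sets it up hints at that dependency.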
After this illustrative case, let me just summarize all issues I currently see. There might be more, so I'll update the list whenever I spot another one: