-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dynamic determination of invalidated cache #15
Comments
You mean like have a dependency graph? I definitely thought that was a promising idea but Matt thought it would be a lot of work at the time... |
To be honest I assumed we had a DG, or the data to build one on the fly. I assumed that is what is needed to do lazy loading. I will go back to the code. How do we do lazy loading? When we add data to orca, store a timestamp (or monatomic number) in a _CACHE_TIME. |
No there's no DG - lazy evaluation happens when an item gets injected into a function. If it's in the cache it doesn't get re-evaluated. The user has to invalidate the cache manually, which is usually done every iteration, and is why my other issue about the cache is so important. It's not that we didn't think DG was a good idea, just that Matt never got to it. |
Does the algorithm I suggested make sense? Do you think it would improve things? As always just starting a discussion, and willing to help implement any fix. |
I definitely agree that building a DF and invalidating the cache appropriately when something gets updated would be awesome. The problem is that your dependency graph is usually built based on what gets injected into the function, but everything that gets injected is not necessarily used. So like when you inject the parcels table which has 20 columns, you only update one column in the parcels table you want to invalidate only the things that depend on that column, but you're injecting the whole table. I see no obvious way to solve this other than requiring the user to say which columns a function depends on explicitly, but then we're almost as prone to error as we were in the first place. Thoughts? |
It can also be subverted by using Perhaps this algorithm can be a diagnostic tool? Still thinking... |
Right - that's the best I can come up with too... orca knows when it's inside that method, and it can capture any calls to get_table or |
I would say that stating specific column dependencies, while cumbersome, is still preferable to having to manually clear the cache within the steps. But maybe this could be automated in some way, like looking through the contents of a yaml file or configuration dictionary rather than being declared in the decorator? |
@bridwell that is my inclination as well, to be explicit about dependencies. A bit more work up front, but with good payoff in terms of being able to address challenges like identifying dynamically when to recompute based on an updated dependency. |
can we keep track of when we generate a orca column and use that to determine when a orca table needs to be re-calculated.
The text was updated successfully, but these errors were encountered: