-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement graceful degradation for feature extractor #428
Comments
Hey @gouwens and @tmchartrand, We need to make a decision about the strategy to take when resolving these issues. This decision is a choice between:
In both cases, we would:
Overall, I think that option 2 is a cleaner and more extensible solution. On the other hand, option 1 does not involve substantial architectural change to the current system. What do you guys think? Are there many users who have written extensions depending on the existing glue code? For more details and some code samples, please see this writeup |
Thanks for the write-up. I'm not opposed to option 2 right off the bat. But I am not 100% sure I'm seeing the benefits right away. As you say in the write-up, right now we are implementing that graph as coded in those various feature processing scripts. I'm not sure how adding another layer of abstraction (the Node/Graph classes) is going to help. I know it's a simple example, but as written, I might have no idea what the graph computation is doing because you have to go back to all the Node class definitions to see which functions are actually being called. You do name them similarly, but it kind of seems like extra code for no particular reason? Like, why not just call the functions and do the validation/error as needed interspersed with the calls? Probably some of that could be fixed - like you could have a more general node class that you pass the function to as an argument (so it's clearer when you're putting the graph together what you're doing). But really all you're doing then is keeping track of success/failed status. Maybe this is the best way of tracking that status, though - I'm not sure. Another thing I worry about is handling more complex inputs and outputs. Some functions may return tuples of values, and later functions that depend on them may only use some of those values. I think this system could end up hiding some of what is actually being used if everything has to be passed as "value" values. I wonder if the real issue is that the pipeline code just needs to be reorganized/refactored. I think the pipeline code was originally written to (1) calculate all the values we need from a cell to publish it and (2) slam those results in a database that expects everything to be there. That's why it's currently monolithic and succeeds or fails in an all-or-none way. Option 2 is one way of refactoring it that would probably force some clean-up of the logic (which is a benefit), but I'm worried about adding boilerplate/overhead that'll make it harder to maintain. And maybe you could have gotten most of those benefits by refactoring in another way? Regarding code that "depends on the existing glue code," do you mind specifying which code you consider glue? Like, I don't think I really call |
Thanks for starting on this @NileGraddis , and for reaching out. I mostly agree with what Nathan said above. I'd also lean towards the first option, away from extra abstraction. Definitely agree that some refactoring is probably in order either way, and could help the first option come out more cleanly. |
Yeah, I'd also be happy to talk about it on a call as per @tmchartrand 's suggestion, if that'd be easier. |
alright, I'll go ahead and schedule that call (and thank you both for the very fair feedback). |
Also, one thing I'd like to point out (doing so here because it requires a code sample), is that we can handle boilerplate removal and compound outputs without too much trouble in option 2. Like this, for instance. Of course, this does not address the larger concerns you've raised about straightforwardness, readability, and the preservation of the existing "glue" interfaces. |
Notes from meeting call:
Overall, it seems like there is an intermediate solution where we solve the problem at the stimulus type level, but don't dig into the stimulus analysis classes. We might apply either method to this intermediate solution. |
This is a file that fails QC, but for which we want to generate a NWB2 file with spikes. Current system does not create an output.nwb file "/allen/programs/celltypes/production/mousecelltypes/prod968/Ephys_Roi_Result_777417625/nwb2_Vip-IRES-Cre;Ai14-424614.04.01.01.nwb" |
Determine which strategy to use based on issue #398
(whole bunch of decorators, or rework computation graph)
update documentation
Acceptance criteria: able to run "failed" cells from June release set to generate NWB files with spikes via ipfx run_pipeline
And report how it failed
The text was updated successfully, but these errors were encountered: