
Implement graceful degradation for feature extractor #428

Closed
wbwakeman opened this issue Jun 3, 2020 · 9 comments

wbwakeman commented Jun 3, 2020

- Determine which strategy to use, based on issue #398 (a large set of decorators, or reworking the computation graph)
- Update documentation

Acceptance criteria: able to run "failed" cells from the June release set through ipfx run_pipeline to generate NWB files with spikes, and to report how each cell failed.

wbwakeman added this to the Marmot 2020-06-16 milestone on Jun 3, 2020
wbwakeman (Author)

Hoping this will resolve #278, #324, #330, #394

NileGraddis (Contributor)

Hey @gouwens and @tmchartrand,

We need to decide on a strategy for resolving these issues. The choice is between:

  1. injecting error handling throughout the existing feature extraction & plotting "glue" code
  2. registering the feature extraction & plotting code to a computation graph and handling errors/inapplicable features through that graph.

In both cases, we would:

  • avoid changing the actual plotting and feature extraction code. The changes would only be to the glue code that runs feature extraction and plotting on the sweepset/dataset-scale.
  • avoid removing any existing functionality, though we might deprecate some.

Overall, I think that option 2 is a cleaner and more extensible solution. On the other hand, option 1 does not involve substantial architectural change to the current system.

What do you guys think? Are there many users who have written extensions depending on the existing glue code?

For more details and some code samples, please see this writeup
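To make option 1 concrete, here is a minimal sketch of the decorator approach. All names here are illustrative, not part of the ipfx API: each glue-level step is wrapped so that an exception becomes a recorded failure (with a message explaining how it failed) instead of aborting the whole run.

```python
# Hypothetical sketch of option 1: wrap each feature-extraction "glue" step in
# a decorator that converts exceptions into a recorded failure status.
import functools
import traceback

def graceful(step):
    """Run a step; on error, return a failure record instead of raising."""
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        try:
            return {"status": "success", "value": step(*args, **kwargs)}
        except Exception as err:
            return {
                "status": "failed",
                "error": f"{type(err).__name__}: {err}",
                "traceback": traceback.format_exc(),
            }
    return wrapper

@graceful
def extract_spike_features(sweep):
    # Illustrative step, not real ipfx code.
    if sweep is None:
        raise ValueError("no sweep data")
    return {"n_spikes": 3}

ok = extract_spike_features({"v": [1.0, 2.0]})
bad = extract_spike_features(None)
```

The appeal of this option is that the call sites stay ordinary function calls; the cost is that every glue function needs the decorator (and the callers need to check the status records), which is the "whole bunch of decorators" concern.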

gouwens (Collaborator) commented Jun 8, 2020

Thanks for the write-up. I'm not opposed to option 2 right off the bat. But I am not 100% sure I'm seeing the benefits right away. As you say in the write-up, right now we are implementing that graph as coded in those various feature processing scripts. I'm not sure how adding another layer of abstraction (the Node/Graph classes) is going to help. I know it's a simple example, but as written, I might have no idea what the graph computation is doing because you have to go back to all the Node class definitions to see which functions are actually being called. You do name them similarly, but it kind of seems like extra code for no particular reason? Like, why not just call the functions and do the validation/error as needed interspersed with the calls?

Probably some of that could be fixed - like you could have a more general node class that you pass the function to as an argument (so it's clearer when you're putting the graph together what you're doing). But really all you're doing then is keeping track of success/failed status. Maybe this is the best way of tracking that status, though - I'm not sure.
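For illustration, the "more general node class that you pass the function to" idea might look like the sketch below (all names hypothetical, not ipfx code): the graph assembly then reads as a list of named calls, and the node machinery only tracks success/failed/skipped status and propagates upstream failures.

```python
# Sketch of a general node: wraps an arbitrary function so the graph assembly
# shows which functions are called, while the node tracks status per step.
class Node:
    def __init__(self, name, fn, depends_on=()):
        self.name = name
        self.fn = fn
        self.depends_on = list(depends_on)
        self.status = "pending"
        self.value = None
        self.error = None

    def run(self, results):
        failed = [d for d in self.depends_on if results[d].status != "success"]
        if failed:
            # Don't call fn at all; record why this node was skipped.
            self.status = "skipped"
            self.error = f"upstream failure in: {', '.join(failed)}"
            return
        try:
            self.value = self.fn(*(results[d].value for d in self.depends_on))
            self.status = "success"
        except Exception as err:
            self.status = "failed"
            self.error = str(err)

def run_graph(nodes):
    """Run nodes (assumed topologically ordered); return name -> node."""
    results = {}
    for node in nodes:
        node.run(results)
        results[node.name] = node
    return results

nodes = [
    Node("load_sweeps", lambda: [1, 2, 3]),
    Node("mean_rate", lambda xs: sum(xs) / len(xs), depends_on=["load_sweeps"]),
    Node("broken_step", lambda m: 1 / 0, depends_on=["mean_rate"]),
    Node("downstream", lambda b: b + 1, depends_on=["broken_step"]),
]
results = run_graph(nodes)
```

As the comment above suggests, most of what this buys over plain calls with try/except is the uniform status bookkeeping and the automatic skipping of nodes downstream of a failure.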

Another thing I worry about is handling more complex inputs and outputs. Some functions may return tuples of values, and later functions that depend on them may only use some of those values. I think this system could end up hiding some of what is actually being used if everything has to be passed as "value" values.

I wonder if the real issue is that the pipeline code just needs to be reorganized/refactored. I think the pipeline code was originally written to (1) calculate all the values we need from a cell to publish it and (2) slam those results in a database that expects everything to be there. That's why it's currently monolithic and succeeds or fails in an all-or-none way. Option 2 is one way of refactoring it that would probably force some clean-up of the logic (which is a benefit), but I'm worried about adding boilerplate/overhead that'll make it harder to maintain. And maybe you could have gotten most of those benefits by refactoring in another way?

Regarding code that "depends on the existing glue code," do you mind specifying which code you consider glue? Like, I don't think I really call extract_cell_features directly, but I very frequently use the objects like LongSquareAnalysis used within that function. Those objects are kind of "glue" code themselves, depending on how you think about it.

tmchartrand (Collaborator)

Thanks for starting on this @NileGraddis , and for reaching out. I mostly agree with what Nathan said above. I'd also lean towards the first option, away from extra abstraction. Definitely agree that some refactoring is probably in order either way, and could help the first option come out more cleanly.
Even though there certainly are a few levels in the existing computation graph, I don't see supporting that complexity as too important for future development. Even capturing the full chain of failures in the existing nested computations doesn't seem that critical, although it would certainly be nice (I could imagine a higher-level feature with a failure tag/message that simply says "failed due to failure in nested function X" or "failed due to sweep-level failures").
I don't think I have or know of any existing code that would be too disrupted by changes in the "glue" either way, although I agree that's certainly hard to cleanly distinguish in the existing code. I'll try to think in a bit more depth about these solutions and let you know if I have more thoughts, or would be glad to join a brief call at some point soon if that's helpful.

gouwens (Collaborator) commented Jun 8, 2020

Yeah, I'd also be happy to talk about it on a call, per @tmchartrand's suggestion, if that'd be easier.

NileGraddis (Contributor)

Alright, I'll go ahead and schedule that call (and thank you both for the very fair feedback).

NileGraddis (Contributor)

Also, one thing I'd like to point out (doing so here because it requires a code sample) is that we can handle boilerplate removal and compound outputs without too much trouble in option 2. Like this, for instance.
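The original code sample is not preserved in this thread; the sketch below is one hypothetical reconstruction of the idea (all names invented): a step's compound output is stored whole, downstream steps select just the field they need via a small helper, and upstream failures are propagated rather than re-raised, which removes the per-step try/except boilerplate.

```python
# Hypothetical sketch: compound outputs plus failure propagation in option 2.
class Result:
    """Holds either a value or an error message for one computed step."""
    def __init__(self, value=None, error=None):
        self.value = value
        self.error = error
        self.ok = error is None

def run_step(fn, *inputs):
    """Run fn on the unwrapped input values, propagating upstream failures."""
    bad = next((r for r in inputs if not r.ok), None)
    if bad is not None:
        return Result(error=f"upstream failure: {bad.error}")
    try:
        return Result(value=fn(*(r.value for r in inputs)))
    except Exception as err:
        return Result(error=str(err))

def pick(key):
    """Select one field from a compound (dict-like) upstream output."""
    return lambda compound: compound[key]

# A step with a compound output, and a downstream step that uses one field:
stats = run_step(lambda: {"mean": 2.0, "count": 3})
mean_only = run_step(pick("mean"), stats)

# A failing step; its consumer is marked failed without ever being called:
broken = run_step(lambda: 1 / 0)
downstream = run_step(pick("mean"), broken)
```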

Of course, this does not address the larger concerns you've raised about straightforwardness, readability, and the preservation of the existing "glue" interfaces.

NileGraddis (Contributor) commented Jun 15, 2020

Notes from the call:

  • Nathan, Tom mainly using stimulus analysis classes, less data set feature extractor & similar "plot everything" code
  • need to check what components Luke C & Stephanie S are calling
  • it would be useful to be able to output more intermediate results, such as info about specialized sweeps
  • more specific error messages in feature extraction might help when writing specialized qc criteria for newer experiments
  • it would be helpful to have a full example of option 2, calculating real features on real data

Overall, it seems like there is an intermediate solution where we solve the problem at the stimulus type level, but don't dig into the stimulus analysis classes. We might apply either method to this intermediate solution.

wbwakeman (Author)

This is a file that fails QC, but for which we want to generate an NWB2 file with spikes. The current system does not create an output.nwb file:

"/allen/programs/celltypes/production/mousecelltypes/prod968/Ephys_Roi_Result_777417625/nwb2_Vip-IRES-Cre;Ai14-424614.04.01.01.nwb"

5 participants