Filtering noise in pose estimation #1739

antortjim · 2024-04-08T13:52:31Z

antortjim
Apr 8, 2024

Dear SLEAP devs

I have noticed not just SLEAP, but in general any object detector deployed on video data where each frame is processed independently (which I think is SLEAP's case) tends to produce this artfact where the selected pixel whose x,y coordinates describe the object's position swaps between a few equally good pixels. For example, see video

jitter.mp4

And the x,y coordinates of for example the thorax over time (it's not exactly the same time window as in the video, but you get the idea).

Many body parts coordinates 'jitter', but note that it is usually only between two (or maybe 3) points.
I know noise in pose estimates a well-known issue which is addressed in many studies 1, 2. Usually, some combination of Gaussian, mean and median filters is used, which in a nutshell (in my understanding) try to separate signal from noise by comparing neighboring frames to each other, while at the same time losing some signal. I.e. the filtering step refines the pose estimates performed on a frame-by-frame basis by using information available in over-time-neighboring estimates.

I wanted to know if the developers have any ideas about which filters fit the best to different noise patterns. Moreover, given the observation that in my case at least, this noise follows a very particular pattern (the jitter occurs between the same two or max 3 values of x and y when each is looked independently), I came up with this filter:

import numpy as np
def encode(in_list):
    """
    Given a sequence of states
    return a list where every element is a tuple
    Each tuple has structure (state, state_duration)
    There is a tuple for each bout of a state
    """
    # Handle empty list first.

    if not in_list:
        return []

    # Init output list so that first element reflect first input item.

    out_list = [(in_list[0], 1)]

    # Then process all other items in sequence.

    for item in in_list[1:]:
        # If same as last, up count, otherwise new element with count 1.

        if item == out_list[-1][0]:
            out_list[-1] = (item, out_list[-1][1] + 1)
        else:
            out_list.append((item, 1))

    return out_list
    
def one_pass_filter_1d(data):
    """
    Overwrite stationary data which is surrounded by two bouts of stationary data where the coordinate is the same
    """
    
    encoding=encode(data)
    filtered_encoding=[encoding[0]]
    
    for pos in tqdm(range(1, len(encoding))):
        length=encoding[pos][1]
        val=encoding[pos][0]
    
        if (pos==len(encoding)-1):
            filtered_encoding.append((val, length))
            # end of time series
            break
            
        if filtered_encoding[pos-1][0]==encoding[pos+1][0]:
            val=encoding[pos-1][0]
        
        filtered_encoding.append((val, length))

    filtered_data=[]
    for val, length in filtered_encoding:
        filtered_data.extend([val,]*length)
        
    filtered_data=np.array(filtered_data)
    return filtered_data

It performs a RLE (Run Length Encoding) of the timeseries for X and Y separately, and overwrites bouts of one value 'foo' that are surrounded by two bouts of another value 'bar', when these two latter bouts have the same value.

For example:

39,38,38,39,40,40,40,40,40,41,41,41,41,40,40,40,40,40,41,41,41,40,40,40,41,41,41,40,40,40,40

becomes

39,38,38,39,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40

(the bouts of 41 surrounded by bouts of 40 are overwritten with 40)

It has the advantage that it is non-parametric (no need to select a window size, and less risk of removing subtle movements), and also robust to outliers (unlike the mean filter for example). It can be applied twice, to cover scenarios where the jitter is between 3 values instead of just 2.
I tried testing it in my own time series and I compared its performance to custom implementations of the filters described in the works I linked, and this filter somehow outperformed those.

What do you think of this filter? Does it make sense or am I getting good results just because I implemented the filters proposed in other works wrong? filter=0 is the raw data (no filter has been run). filter=1 and filter=2 represent one and two runs of this filter I proposed. filter=grace and filter=mehmet represent the output of filtering the raw data with the filters I implemented based on 1 and 2 respectively.

For reproducibility, please find the Jupyter notebook and .csv files attached for the thorax, middle right and middle leg data. Only numpy, pandas, tqdm and plotly.express are needed to run it.

pose_filter_noise.zip

Cheers,

Antonio

talmo · 2024-04-10T00:37:32Z

talmo
Apr 10, 2024
Maintainer

Thanks @antortjim, this is a great analysis and a very much needed step in postprocessing!

We've shied away from adding more postprocessing routines since for the multi-animal case, they need to happen after identity proofreading, so it wouldn't make sense to tack it onto sleap-track. But it's definitely the case that you'll want to do something like this when you have nice contiguous tracks.

For analysis-related tooling, I'd also recommend checking out the movement package from @niksirbi and co., which is a great addition to the ecosystem of tools that operate downstream of SLEAP. In particular, they've recently added some functionality for filtering and imputation to their examples.

In terms of the science here: those are great references and I'd definitely advise following existing protocols where possible, regardless of absolute correctness.

It's a toughie frankly, and it's not clear what the best approach is given how little we really know about pose estimation noise structure. It's definitely the case that we suffer from both the issue you described (having multiple "correct" solutions) as well as quantization error that emerges from peak finding on confidence maps. Here's a notebook that explains the theory behind this and lets you explore trade-offs.

Even knowing where the errors come from, it's not obvious which filtering approach is best. Traditionally what we've done is to apply a median filter with window size of 3-5 to get rid of point jumps and high frequency quantization errors, followed by a wider Gaussian or Savitzky-Golay filter (though see this for a counterpoint) for continuous smoothness.

Recent work in unsupervised behavior segmentation (keypoint-moseq), deals with the pose noise problem in a creative way and imposes a probabilistic graphical model that attempts to reconstruct the "true" pose as part of the modeling pipeline. They even have a way to export those estimated poses, but it comes at the cost of having to train a whole model to do the syllable segmentation first, so a bit more of a roundabout solution, but one that is a bit more based on the data itself than using classical signal processing.

Regardless of the method you use, you may consider using spectral features such as the CWT or FFT on bouts of immobility to examine the frequency contents of different methods, which can give you a quantitative readout to complement the visualizations in the plots.

Hope this helps and let us know if you have any questions or ideas!

0 replies

niksirbi · 2024-04-10T08:51:32Z

niksirbi
Apr 10, 2024

This is a very interesting topic, thanks for tagging me!

Indeed, "jitter" is a problem with virtually all pose estimation frameworks, because of the predictions happening on a frame-by-frame basis, as @antortjim guessed. There are some approaches that take temporal continuity into account, e.g. LightningPose, but in general some sort of filtering/smoothing will be needed in most cases to post-process the pose tracks.

Over at movement, we are actively working on implementing a bunch of different filtering approaches. We have prioritised this for now, because there's little sense in doing downstream analyses (kinematics or behavioural segmentation) with "dirty" data.

So far, we've implemented a very simple filter that drops low-confidence points based on a threshold, but more useful stuff is underway. I will link some of our relevant open issues:

I expect us to ship some of the simpler ones, like the median and the Savitzky-Golay filter, within the next 2 months. Others, like the Kalman filter, are probably trickier and will take longer.

As @talmo said, there doesn't seem to be rigorous way to determine what the best approach is. Currently, I see no substitute for inspecting the data pre- and post-filter, and reporting a bunch of diagnostics. That's what we intend to do: provide a bunch of options for people to try out, and enable QC through visualisations and reports.

If anyone has more ideas about filters that would be good to implement, feel free to open issues (ideally with references to paper(s) or existing implementations, as Antonio has done here.

Another idea we've been toying with is to provide a "template" filter function, which people can use to implement their custom filters (like the one mentioned in this thread), so that they can "plug" them into movement. This would allow people to benefit from movement even if we haven't implemented their favourite filter (yet). That said, we'd have to consider how this would work technically, and it may be trickier than we think.

Niko

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filtering noise in pose estimation #1739

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments

{{title}}

{{title}}

Select a reply

Filtering noise in pose estimation #1739

antortjim Apr 8, 2024

Replies: 2 comments

talmo Apr 10, 2024 Maintainer

niksirbi Apr 10, 2024

antortjim
Apr 8, 2024

talmo
Apr 10, 2024
Maintainer

niksirbi
Apr 10, 2024