
Pretrained model for bulk low-quality data #236

Open
avantikalal opened this issue Apr 5, 2021 · 6 comments

@avantikalal

Hi @avantikalal, in cases where noisy samples have coverage >= that of the clean samples, should users always forgo training and use your pretrained model, nvidia:atac_bulk_lowqual_20m_20m?

For example:
[image: example coverage comparison]

If the pretrained model is not successful in reducing noise, are there training parameters that should be considered when constructing a custom model?

Thanks for the help! AtacWorks looks like a game changer.

Originally posted by @umasstr in #221 (comment)

avantikalal commented Apr 5, 2021

Hi @umasstr, for best results we recommend training your own model if you have matched low- and high-quality data available. The noisy and clean data can have any coverage; what matters is that the noisy data used for training has coverage similar to the noisy data to which you intend to apply the model.
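
As a rough way to compare coverage between two noisy datasets, something along these lines works (a generic samtools sketch, not an AtacWorks command; the file names are illustrative):

# mean per-base depth of each noisy BAM (samtools depth prints chrom, pos, depth)
$ samtools depth -a noisy_train.bam | awk '{sum += $3} END {print sum/NR}'
$ samtools depth -a noisy_target.bam | awk '{sum += $3} END {print sum/NR}'

If the two means differ substantially, a model trained on the first is unlikely to transfer well to the second.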

umasstr commented Apr 5, 2021

Thanks for following up. Consistent with your comment above, I found the pretrained model to be incompatible with my dataset and with some sample ENCODE datasets. The custom-trained model worked well, but there's a constant, low-level background in the output (green below).

Purple: original (MACS, as directed in #221)
Green: custom model
Orange: atac_bulk_lowqual_20m_20m

[image: genome browser view of the three tracks]

For reasons unknown, it looks like your pipeline requires a non-zero value at every coordinate: more than 25M intervals have a signal of 1.0, making up the majority of the bedGraph.

$ cut -f4 2296_infer.track.bedGraph | sort | uniq -c | head -1

26356100 1.0

Unfortunately, simply subtracting 1 from all signal values doesn't produce an ideal track, though it could be useful for analytical purposes. Any idea how I can get rid of this artifact produced by custom models? I can file a new issue if you'd like.
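
For reference, the subtraction described above can be done along these lines (the output file name is illustrative):

# subtract the constant 1.0 baseline and drop intervals that end up at zero
$ awk 'BEGIN{OFS="\t"} {v = $4 - 1; if (v > 0) print $1, $2, $3, v}' 2296_infer.track.bedGraph > 2296_infer.sub1.bedGraph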

avantikalal commented Apr 5, 2021

Great to see the custom model working well overall, though I agree the low-level background is a problem. The model shouldn't require a nonzero output at every coordinate. Could you share the config file generated by your model training run?
Also, what is the downstream task? Are you primarily interested in using the denoised track, the peak calls, or both?

umasstr commented Apr 5, 2021

Indeed, the output is pretty impressive, especially if I set the track bounds to hide the 1.0 background:

[image: denoised track with adjusted bounds]

Here is a link to the training config

@avantikalal

One thing that might help is to train your model using the Poisson loss function for regression instead of the MSE/Pearson loss functions. This has given us better regression results and will become the default regression loss in the future.

To train a model using Poisson loss, you can set --mse_loss equal to 0, --pearson_loss equal to 0, and --poisson_loss equal to 1.
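
For example, a sketch of the full invocation (the entry point and config flag here are assumptions about your setup; only the three loss flags come from the suggestion above):

$ python main.py train --config your_train_config.yaml --mse_loss 0 --pearson_loss 0 --poisson_loss 1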

If you do try this, please let us know if it gives better results!

umasstr commented May 12, 2021

Hi @avantikalal, I tried the parameters above, but the results did not look good.

[image: results with Poisson loss]

I am a little concerned that I have to retrain a model for each noisy BigWig: none of the prebuilt models work with my data, and none of my samples' models work on the others.
