Pretrained model for bulk low-quality data #236
Hi @umasstr, for the best results we recommend training your own model if you have matched low/high-quality data available. The noisy and clean data can have any coverage; what matters is that the noisy data used for training should have similar coverage to the noisy data to which you intend to apply the model.
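One quick way to check that coverage condition is to compare summary statistics of the two noisy bigWig files. A minimal sketch, assuming pyBigWig is installed; the file names are placeholders, and this is not an AtacWorks utility:

```python
# Compare mean coverage of the noisy training track and the noisy track
# you intend to denoise, using the summary stats in the bigWig header.
import pyBigWig

def mean_coverage(path):
    bw = pyBigWig.open(path)
    hdr = bw.header()  # includes 'sumData' and 'nBasesCovered'
    bw.close()
    # average signal over the bases that have any coverage
    return hdr["sumData"] / max(hdr["nBasesCovered"], 1)

train_noisy = mean_coverage("noisy_training_sample.bw")   # placeholder path
target_noisy = mean_coverage("noisy_sample_to_denoise.bw")  # placeholder path
print(f"training noisy coverage: {train_noisy:.2f}")
print(f"target noisy coverage:   {target_noisy:.2f}")
```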
Thanks for following up. In agreement with your post above, I found the pretrained model to be incompatible with my dataset and some sample ENCODE datasets. The custom-trained model worked well, but there's a constant, low-level background in the output (green in the screenshot; purple is the original MACS track, generated as directed in #221). For reasons unknown, it looks like your pipeline needs a non-zero value at every coordinate: >25M peaks have a signal of exactly 1.0, comprising the majority of the bedGraph.
Unfortunately, simply subtracting 1 from the signal at every coordinate doesn't produce an ideal track, though it could be useful for analytical purposes. Any idea how I can get rid of this artifact produced by custom models? I can file a new issue if you'd like.
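For reference, the subtraction workaround mentioned above (remove the constant background from the bedGraph values and clip at zero) can be done with a short script. This is only a sketch with placeholder file names, not part of AtacWorks:

```python
# Subtract a constant background from the bedGraph signal column,
# clip at zero, and drop intervals whose signal becomes zero.
def subtract_background(in_path, out_path, background=1.0):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            chrom, start, end, value = line.split()[:4]
            adjusted = float(value) - background
            if adjusted > 0:
                fout.write(f"{chrom}\t{start}\t{end}\t{adjusted:g}\n")

# Placeholder file names.
subtract_background("denoised.bedGraph", "denoised_no_bg.bedGraph")
```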
Great to see the custom model working well overall, though I agree the low-level background is a problem. The model shouldn't require a nonzero output at every coordinate. Could you share the config file generated by your model training run?
Indeed, the output is pretty impressive, especially if I set the track bounds to hide the 1.0 background. Here is a link to the training config.
One thing that might help is to train your model using the Poisson loss function for regression instead of the MSE/Pearson loss functions. This has given us better regression results and will become the default regression loss in the future. To train a model using Poisson loss, you can set the regression loss function to Poisson in your training config. If you do try this, please let us know if it gives better results!
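The exact AtacWorks option isn't shown above, so purely as an illustration of the idea, here is how a Poisson negative log-likelihood differs from MSE in plain PyTorch on count-like coverage values; the tensors below are hypothetical:

```python
# Poisson NLL vs. MSE on synthetic count-like data. Poisson loss often
# matches read-coverage statistics better than squared error.
import torch
import torch.nn as nn

pred = torch.rand(4, 1000) * 5                        # hypothetical predicted coverage (positive rates)
target = torch.poisson(torch.full((4, 1000), 2.0))    # hypothetical true counts

mse = nn.MSELoss()(pred, target)
poisson = nn.PoissonNLLLoss(log_input=False)(pred, target)  # expects raw (non-log) rates
print(f"MSE loss: {mse.item():.3f}, Poisson NLL: {poisson.item():.3f}")
```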
Hi @avantikalal, I tried the parameters above, but the results did not look good. I'm also a little concerned that I have to retrain a model for each noisy BigWig: none of the prebuilt models work with my data, and none of my samples' models work on the others.
Hi @avantikalal, in cases where noisy samples have coverage >= that of the clean samples, should users always forgo training and use your pretrained model, nvidia:atac_bulk_lowqual_20m_20m?
For example:
If the pretrained model is not successful in reducing noise, are there training parameters that should be considered when constructing a custom model?
Thanks for the help—atacworks looks like a gamechanger!
Originally posted by @umasstr in #221 (comment)