Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AtacWorks errors on custom data #221

Closed
asundaresan1 opened this issue Oct 15, 2020 · 15 comments
Closed

AtacWorks errors on custom data #221

asundaresan1 opened this issue Oct 15, 2020 · 15 comments
Assignees

Comments

@asundaresan1
Copy link

Hello, I am trying to analyze my bulk ATAC samples but get the attached errors. These are ATAC from human donor tissues, so I am trying to train the model which is where I get the error. I am using Atacworks version 0.3.0. I tested with the test data (available as part of tutorial) and it worked fine without any errors.

The ATAC-seq reads are aligned to the human reference genome (hg19) using BWA. For unique alignments, duplicate reads were filtered out. The resulting uniquely mapped reads were normalized to the same read depth across all samples and converted into bigWig files using BEDTools. (genomecov -bga)

AtacWorks_train.err.txt
ATACWorks_train.out.txt

@ntadimeti
Copy link
Collaborator

@asundaresan1 can you share your atacworks command please. I'm taking a look at your logs, will report back soon.

@asundaresan1
Copy link
Author

atacworks train
--noisybw /path_to_noisybw
--cleanbw /path_to_cleanbw
--cleanpeakfile /path_to_cleanpeakfile
--genome /path_to_hg19.auto.sizes
--val_chrom chr2
--holdout_chrom chr10
--out_home "path_to_output_train"
--exp_name "atacworks_train"
--distributed

@ntadimeti
Copy link
Collaborator

ntadimeti commented Oct 21, 2020

It seems like this error is occuring because of y_true containing only one class going by this message in the log Only one class present in y_true. It might be possible that your data is unbalanced and all the peaks are being assigned to a single class. Can you verify this on your end ?

@asundaresan1
Copy link
Author

asundaresan1 commented Oct 21, 2020

How should I verify this? I also tested this on a mouse cell line ATAC and get same error.

@ntadimeti
Copy link
Collaborator

The atacworks output directory should contain a <prefix>_train.h5 and a <prefix>_val.h5 files. You can try loading them and look at the number of 0s and 1s in the last channel of the numpy arrays in the h5 file.

If the files are small enough, please upload them so I can try it as well.

Could you also share how you generated the peak file you passed in with --cleanpeakfile flag ?

@asundaresan1
Copy link
Author

The files are big to be uploaded here.
macs2 callpeak with options -f BAMPE and --keep-dup all.

@avantikalal
Copy link
Contributor

Hi @asundaresan1, wondering if you were able to resolve this issue?
If not, you can go to the bw2h5 folder produced by AtacWorks and look at the train.h5 and val.h5 files in the following way:

import h5py
import numpy as np
f=h5py.File('val.h5')  <--- (use the appropriate file name here)
np.unique(f['label_cla'], return_counts=True)

Please let us know the output of this command for your train.h5 and val.h5 files.

@asundaresan1
Copy link
Author

Hi @avantikalal and @ntadimeti, I was able to train the model only after I preprocessed my samples using https://github.com/zchiang/atacworks_analysis/tree/master/preprocessing.

However, after denoising I don’t see much improvement in the quality of my data. I have attached the profiles of all the samples before and after denoising. I can see that the scale has improved, and the profiles are smoother, but I was expecting Sample_3 to profile like Sample_4 for example. Is there anything else that I can try?
TSS_profile_denoised.pdf
TSS_profile_prior_to_denoising.pdf

These are denoised using https://ngc.nvidia.com/catalog/models/nvidia:atac_bulk_lowcov_1m_50m

@avantikalal
Copy link
Contributor

Hi @asundaresan1, the model you've used is trained to make low-coverage data look like higher coverage data, so it will enhance the profile at a local scale, but it does not improve aggregate TSS score across the whole genome.
We do have one model which is trained to improve TSS score: https://ngc.nvidia.com/catalog/models/nvidia:atac_bulk_lowqual_20m_20m . You could try this and see if it meets your purpose.
What is the aim of your experiment, and what is the sequencing depth of your data?

@asundaresan1
Copy link
Author

Thank you for your suggestion. Let me try with that model.
Is there a way to get denoised bam along with bigwigs and peaks?
The sequencing depth for all the samples is between 42M - 143M
We have performed human donor tissue ATAC on two different conditions and want to compare them. The profiles I sent are from one condition. The profiles of other condition look even worse. There is a lot of noise and peak calls are not good. If we can get the denoised data it should be like comparative analysis across these conditions (compare open chromatin regions,TF footprinting, etc)

@avantikalal
Copy link
Contributor

Thanks for sharing the details @asundaresan1. Based on this, I think the model I suggested above is the best suited for your data. If that too doesn't work, we can discuss whether it may be possible for you to train your own model.
Unfortunately AtacWorks cannot produce a BAM file - only bigwig/bedGraph and peak calls.

@avantikalal
Copy link
Contributor

Hi @asundaresan1 , just checking in on this issue. Were you able to get better results using the second model?

@asundaresan1
Copy link
Author

Hi @avantikalal , yes I got better results with the model you suggested.

@umasstr
Copy link

umasstr commented Apr 2, 2021

Hi @avantikalal, in cases where noisy samples have coverage >= that of the clean samples, should users always forego training and use your pretrained model, nvidia:atac_bulk_lowqual_20m_20m?

For example:
image

If the pretrained model is not successful in reducing noise, are there training parameters that should be considered when constructing a custom model.

Thanks for the help—atacworks looks like a gamechanger!

@avantikalal
Copy link
Contributor

Hi @umasstr, I've opened a new issue for this discussion: #236

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants