AtacWorks errors on custom data #221

asundaresan1 · 2020-10-15T19:53:49Z

Hello, I am trying to analyze my bulk ATAC samples but get the attached errors. These are ATAC samples from human donor tissues. I am trying to train the model, and that is where I get the error. I am using AtacWorks version 0.3.0. I tested with the test data (available as part of the tutorial) and it worked fine without any errors.

The ATAC-seq reads were aligned to the human reference genome (hg19) using BWA. For unique alignments, duplicate reads were filtered out. The resulting uniquely mapped reads were normalized to the same read depth across all samples and converted into bigWig files using BEDTools (genomecov -bga).

AtacWorks_train.err.txt
ATACWorks_train.out.txt

ntadimeti · 2020-10-21T14:27:30Z

@asundaresan1 can you share your atacworks command, please? I'm taking a look at your logs and will report back soon.

asundaresan1 · 2020-10-21T15:05:43Z

atacworks train \
    --noisybw /path_to_noisybw \
    --cleanbw /path_to_cleanbw \
    --cleanpeakfile /path_to_cleanpeakfile \
    --genome /path_to_hg19.auto.sizes \
    --val_chrom chr2 \
    --holdout_chrom chr10 \
    --out_home "path_to_output_train" \
    --exp_name "atacworks_train" \
    --distributed

ntadimeti · 2020-10-21T15:20:02Z

Going by the message "Only one class present in y_true" in the log, this error seems to occur because y_true contains only one class. It is possible that your data is unbalanced and every position is being assigned to a single class. Can you verify this on your end?
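To illustrate why a single-class y_true is fatal for evaluation, here is a minimal sketch of a rank-based AUROC. It assumes the failing metric is an AUROC-style score (the log wording resembles the error scikit-learn raises from `roc_auc_score`, but that is an assumption, not something confirmed in this thread). The function `auroc` is a hypothetical helper written for illustration and is not part of AtacWorks.

```python
def auroc(y_true, y_score):
    """Rank-based AUROC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # With no positives or no negatives there are no pairs to compare,
    # so the score is undefined.
    if not pos or not neg:
        raise ValueError("Only one class present in y_true.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

If the labels for the validation chromosome contain no peak positions at all, the `pos` list is empty and a metric of this kind cannot be computed, whatever the model predicts.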

asundaresan1 · 2020-10-21T15:29:45Z

How should I verify this? I also tested this on ATAC data from a mouse cell line and get the same error.

ntadimeti · 2020-10-21T15:42:30Z

The atacworks output directory should contain a <prefix>_train.h5 file and a <prefix>_val.h5 file. You can try loading them and looking at the number of 0s and 1s in the last channel of the numpy arrays in each h5 file.

If the files are small enough, please upload them so I can try it as well.

Could you also share how you generated the peak file you passed in with the --cleanpeakfile flag?

asundaresan1 · 2020-10-21T16:20:37Z

The files are too big to be uploaded here.
I generated the peak file with macs2 callpeak, using the options -f BAMPE and --keep-dup all.
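One thing that can be checked without uploading anything is whether the peak file actually has peaks on the chromosomes used for validation and holdout, and whether its chromosome names match the genome sizes file. The sketch below is only one possible sanity check under stated assumptions: that the peak file is tab- or space-delimited with the chromosome in the first column (as in BED/narrowPeak), and that a naming mismatch such as `2` versus `chr2` would leave those chromosomes without labels. The thread does not establish that this was the cause here. All function names are hypothetical.

```python
from collections import Counter


def peaks_per_chrom(peak_lines):
    """Count peaks per chromosome from BED/narrowPeak-style lines."""
    counts = Counter()
    for line in peak_lines:
        line = line.strip()
        # Skip blank lines and common header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        counts[line.split()[0]] += 1
    return counts


def read_chrom_names(sizes_lines):
    """Collect chromosome names from a two-column chromosome sizes file."""
    return {line.split()[0] for line in sizes_lines if line.strip()}


def check_peaks(peak_lines, sizes_lines, required=("chr2", "chr10")):
    """Report naming mismatches and required chromosomes with no peaks."""
    counts = peaks_per_chrom(peak_lines)
    known = read_chrom_names(sizes_lines)
    return {
        "unknown_chroms": sorted(set(counts) - known),
        "missing_required": [c for c in required if counts[c] == 0],
        "total_in_genome": sum(n for c, n in counts.items() if c in known),
    }

# Example use (the paths are placeholders):
# with open("/path_to_cleanpeakfile") as p, open("/path_to_hg19.auto.sizes") as s:
#     print(check_peaks(p, s))
```

The `required` default mirrors the `--val_chrom chr2` and `--holdout_chrom chr10` settings from the command above; a non-empty `missing_required` list would mean those chromosomes contribute no positive labels.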

avantikalal · 2020-11-02T18:49:51Z

Hi @asundaresan1, wondering if you were able to resolve this issue?
If not, you can go to the bw2h5 folder produced by AtacWorks and look at the train.h5 and val.h5 files in the following way:

import h5py
import numpy as np

# Use the appropriate file name here (e.g. train.h5 or val.h5).
f = h5py.File('val.h5', 'r')
# Read the classification labels into memory and count each class.
np.unique(f['label_cla'][:], return_counts=True)

Please let us know the output of this command for your train.h5 and val.h5 files.
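The same check can be wrapped so it runs on both files and reports plainly whether each one contains both classes. This is a minimal sketch, assuming the labels are stored under the `label_cla` key mentioned above and take the values 0 and 1; the helper names `class_counts`, `summarize_h5`, and `has_both_classes` are hypothetical and not part of AtacWorks.

```python
import numpy as np


def class_counts(labels):
    """Return a {label: count} dict for an array of class labels."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


def summarize_h5(path, key="label_cla"):
    """Count class labels stored under `key` in an HDF5 file."""
    import h5py  # imported here so class_counts can be used without h5py

    with h5py.File(path, "r") as f:
        return class_counts(f[key][:])


def has_both_classes(counts):
    """True only if both class 0 and class 1 are present."""
    return counts.get(0, 0) > 0 and counts.get(1, 0) > 0

# Example use (file names are placeholders for your own outputs):
# for path in ("train.h5", "val.h5"):
#     counts = summarize_h5(path)
#     print(path, counts, "both classes:", has_both_classes(counts))
```

A file for which `has_both_classes` returns False would be consistent with the "Only one class present in y_true" message.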

asundaresan1 · 2020-11-03T14:35:51Z

Hi @avantikalal and @ntadimeti, I was able to train the model only after I preprocessed my samples using https://github.com/zchiang/atacworks_analysis/tree/master/preprocessing.

However, after denoising I don’t see much improvement in the quality of my data. I have attached the profiles of all the samples before and after denoising. I can see that the scale has improved, and the profiles are smoother, but I was expecting Sample_3 to profile like Sample_4 for example. Is there anything else that I can try?
TSS_profile_denoised.pdf
TSS_profile_prior_to_denoising.pdf

These are denoised using https://ngc.nvidia.com/catalog/models/nvidia:atac_bulk_lowcov_1m_50m

avantikalal · 2020-11-03T17:01:20Z

Hi @asundaresan1, the model you've used is trained to make low-coverage data look like higher coverage data, so it will enhance the profile at a local scale, but it does not improve aggregate TSS score across the whole genome.
We do have one model which is trained to improve TSS score: https://ngc.nvidia.com/catalog/models/nvidia:atac_bulk_lowqual_20m_20m . You could try this and see if it meets your purpose.
What is the aim of your experiment, and what is the sequencing depth of your data?

asundaresan1 · 2020-11-03T17:56:34Z

Thank you for your suggestion. Let me try with that model.
Is there a way to get denoised bam along with bigwigs and peaks?
The sequencing depth for all the samples is between 42M - 143M
We have performed ATAC on human donor tissue under two different conditions and want to compare them. The profiles I sent are from one condition. The profiles of the other condition look even worse: there is a lot of noise and the peak calls are not good. If we can get the denoised data, the goal is a comparative analysis across these conditions (comparing open chromatin regions, TF footprinting, etc.).

avantikalal · 2020-11-03T21:11:14Z

Thanks for sharing the details @asundaresan1. Based on this, I think the model I suggested above is the best suited for your data. If that too doesn't work, we can discuss whether it may be possible for you to train your own model.
Unfortunately AtacWorks cannot produce a BAM file - only bigwig/bedGraph and peak calls.

avantikalal · 2020-11-17T17:43:03Z

Hi @asundaresan1 , just checking in on this issue. Were you able to get better results using the second model?

asundaresan1 · 2020-11-18T18:56:47Z

Hi @avantikalal , yes I got better results with the model you suggested.

umasstr · 2021-04-02T22:01:18Z

Hi @avantikalal, in cases where noisy samples have coverage >= that of the clean samples, should users always forego training and use your pretrained model, nvidia:atac_bulk_lowqual_20m_20m?

For example:

If the pretrained model is not successful in reducing noise, are there training parameters that should be considered when constructing a custom model?

Thanks for the help—atacworks looks like a gamechanger!

avantikalal · 2021-04-05T16:01:35Z

Hi @umasstr, I've opened a new issue for this discussion: #236

ntadimeti assigned avantikalal and ntadimeti Oct 21, 2020

asundaresan1 closed this as completed Nov 18, 2020

avantikalal mentioned this issue Apr 5, 2021

Pretrained model for bulk low-quality data #236

Open
