Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error reading in file #302

Open
Ge0rges opened this issue Oct 8, 2023 · 3 comments
Open

Error reading in file #302

Ge0rges opened this issue Oct 8, 2023 · 3 comments

Comments

@Ge0rges
Copy link

Ge0rges commented Oct 8, 2023

Hello,

I have written an R script that uses methylKit to ultimately get DMRs. In this script, I read in multiple .bed files into a single methRead() call. The script completes successfully on at least 3 different sets of data I have input.

I am encountering an odd issue however with one dataset where the following error occurs:

Received list of locations.
Reading file.
Reading file.
Erreur in `.rowNamesDF<-`(x, value = value) :
  length of 'row.names' incorrect
Calls : methRead ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Execution stopped

As you can see, this happens when reading in file 2. However, these files are all generated in an identical fashion by a master script. Here is what their head looks like:

==> methylation/barcode01/1821_sub.bed <==                                                                                                        
contig_6061     3       4       m       1       +       3       4       255,0,0 1       0.00    0       1       0       0       0       0       2 
contig_6061     13      14      m       2       +       13      14      255,0,0 2       0.00    0       2       0       0       0       1       0 
contig_6061     44      45      m       3       +       44      45      255,0,0 3       0.00    0       3       0       0       0       0       0 
contig_6061     45      46      m       1       -       45      46      255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     52      53      m       1       -       52      53      255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     73      74      m       1       -       73      74      255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     139     140     m       1       +       139     140     255,0,0 1       0.00    0       1       0       0       0       0       3 
contig_6061     164     165     m       1       +       164     165     255,0,0 1       0.00    0       1       0       0       0       0       3 
contig_6061     189     190     m       1       +       189     190     255,0,0 1       0.00    0       1       0       0       0       0       4 
contig_6061     190     191     m       1       -       190     191     255,0,0 1       0.00    0       1       0       0       0       0       0 
                                                                                                                                                  
==> methylation/barcode02/1821_sub.bed <==                                                                                                        
contig_6061     3066    3067    m       1       -       3066    3067    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3149    3150    m       1       -       3149    3150    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3297    3298    m       1       -       3297    3298    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3327    3328    m       1       -       3327    3328    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3350    3351    m       1       -       3350    3351    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3383    3384    m       1       -       3383    3384    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3416    3417    m       1       -       3416    3417    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3455    3456    m       1       -       3455    3456    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3495    3496    m       1       -       3495    3496    255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     3517    3518    m       1       -       3517    3518    255,0,0 1       0.00    0       1       0       0       0       0       0 
                                                                                                                                                  
==> methylation/barcode03/1821_sub.bed <==                                                                                                        
contig_6061     3       4       m       1       +       3       4       255,0,0 1       0.00    0       1       0       0       0       0       0 
contig_6061     4       5       m       1       -       4       5       255,0,0 1       0.00    0       1       0       0       0       2       0 
contig_6061     14      15      m       1       -       14      15      255,0,0 1       0.00    0       1       0       0       0       0       2 
contig_6061     33      34      m       1       +       33      34      255,0,0 1       0.00    0       1       0       0       0       1       0 
contig_6061     44      45      m       2       +       44      45      255,0,0 2       0.00    0       2       0       0       0       0       0 
contig_6061     45      46      m       3       -       45      46      255,0,0 3       0.00    0       3       0       0       0       0       0 
contig_6061     79      80      m       1       -       79      80      255,0,0 1       0.00    0       1       0       0       0       2       0 
contig_6061     86      87      m       1       -       86      87      255,0,0 1       0.00    0       1       0       0       0       2       0 
contig_6061     119     120     m       1       -       119     120     255,0,0 1       0.00    0       1       0       1       0       1       0 
contig_6061     127     128     m       1       -       127     128     255,0,0 1       0.00    0       1       0       0       0       0       2 

Relevant script:

pipeline = list(fraction = FALSE, chr.col = 1, start.col = 2, end.col = 3, coverage.col = 10, strand.col = 6, freqC.col = 11)                           
                                                                                                                                                        
# Get list of bedMethyl files to analyze                                                                                                                
files.list = list()                                                                                                                                     
samples = list()                                                                                                                                        
treatment = vector()                                                                                                                                    
t = 0                                                                                                                                                   
                                                                                                                                                        
# For each barcode                                                                                                                                      
for (i in 1:7) {                                                                                                                                        
    # Skip the control barcode                                                                                                                          
    if (i == 4) next                                                                                                                                    
                                                                                                                                                        
    # Add that barcode to the sample list                                                                                                               
    sample = paste0("barcode0", i)                                                                                                                      
    samples <- c(samples, sample)                                                                                                                       
                                                                                                                                                        
    # Get file path to the bedmethyl file                                                                                                               
    files.list <- c(files.list, paste0("/researchdrive/gkanaan/seaice_methylation/methylation/", sample, "/", assembly, ".bed"))                        
                                                                                                                                                        
    # Add the correct treatment (top, middle, bottom)                                                                                                   
    treatment = c(treatment, t%%3)                                                                                                                      
    t = t + 1                                                                                                                                           
}                                                                                                                                                       
                                                                                                                                                        
meth_obj = methRead(files.list, sample.id = samples, assembly=assembly, treatment=treatment, context="CHH", pipeline=pipeline, header=FALSE, mincov=5)

I appreciate any tips.

@Ge0rges Ge0rges changed the title Error reading in multiple files together Error reading in file Oct 8, 2023
@Ge0rges
Copy link
Author

Ge0rges commented Oct 8, 2023

Ok it seems that this is due to the fact that 0 rows are left after coverage filtering. MethylKit unfortunately does not handle this very gracefully. @alexg9010 is there a way to skip files with 0 rows after coverage filtering instead of crashing?

Browse[5]> n                                                                          
debug: data[, coverage.col] = round(data[, coverage.col])                             
Browse[5]> n                                                                          
debug: data = data[data[, coverage.col] >= mincov, ]                                  
Browse[5]> n                                                                          
debug: strand = rep("+", nrow(data))                                                  
Browse[5]> print(data)                                                                
 [1] V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18          
<0 lignes> (ou 'row.names' de longueur nulle)                                         

@alexg9010
Copy link
Collaborator

Hi @Ge0rges ,

You are right, this edge case is not handled very well. For now, you could use tryCatch to filter files with zero rows. using this code snippet:

meth_list <- lapply(seq_along(file.list),
                    function(i) {
                      tryCatch(
                        expr = {
                          methRead(
                            location = file.list[[i]],
                            sample.id = as.list(samples)[[i]],
                            assembly=assembly, 
                            # treatment=treatment, 
                            context="CHH", 
                            pipeline=pipeline, 
                            header=FALSE, 
                            mincov=5)
                          )
                        },
                        error = function(e) {
                          message(paste0("file ", file.list[[i]], " failed to load."))
                          return(NULL)
                        }
                      )
                      
                    })

failed_samples <- sapply(meth_list, is.null)
meth_obj <- methylRawList(meth_list[!failed_samples], treatment = treatment[!failed_samples])

Best
Alex

@Ge0rges
Copy link
Author

Ge0rges commented Oct 9, 2023

Thanks for that solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants