Error with function load_rhb_at_positions_no_NAs #93

jamonterotena · 2024-01-22T10:34:42Z

I got the following error when executing STITCH with reference files:

Error in which((colClasses == "integer") & (h_row_1 != "-")) :
  dims [product 200] do not match the length of object [832]
Calls: STITCH ... extract_validate_and_load_haplotypes -> load_rhb_at_positions_no_NAs -> which
In addition: Warning message:
In (colClasses == "integer") & (h_row_1 != "-") :
  longer object length is not a multiple of shorter object length
Execution halted

It comes from the following function of the script reference-binary.R:

    ## first initialization
    ##
    ## want non-NULL values
    ## determine columns to get
    h_row_1 <- read.table(reference_haplotype_file, nrow = 1)
    if (!is.na(colClasses[1])) {
        haps_to_get <- which((colClasses == "integer") & (h_row_1 != "-")) - 1
    } else {
        haps_to_get <- which(h_row_1 != "-") - 1
    }

The first row of the reference hap file looks like this:

0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 1* 1* 0* 0* 1* 1* 0* 0* 1* 1* 0* 0* 0* 0* 1* 1* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 1* 1* 0* 0* 1* 1* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 1* 1* 0* 0* 1* 1* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 1* 0* 0* 0* 0* 1* 1* 0* 0* 1* 1* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 1* 1* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0* 0*

Notice there are 200 haplotypes for 100 founders. I suppose STITCH is taking only the 16x2 haplotypes from each corresponding specified 16way population.

What could be causing the issue? I find strange that the colClasses vector has 832 elements.

Jose

The text was updated successfully, but these errors were encountered:

jamonterotena · 2024-01-26T17:53:50Z

Hi again,

The problem is that the founders can belong to different 16way populations, so they were duplicated in the ID column of the .SAMPLES file, that contained 416 rows for the founders instead of 100. I wanted to ask you if it's possible to indicate that a founder ID can belong to two or more populations. Otherwise I will just split the .SAMPLES by families.

Best,
Jose

…ile #93

rwdavies · 2024-01-26T20:08:49Z

Thanks for the message, and identifying the problem, I've added a check to the code that should throw an error for that in the future

The only use of the reference sample file (alongside the reference haplotypes and legend file) is to exclude certain populations. Let's say you had prepared a file using all 1000 Genomes samples (from multiple populations), and in one run you wanted to do only Europeans, and another only East Asians, or something like that

Note STITCH can use reference haplotypes in one of three ways
Way 1 - if K matches the number of provided reference haplotypes, it will use them directly
Way 2 - if K is > number of reference haplotypes, it will fill the other haplotypes (ancestral ones used in STITCH) with noise
Way 3 - if K < number of reference haplotypes, it will initialize with some of the reference haplotypes (K of them), and then run a haploid version of EM to refine that estimate, using all samples

rwdavies · 2024-01-26T20:09:28Z

PS there are no actual "*" in the file right? The haps file? It should be only 0 and 1 (and spaces and newlines / enters)

jamonterotena · 2024-01-29T13:08:57Z

bcftools convert -h -Oz -o reference_${chr} -S samples fbjcvars_${chr}.vcf.gz #Convert vcf with all founders
Produced those haplotype files. I read it means that the allele is unphased. I don't know how to turn that off, so I will just remove the asterisks from the hap files with a script.

rwdavies added a commit that referenced this issue Jan 26, 2024

Add test to give error when ref sample file size does not match hap f…

6e87668

…ile #93

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error with function load_rhb_at_positions_no_NAs #93

Error with function load_rhb_at_positions_no_NAs #93

jamonterotena commented Jan 22, 2024 •

edited

Loading

jamonterotena commented Jan 26, 2024

rwdavies commented Jan 26, 2024

rwdavies commented Jan 26, 2024

jamonterotena commented Jan 29, 2024

Error with function load_rhb_at_positions_no_NAs #93

Error with function load_rhb_at_positions_no_NAs #93

Comments

jamonterotena commented Jan 22, 2024 • edited Loading

jamonterotena commented Jan 26, 2024

rwdavies commented Jan 26, 2024

rwdavies commented Jan 26, 2024

jamonterotena commented Jan 29, 2024

jamonterotena commented Jan 22, 2024 •

edited

Loading