nb-cores tuning different total row numbers #17

sarapita · 2022-05-23T07:36:48Z

Dear John,

I’m using unitig-counter/1.1.0 and feeding its output into pyseer for LMM-based association analyses. However, I’ve noticed that I get different numbers of significant unitigs depending on the value of -nb-cores used when running unitig-counter. I then noticed that the number of lines in the unitigs output file also differs (e.g. between -nb-cores 1 and -nb-cores 4). Can you reproduce this on your end? If so, is it caused by the parallelization? Is it safest to stick to -nb-cores 1?

Thanks in advance,
Sara

johnlees · 2022-05-23T08:58:54Z

It might be more informative to check exactly which sequences differ between the two rather than just the counts.

I'm afraid I'm not sure, but sticking to one core might be safest in that case.
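A minimal Python sketch of that sequence-level comparison. It assumes the unitig sequence is the first whitespace-separated field on each line of unitigs.txt.gz (the same field the awk '{print $1}' command later in this thread extracts). The function names `read_unitigs` and `diff_runs` are hypothetical helpers, not part of unitig-counter or pyseer.

```python
import gzip
from typing import Dict, Set


def read_unitigs(path: str) -> Set[str]:
    """Return the set of unitig sequences in a unitig-counter output file.

    Assumption: the sequence is the first whitespace-separated field on each
    line. Gzipped input is detected by the .gz suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    seqs: Set[str] = set()
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                seqs.add(fields[0])
    return seqs


def diff_runs(run_a: Set[str], run_b: Set[str]) -> Dict[str, Set[str]]:
    """Split two runs' unitigs into shared sequences and those unique to each run."""
    return {
        "shared": run_a & run_b,
        "only_a": run_a - run_b,
        "only_b": run_b - run_a,
    }
```

For example, `diff_runs(read_unitigs("count_unitigs_threads_1/unitigs.txt.gz"), read_unitigs("count_unitigs_threads_2/unitigs.txt.gz"))` returns the actual sequences in each group, so they can be inspected directly rather than only counted.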

sarapita · 2022-05-25T14:17:57Z

Thanks for your prompt reply!

I looked into it a bit more and realised that what I posted before isn’t quite right; sorry about that. Here is some more context.

I ran unitig-counter 4 times (I see the same issue with unitig-caller) with -nb-cores 1, 2, 3 and 4, respectively.
Contrary to what I said in my first post, I get the same number of rows in the unitigs output file and the same number of unique unitigs (see below),
but the unique unitigs themselves differ across the 4 runs.

```shell
# total unique unitigs
for threads in {1..4}; do zcat count_unitigs_threads_${threads}/unitigs.txt.gz | awk '{print $1}' | sort | uniq | wc -l; done
353548
353548
353548
353548
```

For example, between the run with 1 thread and the run with 2 threads, 311607 unitigs were common to both runs,
while 41941 unitigs reported in the 1-thread run were not found in the 2-thread run, and vice versa.

```r
anti_join(thread1, thread2) %>% dim
Joining, by = "unitigs"
[1] 41941 1

inner_join(thread1, thread2) %>% dim
Joining, by = "unitigs"
[1] 311607 1
```

I haven't yet assessed how "different" these unitigs are, as you suggested in your previous post.
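One way to measure how "different" the non-shared unitigs are is to check whether each unitig unique to one run appears in the other run as its reverse complement. A minimal Python sketch, assuming the inputs are plain sets of sequences (e.g. the first column of each unitigs.txt.gz) over the alphabet ACGTN; `revcomp` and `classify_unique` are hypothetical helper names.

```python
from typing import Iterable, Set, Tuple

# Maps each base to its complement (upper and lower case); N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_unique(only_a: Iterable[str], run_b: Set[str]) -> Tuple[int, int]:
    """For unitigs reported only in run A, count how many occur in run B as
    their reverse complement ("flipped") and how many are absent from run B
    on both strands ("absent")."""
    flipped = absent = 0
    for seq in only_a:
        if revcomp(seq) in run_b:
            flipped += 1
        else:
            absent += 1
    return flipped, absent
```

If nearly all of the 41941 non-shared unitigs come back as "flipped", the two runs would differ only in the strand reported, not in content.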

When I input the unitigs from these different runs into pyseer for a population-structure-adjusted LMM (using a similarity matrix),
the Bonferroni significance thresholds were identical, but I got 4 different similarity matrices
and different numbers of significant hits, especially when going from the 1-thread run to the 2-thread run (see below).

```shell
wc -l count_unitigs_threads_*/significant_unitigs.txt
860 count_unitigs_threads_1/significant_unitigs.txt
53 count_unitigs_threads_2/significant_unitigs.txt
98 count_unitigs_threads_3/significant_unitigs.txt
103 count_unitigs_threads_4/significant_unitigs.txt
1114 total
```

This is a bit concerning given that, for example, only 6 of the 52 significant unitigs
found in the 2-thread run could be found (by grep) among the significant unitigs from
the 1-thread run.
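A plain grep only matches the exact string, so a hit reported on the opposite strand in another run would be missed. A minimal Python sketch of a strand-insensitive comparison of two hit lists. It assumes each significant_unitigs.txt is tab-separated with a header on the first line and the unitig sequence in the first column; `canonical` and `significant_keys` are hypothetical helpers.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def canonical(seq: str) -> str:
    """Strand-independent key: the lexicographically smaller of a sequence
    and its reverse complement."""
    rc = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, rc)


def significant_keys(lines: Iterable[str]) -> Set[str]:
    """Canonical keys for the hits in a results table.

    Assumes tab-separated lines, a header on the first line, and the unitig
    sequence in the first column.
    """
    it = iter(lines)
    next(it, None)  # skip the header line
    keys: Set[str] = set()
    for line in it:
        first = line.rstrip("\n").split("\t")[0]
        if first:
            keys.add(canonical(first))
    return keys
```

Usage: open both files and take `significant_keys(f1) & significant_keys(f2)` to count the hits shared between two runs on either strand.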

If you have any insight into why this might be happening, I'd appreciate it.
Initially, I thought this was a threading issue where not everything was being merged back
into the final table, but now I wonder whether I'm missing something about the method
itself.

I originally ran unitig counting on 2 cores, so in the meantime I'll check whether the annotated hits are similar when I switch to 1 core.

ktmeaton · 2024-12-17T17:42:13Z

I also had this issue: different -nb-cores values gave the same number of unitigs, but different sequences. I found that the unitigs were actually the same across runs; in some runs the reverse complement was reported instead (even when the unitig was found in only one strain).

This is very anecdotal, but in my testing an even number of cores (-nb-cores 2, -nb-cores 4) tended to output reverse complements, while an odd number of cores (-nb-cores 1, -nb-cores 3) kept the original strand. I'm not sure how reproducible that is at scale, though.
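If the runs differ only in orientation, rewriting every unitig in a canonical orientation (the lexicographically smaller of the sequence and its reverse complement, the usual canonical k-mer convention) should make the outputs directly comparable. A minimal Python sketch, assuming each line starts with the unitig sequence followed by whitespace and the sample list; everything after the first field is copied unchanged. The helper names are hypothetical, and whether orientation alone explains the differing pyseer hits is not established in this thread.

```python
import gzip
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
# First non-whitespace field (the unitig) and everything after it.
_FIRST_FIELD = re.compile(r"(\S+)(.*)", re.DOTALL)


def canonical(seq: str) -> str:
    """Lexicographically smaller of a sequence and its reverse complement."""
    return min(seq, seq.translate(_COMPLEMENT)[::-1])


def canonicalise_line(line: str) -> str:
    """Rewrite the leading unitig in canonical orientation, keeping the rest
    of the line (e.g. the sample list) unchanged."""
    m = _FIRST_FIELD.match(line)
    if m is None:  # blank line or leading whitespace: leave as-is
        return line
    return canonical(m.group(1)) + m.group(2)


def canonicalise_file(src: str, dst: str) -> None:
    """Write a gzipped copy of a unitigs file with every unitig canonicalised."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            fout.write(canonicalise_line(line))
```

Flipping a unitig's orientation does not change which samples contain it, so after canonicalising, two runs that differ only by strand should produce identical sets of sequences.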

I haven't noticed a difference in pyseer results yet, but will report back if I do!
