Return smaller allele clustering labels (labels_previous) when the adjusted Rand index is sufficiently high to reduce predicted allele numbers.
akikuno committed Apr 23, 2024
1 parent d2309a1 commit 8872daa
Showing 1 changed file with 5 additions and 6 deletions.
11 changes: 5 additions & 6 deletions src/DAJIN2/core/clustering/clustering.py
@@ -39,17 +39,16 @@ def optimize_labels(X: spmatrix, coverage_sample: int, coverage_control: int) ->
         # print(i, Counter(labels_sample), Counter(labels_control), Counter(labels_current)) # ! DEBUG

         num_labels_control = count_number_of_clusters(labels_control, coverage_control)
-        mutual_info = metrics.adjusted_rand_score(labels_previous, labels_current)
+        rand_index = metrics.adjusted_rand_score(labels_previous, labels_current)

         """
         Return the number of clusters when:
-            - the number of clusters in control is split into more than one.
-            - the mutual information between the current and previous labels is high enough (= similar).
+        - the number of clusters in control is split into more than one.
+        - the mutual information between the current and previous labels is high enough (= similar).
+        To reduce the allele number, previous labels are returned.
         """
-        if num_labels_control >= 2:
+        if num_labels_control >= 2 or rand_index >= 0.95:
             return labels_previous
-        if 0.95 <= mutual_info <= 1.0:
-            return labels_current
         labels_previous = labels_current
     return labels_previous
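
The stopping rule in this hunk can be illustrated with a small standalone sketch. It assumes `metrics` in the diff refers to scikit-learn's `sklearn.metrics`, which the hunk does not show. To stay dependency-free, the sketch computes the adjusted Rand index by hand as a stand-in for `metrics.adjusted_rand_score`. The names `adjusted_rand_index` and `should_stop` are hypothetical and do not appear in DAJIN2.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pure-Python adjusted Rand index (illustrative stand-in, not DAJIN2 code)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs_total = comb(len(labels_a), 2)
    if pairs_total == 0:
        return 1.0  # zero or one sample: nothing to disagree about
    # Count sample pairs that share a cluster in both labelings, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / pairs_total  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2  # agreement of a perfect match
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form one cluster
    return (pairs_both - expected) / (maximum - expected)


def should_stop(num_labels_control, rand_index, threshold=0.95):
    """Mirror the new condition: stop and keep the previous (smaller) labels."""
    return num_labels_control >= 2 or rand_index >= threshold


# The index ignores label names: swapping 0 and 1 is still a perfect match.
previous = [0, 0, 1, 1]
current = [1, 1, 0, 0]
score = adjusted_rand_index(previous, current)
print(score, should_stop(num_labels_control=1, rand_index=score))
```

Because both conditions now return `labels_previous`, a near-identical relabeling stops the search at the earlier labeling rather than the newer one, which is how the commit message describes reducing the predicted allele number.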
