
update harmony to new implementation #308

Open — Intron7 wants to merge 16 commits into main
Conversation

@Intron7 (Member) commented Dec 11, 2024

Fixes #299

@Intron7 requested a review from ilan-gold, then removed the request, December 11, 2024 19:23
@Intron7 marked this pull request as a draft December 11, 2024 19:23
@Intron7 requested a review from flying-sheep December 16, 2024 09:55
@Intron7 marked this pull request as ready for review December 16, 2024 10:35
@Intron7 repeatedly added the run-gpu-ci (runs GPU CI) label throughout Dec 16, 2024; the github-actions bot removed it each time
@flying-sheep (Member) commented:
I’ll check this out tomorrow, it’s too big to start now!

@Intron7 added the run-gpu-ci label Dec 16, 2024; the github-actions bot removed it
@flying-sheep (Member) left a review comment:

The harmonize function has a really nice layout. It’s easy to follow what it does, nice!

However, it seems like you’re establishing a bunch of parameters that get reused and never changed after initialization. You could, e.g.:

  • make them a frozen dataclass with methods, so you can access the parameters via self.<name>, or
  • create a NamedTuple containing the parameters that you can pass around (a sketch of both options follows below)
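A minimal sketch of both options (the class and field names are hypothetical, picked for illustration):

    from dataclasses import dataclass
    from typing import NamedTuple


    @dataclass(frozen=True)
    class HarmonyParams:
        """Option 1: frozen dataclass; methods read the fixed parameters via self.<name>."""

        n_clusters: int
        theta: float
        sigma: float
        block_proportion: float

        def block_size(self, n_cells: int) -> int:
            # Illustrative method using the frozen parameters
            return int(n_cells * self.block_proportion)


    class HarmonyParamsTuple(NamedTuple):
        """Option 2: a NamedTuple holding the parameters, passed around explicitly."""

        n_clusters: int
        theta: float
        sigma: float
        block_proportion: float

Either way, the parameters are bundled once at initialization and cannot be reassigned accidentally afterwards.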

pyproject.toml (outdated) — comment on lines 87 to 90:
"src/rapids_singlecell/preprocessing/_harmonypy_gpu.py" = ["PLR0917"]
"src/rapids_singlecell/decoupler_gpu/_method_mlm.py" = ["PLR0917"]
"src/rapids_singlecell/decoupler_gpu/_method_wsum.py" = ["PLR0917"]
"src/rapids_singlecell/preprocessing/_harmony/__init__.py" = ["PLR0917"]
@flying-sheep (Member):
You should ignore these inline (# noqa: PLR0917) instead of per file.

Also, only if absolutely necessary; I think it’s one of the best rules there is. I understand that numba doesn’t respect *, but that’s why it should be done inline.
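For illustration, an inline suppression might look like this (the function itself is hypothetical):

    def _too_many_positional(a, b, c, d, e, f):  # noqa: PLR0917
        # The suppression sits only on the offending definition line,
        # so the rule still applies everywhere else in the file.
        return a + b + c + d + e + f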

@flying-sheep (Member) left a comment:

  • don’t name variables LIKE_CONSTANTS
  • don’t give variables single-letter names (except for a, b for binary operators, i for enumerate, and similar conventions)

Comment on lines 21 to 24
X (cp.ndarray): Input 2D array.

Returns:
cp.ndarray: Row-normalized 2D array.
@flying-sheep (Member):

No need to duplicate the types here. (also applies to other places where you might have done that)

        X: Input 2D array.

    Returns:
        Row-normalized 2D array.

int tid = threadIdx.x; // Thread index within the block

// Ensure we're within matrix bounds
if (row >= rows) return;
@flying-sheep (Member):

No error? So is that a convention where the kernel is expected to run with invalid arguments?

@Intron7 (Member, Author):

Yes, kind of. That comes from thread blocks that overlap the end of the data, e.g. a block of 32 threads but only 29 cells.
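To make the pattern concrete, here is a minimal CuPy sketch (the kernel and sizes are illustrative, not the PR’s actual kernel): the grid is rounded up to whole blocks, so the last block contains threads past the end of the data, and the early return keeps them from writing out of bounds.

    import cupy as cp
    import numpy as np

    _scale_rows = cp.RawKernel(
        r"""
        extern "C" __global__
        void scale_rows(float* X, int rows, int cols, float factor) {
            int row = blockIdx.x * blockDim.x + threadIdx.x;
            if (row >= rows) return;  // e.g. a block of 32 threads but only 29 rows
            for (int j = 0; j < cols; j++) {
                X[row * cols + j] *= factor;
            }
        }
        """,
        "scale_rows",
    )

    X = cp.random.rand(29, 4, dtype=cp.float32)
    threads = 32
    blocks = (X.shape[0] + threads - 1) // threads  # rounds 29 rows up to one block of 32
    _scale_rows((blocks,), (threads,), (X, np.int32(X.shape[0]), np.int32(X.shape[1]), np.float32(2.0)))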

return X


def _normalize_cp(X: cp.ndarray, p: int = 2) -> cp.ndarray:
@flying-sheep (Member):

why name it “p”?

@Intron7 (Member, Author):

That’s the name of the parameter in torch.
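For context, torch.nn.functional.normalize calls the norm order p. A minimal CuPy equivalent of such a helper might look like this (the epsilon guard is an assumption, mirroring torch’s default eps):

    import cupy as cp

    def _normalize_cp(X: cp.ndarray, p: int = 2) -> cp.ndarray:
        """Row-normalize X by its L_p norm, keeping torch's `p` naming."""
        norms = cp.linalg.norm(X, ord=p, axis=1, keepdims=True)
        return X / cp.maximum(norms, 1e-12)  # avoid division by zero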

Comment on lines 194 to 208
    _clustering(
        Z_norm,
        Pr_b,
        Phi,
        R,
        E,
        O,
        n_clusters,
        theta,
        tol_clustering,
        objectives_harmony,
        max_iter_clustering,
        sigma,
        block_proportion,
    )
@flying-sheep (Member):

OK, this is the reason why PLR0917 exists. Don’t suppress it, specify everything by name instead.
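The same call with everything specified by name (assuming the parameter names match the local variables being passed):

    _clustering(
        Z_norm=Z_norm,
        Pr_b=Pr_b,
        Phi=Phi,
        R=R,
        E=E,
        O=O,
        n_clusters=n_clusters,
        theta=theta,
        tol_clustering=tol_clustering,
        objectives_harmony=objectives_harmony,
        max_iter_clustering=max_iter_clustering,
        sigma=sigma,
        block_proportion=block_proportion,
    )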

    Z_hat = _correction(Z, R, Phi, O, ridge_lambda, correction_method)
    Z_norm = _normalize_cp(Z_hat, p=2)
    if verbose:
        print(f"\tCompleted {i + 1} / {max_iter_harmony} iteration(s).")
@flying-sheep (Member):

don’t you have some logging infra you should use instead?

I don’t want to see a single print statement in any library I use (if it has a CLI, that one may have print statements)
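One way the print could be swapped out, using the standard-library logging module (the project may well have its own logging helper instead; the wrapper function here is hypothetical):

    import logging

    logger = logging.getLogger(__name__)

    def _report_progress(i: int, max_iter_harmony: int, *, verbose: bool) -> None:
        # Replaces the print statement: callers control visibility via `verbose`,
        # and consumers control output via the logging configuration.
        if verbose:
            logger.info("Completed %d / %d iteration(s).", i + 1, max_iter_harmony)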

src/rapids_singlecell/preprocessing/_harmony/__init__.py (outdated; comment resolved)
    if verbose:
        print(f"\tCompleted {i + 1} / {max_iter_harmony} iteration(s).")

    if _is_convergent_harmony(objectives_harmony, tol=tol_harmony):
@flying-sheep (Member):

So this one is an out parameter of _clustering? What else is being modified? You should make that clear.
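One way to make the in-place modifications explicit would be a docstring note like the following (exactly which arguments are mutated is an assumption here, for illustration):

    def _clustering(R, O, objectives_harmony, tol_clustering, max_iter_clustering):
        """Run the clustering phase of harmony.

        Out parameters (mutated in place): ``R``, ``O`` and ``objectives_harmony``.
        The remaining arguments are read-only.
        """
        objectives_harmony.append(0.0)  # illustrative in-place mutation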

Intron7 and others added 2 commits December 17, 2024 14:58 (co-authored by Philipp A. <flying-sheep@web.de>)
@Intron7 (Member, Author) commented Dec 17, 2024

Removed all prints and added a warning if harmony didn’t converge.
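A sketch of what such a non-convergence warning could look like (the helper name, condition, and message are assumptions):

    import warnings

    def _warn_if_not_converged(converged: bool, max_iter_harmony: int) -> None:
        # Emits a user-visible warning instead of printing
        if not converged:
            warnings.warn(
                f"Harmony did not converge after {max_iter_harmony} iterations; "
                "consider increasing `max_iter_harmony`.",
                UserWarning,
                stacklevel=2,
            )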

@Intron7 repeatedly added the run-gpu-ci (runs GPU CI) label on Dec 17, 2024; the github-actions bot removed it each time
Intron7 and others added 2 commits December 17, 2024 16:41
@Intron7 requested a review from flying-sheep December 17, 2024 15:55
@Intron7 enabled auto-merge (squash) December 17, 2024 15:55
Development

Successfully merging this pull request may close these issues.

Switch harmony to harmony-torch