update harmony to new implementation #308
base: main
I’ll check this out tomorrow, it’s too big to start now!
The `harmonize` function has a really nice layout. It’s easy to follow what it does, nice!
However, it seems like you’re establishing a bunch of parameters that get reused and never changed after initialization. You could e.g.
- make it a frozen dataclass with methods so you can access the parameters using `self.<name>`, or
- create a `NamedTuple` containing the parameters that you can pass around
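A minimal sketch of the frozen-dataclass option. The class name `HarmonyParams`, the `describe` method, and all values are hypothetical; the field names only mirror arguments that appear in the `_clustering` call in this PR, and their real types may differ.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class HarmonyParams:
    """Hypothetical container for parameters that are set once and then only read."""

    n_clusters: int
    theta: float
    sigma: float
    tol_clustering: float
    max_iter_clustering: int
    block_proportion: float

    def describe(self) -> str:
        # Methods read the parameters through self.<name>.
        return (
            f"{self.n_clusters} clusters, "
            f"at most {self.max_iter_clustering} clustering iterations"
        )


# Illustrative values only, not defaults taken from the PR.
params = HarmonyParams(
    n_clusters=5,
    theta=2.0,
    sigma=0.1,
    tol_clustering=1e-5,
    max_iter_clustering=20,
    block_proportion=0.05,
)
```

Because the dataclass is frozen, an accidental `params.theta = 3.0` raises `FrozenInstanceError` instead of silently changing a value that later steps rely on.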
pyproject.toml
"src/rapids_singlecell/preprocessing/_harmonypy_gpu.py" = ["PLR0917"]
"src/rapids_singlecell/decoupler_gpu/_method_mlm.py" = ["PLR0917"]
"src/rapids_singlecell/decoupler_gpu/_method_wsum.py" = ["PLR0917"]
"src/rapids_singlecell/preprocessing/_harmony/__init__.py" = ["PLR0917"]
You should ignore these inline (`# noqa: PLR0917`) instead of per file.
Also, only do so if absolutely necessary, I think it’s one of the best rules there is. I understand that numba doesn’t respect `*`, but that’s why it should be done inline.
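For illustration, an inline suppression could look like the sketch below. The function `scale_and_shift` is hypothetical and not part of this PR; it only has enough positional parameters to trigger the rule under the linter's default limit.

```python
def scale_and_shift(values, scale, shift, lower, upper, digits):  # noqa: PLR0917
    """Hypothetical helper with six positional parameters.

    The noqa comment silences PLR0917 for this one definition only,
    so every other function in the file is still checked.
    """
    out = []
    for value in values:
        clipped = min(max(value * scale + shift, lower), upper)
        out.append(round(clipped, digits))
    return out


result = scale_and_shift([1.0, 2.0, 3.0], 2.0, 0.5, 0.0, 5.0, 1)
```

Compared with a per-file entry in `pyproject.toml`, the exemption stays visible next to the one definition that needs it.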
- don’t name variables LIKE_CONSTANTS
- don’t name variables with single-letter names (except for `a, b` for binary operators, `i` for `enumerate` and similar conventions)
X (cp.ndarray): Input 2D array.

Returns:
cp.ndarray: Row-normalized 2D array.
No need to duplicate the types here. (also applies to other places where you might have done that)
X: Input 2D array.
Returns:
Row-normalized 2D array.
int tid = threadIdx.x; // Thread index within the block

// Ensure we're within matrix bounds
if (row >= rows) return;
no error? so is that a convolution that’s expected to run with invalid arguments?
Yes, kind of. That comes from blocks that overhang the data, e.g. a block of 32 but only 29 cells.
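The arithmetic behind that reply can be sketched in plain Python. This assumes one thread per row and a block size of 32; the kernel's actual index mapping is not shown in this excerpt, and the names `launch_shape` and `simulate_kernel` are hypothetical.

```python
def launch_shape(n_rows: int, block_size: int = 32) -> tuple[int, int]:
    """Return (number of blocks, number of threads with no row to process)."""
    n_blocks = (n_rows + block_size - 1) // block_size  # ceiling division
    idle_threads = n_blocks * block_size - n_rows
    return n_blocks, idle_threads


def simulate_kernel(n_rows: int, block_size: int = 32) -> list[int]:
    """Mimic the guard `if (row >= rows) return;` for every launched thread."""
    n_blocks, _ = launch_shape(n_rows, block_size)
    processed = []
    for block in range(n_blocks):
        for tid in range(block_size):
            row = block * block_size + tid  # assumed one-thread-per-row mapping
            if row >= n_rows:
                continue  # out-of-range thread exits early, no error needed
            processed.append(row)
    return processed


n_blocks, idle_threads = launch_shape(29)
rows_done = simulate_kernel(29)
```

With 29 rows, one block of 32 threads is launched, 3 threads hit the guard, and each of the 29 rows is processed exactly once.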
    return X


def _normalize_cp(X: cp.ndarray, p: int = 2) -> cp.ndarray:
why name it “p”?
That’s the name of the variable in torch.
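For context, `p` in `torch.nn.functional.normalize` is the exponent of the norm. A pure-Python sketch of row-wise p-norm normalization follows; it is a stand-in for the CuPy implementation, which is not shown in this excerpt, and the name `normalize_rows` and the `eps` guard are assumptions.

```python
def normalize_rows(rows, p=2, eps=1e-12):
    """Divide each row by its p-norm, (sum |x|^p)^(1/p).

    eps guards against division by zero for all-zero rows.
    """
    out = []
    for row in rows:
        norm = sum(abs(value) ** p for value in row) ** (1.0 / p)
        out.append([value / max(norm, eps) for value in row])
    return out


l2_rows = normalize_rows([[3.0, 4.0], [0.0, 5.0]])  # p=2: Euclidean norm
l1_rows = normalize_rows([[1.0, 3.0]], p=1)  # p=1: sum of absolute values
```

A more descriptive name such as `norm_order` would address the question, at the cost of diverging from the torch signature.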
_clustering(
    Z_norm,
    Pr_b,
    Phi,
    R,
    E,
    O,
    n_clusters,
    theta,
    tol_clustering,
    objectives_harmony,
    max_iter_clustering,
    sigma,
    block_proportion,
)
OK, this is the reason why PLR0917 exists. Don’t suppress it, specify everything by name instead.
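A small sketch of what "specify everything by name" buys. The function `cluster` is hypothetical, plain Python, and only reuses a few argument names from the call above; as noted earlier in the review, numba-compiled functions may not honor the bare `*`.

```python
def cluster(*, n_clusters, theta, sigma, tol, max_iter, block_proportion):
    """Hypothetical keyword-only signature: everything after `*` must be named."""
    return {
        "n_clusters": n_clusters,
        "theta": theta,
        "sigma": sigma,
        "tol": tol,
        "max_iter": max_iter,
        "block_proportion": block_proportion,
    }


# Each value is labeled at the call site, so swapping two arguments is hard to miss.
settings = cluster(
    n_clusters=3,
    theta=2.0,
    sigma=0.1,
    tol=1e-5,
    max_iter=20,
    block_proportion=0.05,
)

try:
    cluster(3, 2.0, 0.1, 1e-5, 20, 0.05)  # positional call is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With thirteen positional arguments, a swapped pair such as `sigma` and `theta` would run without complaint; with names, the mistake is visible in review.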
Z_hat = _correction(Z, R, Phi, O, ridge_lambda, correction_method)
Z_norm = _normalize_cp(Z_hat, p=2)
if verbose:
    print(f"\tCompleted {i + 1} / {max_iter_harmony} iteration(s).")
don’t you have some logging infra you should use instead?
I don’t want to see a single `print` statement in any library I use (if it has a CLI, that one may have print statements)
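A sketch of the same progress message routed through the standard-library `logging` module. Whether the project has its own logging helper is not shown in this excerpt; the logger name `harmony_example` and the function `run_iterations` are hypothetical.

```python
import logging

# Library code only obtains a logger; the application decides where output goes.
logger = logging.getLogger("harmony_example")


def run_iterations(max_iter: int) -> int:
    """Hypothetical loop that reports progress via logging instead of print."""
    completed = 0
    for i in range(max_iter):
        # ... one iteration of work would happen here ...
        completed = i + 1
        # Lazy %-formatting: the string is only built if INFO is enabled.
        logger.info("Completed %d / %d iteration(s).", completed, max_iter)
    return completed
```

A caller who wants the messages can enable them with `logging.basicConfig(level=logging.INFO)`; everyone else sees nothing, which replaces the `verbose` flag.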
if verbose:
    print(f"\tCompleted {i + 1} / {max_iter_harmony} iteration(s).")

if _is_convergent_harmony(objectives_harmony, tol=tol_harmony):
so this one is an out parameter of `_clustering`? what else is being modified? You should make that clear
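One way to make the data flow explicit is to return the updated history instead of appending to an argument in place. The sketch below is hypothetical: `clustering_objective` and `run_clustering` are made-up stand-ins, and the objective is a toy sum of squared distances, not the one used in this PR.

```python
def clustering_objective(values, center):
    """Toy objective: sum of squared distances to a single center."""
    return sum((value - center) ** 2 for value in values)


def run_clustering(values, center, objectives):
    """Return a new history list; the caller's `objectives` is left untouched."""
    return [*objectives, clustering_objective(values, center)]


history = []
history = run_clustering([1.0, 2.0, 3.0], 2.0, history)
```

If mutating in place is required for performance, documenting the modified arguments in the docstring is the lighter alternative.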
Co-authored-by: Philipp A. <flying-sheep@web.de>
removed all prints and added a warning if harmony didn’t converge
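A sketch of what such a non-convergence warning could look like. The convergence rule, the function names, and the message text are assumptions for illustration; the commit's actual implementation is not shown in this excerpt.

```python
import warnings


def relative_change_small(objectives, tol):
    """Hypothetical check: last two objectives differ by less than tol, relatively."""
    if len(objectives) < 2:
        return False
    old, new = objectives[-2], objectives[-1]
    return abs(old - new) < tol * abs(old)


def iterate(step, max_iter, tol):
    """Call `step` until the objective stabilizes; warn if it never does."""
    objectives = []
    for _ in range(max_iter):
        objectives.append(step(len(objectives)))
        if relative_change_small(objectives, tol):
            return objectives
    warnings.warn(
        f"Did not converge after {max_iter} iterations.",
        RuntimeWarning,
        stacklevel=2,
    )
    return objectives
```

Unlike a `print`, a warning can be filtered, turned into an error in tests, or captured with `warnings.catch_warnings`.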
Fixes #299