Breaking changes for 0.4 - simplified IDs, support for external annotations and SCTransform update #20

Merged
merged 40 commits into from
Jun 20, 2024

rasmushenningsson
Collaborator

See CHANGELOG for details.

…D instead

NB: set_var_id_cols! and set_obs_id_cols! are currently removed and will be added again in some form.
Add kwargs `duplicate_var` and `duplicate_obs` to `DataMatrix` and `load10x`.
By default, `feature_type` is included as an additional var ID column if present in features
for all samples.
Otherwise `extra_var_id_cols` can be passed to `load_counts`/`merge_counts` to specify
columns manually.

`duplicate_var` and `duplicate_obs` are now properly passed to load/merge/update_matrix
functions.
…n data.obs

New functionality is not yet tested, but old functionality should work.
…projection thereof.

* Annotations now use get/getindex for basic access
* project with NormalizationModel now gives a better error message when an external annotation is missing
* Unit tests
Also move external_obs unit tests to separate @testset.
* Use different IDs in counts_proj to ensure we don't accidentally take info from the original.
* Support DataFrames where Annotations are supported (for covariates)
* _get_df -> get_table
* Unit tests
* Remove Annotations from exports.
* Update some docstrings and comments.
Refactored to share some code with normalization.
* var_counts_fraction! now supports external_var
* bugfix when retrieving external column from Annotations
* var_counts_fraction! unit tests with external_var
idf was not properly subsetted when using var_filter in tf_idf_transform,
causing an error to be thrown
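
The kind of mismatch described above can be illustrated with a minimal sketch in plain Julia. It is not the package's implementation: the matrix layout (variables × observations), the idf formula and all names below are assumptions made for illustration only. The point is that once variables are filtered, the per-variable idf vector must be indexed with the same selection as the matrix rows.

```julia
# Toy count matrix, assumed layout: variables × observations.
counts = [1 0 2;
          0 0 3;
          4 5 0]
n_obs = size(counts, 2)

# One common idf variant (an assumption, not necessarily the package's formula):
# computed per variable over ALL variables.
doc_freq = vec(sum(!iszero, counts; dims=2))
idf = log.(1 .+ n_obs ./ doc_freq)

# Hypothetical variable filter: keep variables 1 and 3.
keep = [1, 3]
sub = counts[keep, :]

# The idf vector must be subset with the same indices as the matrix rows.
idf_sub = idf[keep]

tf = sub ./ sum(sub; dims=1)   # term frequency per observation
weighted = tf .* idf_sub       # 2×3 result

# Using the unsubsetted `idf` here would throw a DimensionMismatch,
# since a length-3 vector cannot broadcast against a 2×3 matrix.
```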
logtransform and tf_idf_transform now handle duplicate var IDs by optionally including
feature_type

Corresponding unit tests.
* SCTransformModel now stores var_match (needed for feature_mask)
* sctransform sets feature_mask correctly (affects logcellcounts)

Unit testing of sctransform:
* Variable subsetting using var_filter
* Handling of duplicate var IDs
Change regex to simpler startswith.
Add `var_counts_sum`, `var_counts_sum!` and `VarCountsSumModel` that are
used for creating annotations by summing over the (chosen) variables in
a DataMatrix.

In particular, it can be used to compute total RNA counts and total
number of RNA variables with nonzero expression.
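
As a rough illustration of what summing over chosen variables means, here is a sketch in plain Julia. It does not call `var_counts_sum` and says nothing about its signature; the matrix layout (variables × observations), the `feature_type` values and the variable names are assumptions.

```julia
# Toy count matrix, assumed layout: variables × observations.
counts = [0 3 1;
          2 0 0;
          5 1 0;
          0 0 4]

# Hypothetical variable annotation used to choose which variables to sum over.
feature_type = ["Gene Expression", "Gene Expression", "Antibody Capture", "Gene Expression"]
is_rna = feature_type .== "Gene Expression"

# Total RNA counts per observation: sum over the chosen variables.
total_rna = vec(sum(counts[is_rna, :]; dims=1))

# Number of RNA variables with nonzero expression per observation:
# the same sum, but applying `!iszero` to each entry first.
n_expressed = vec(sum(!iszero, counts[is_rna, :]; dims=1))
```

Both results are per-observation vectors, which is why they fit naturally as observation annotations.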
Better syntax:
`filter_var(a.group=>==("A"), data)`
(where `a::Annotations`.)

Also ensures we don't pass around all annotations when only a single one is needed.
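
For readers unfamiliar with the `column => ==("A")` form, the sketch below shows the Base Julia pieces it relies on. The table and the `matching_rows` helper are hypothetical stand-ins and are not part of the package; only `==("A")`, `Pair` and `findall` are Base functionality.

```julia
# Hypothetical stand-in for a table of annotations (not the package's types).
table = (id = ["g1", "g2", "g3"], group = ["A", "B", "A"])

# `==("A")` is Base's partially applied equality: a function x -> x == "A".
is_a = ==("A")

# A column => predicate pair, analogous in shape to `a.group => ==("A")`.
spec = :group => is_a

# Hypothetical helper: indices of the rows where the predicate holds
# for the chosen column. Only that single column is ever accessed.
function matching_rows(spec::Pair, table)
    column, predicate = spec
    findall(predicate, getproperty(table, column))
end

rows = matching_rows(spec, table)
kept_ids = table.id[rows]
```

Carrying the column and the predicate together in a `Pair` means the filter only needs the one column it tests, which matches the stated goal of not passing all annotations around.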
@rasmushenningsson rasmushenningsson merged commit e2e0658 into main Jun 20, 2024
4 checks passed