Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] Port re-indexer to the R API #2637

Merged

Conversation

mojaveazure
Copy link
Member

@mojaveazure mojaveazure commented May 23, 2024

Expose the re-indexer to the R API through a new IntIndexer class, this class takes a vector of integerish values to create the re-indexer, and has one method $get_indexer() that takes a vector of lookups to re-index

New SOMA classes:

  • IntIndexer: an R6 wrapper around the C++ re-indexer
    • $initialize(): takes an integerish vector to intitialize the C++ re-indexer
    • $get_indexer(): takes an integerish vector or Arrow Array to perform a lookup
      • always returns a vector of bit64::integer64
      • optionally takes a paramter nomatch_na to adjust the nomatch value to bit64::NA_integer64_

resolves #1610

Copy link
Contributor

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@eddelbuettel
Copy link
Contributor

Better still after follow-up off-GH discussion. Should be read AFAICT.

Copy link
Member

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mojaveazure !!

@mojaveazure mojaveazure merged commit d97915f into main May 24, 2024
4 checks passed
@mojaveazure mojaveazure deleted the paulhoffman/sc-40890/r-c-add-r-support-for-c-reindexer branch May 24, 2024 15:22
Copy link

The backport to release-1.11 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-1.11 release-1.11
# Navigate to the new working tree
cd .worktrees/backport-release-1.11
# Create a new branch
git switch --create backport-2637-to-release-1.11
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 d97915ff69a0f4225c5b5218f799b91a725e0da8
# Push it to GitHub
git push --set-upstream origin backport-2637-to-release-1.11
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-1.11

Then, create a pull request where the base branch is release-1.11 and the compare/head branch is backport-2637-to-release-1.11.

mojaveazure added a commit that referenced this pull request Jun 14, 2024
Connect the re-indexer to the blockwise iterator, allowing reads to be re-indexed on-the-fly. This PR parallels #1792 and completes #2152 and #2637; in addition, provides new shorthand for `reindex_disable_on_axis`:
 - `TRUE`: disable re-indexing on all axes
 - `FALSE: re-index on all axes
 - `NA`: re-index only on major axis, disable re-indexing on all axes (default)

`BlockwiseTableReadIter$concat()` and `BlockwiseSparseReadIter$concat()` are disabled when re-indexing is requested (paralleling Python)

`BlockwiseSparseReadIter` now accepts `repr = "R"` or `repr = "C"` under certain circumstances:
 - axis 0 (`soma_dim_0`) must be re-indexed to allow `repr = "R"`
 - axis 1 (`soma_dim_1`) must be re-indexed to allow `repr = "C"`

`repr` of `"T"` is allowed in all circumstances and continues to be the default

Two new fields are available to blockwise iterators:
 - `$axes_to_reindex`: a vector of minor axes slated to be re-indexed
 - `$reindexable`: status indicator stating if _any_ axis (major or minor) is slated to be re-indexed

resolves #2671
johnkerl pushed a commit that referenced this pull request Jun 17, 2024
Connect the re-indexer to the blockwise iterator, allowing reads to be re-indexed on-the-fly. This PR parallels #1792 and completes #2152 and #2637; in addition, provides new shorthand for `reindex_disable_on_axis`:
 - `TRUE`: disable re-indexing on all axes
 - `FALSE: re-index on all axes
 - `NA`: re-index only on major axis, disable re-indexing on all axes (default)

`BlockwiseTableReadIter$concat()` and `BlockwiseSparseReadIter$concat()` are disabled when re-indexing is requested (paralleling Python)

`BlockwiseSparseReadIter` now accepts `repr = "R"` or `repr = "C"` under certain circumstances:
 - axis 0 (`soma_dim_0`) must be re-indexed to allow `repr = "R"`
 - axis 1 (`soma_dim_1`) must be re-indexed to allow `repr = "C"`

`repr` of `"T"` is allowed in all circumstances and continues to be the default

Two new fields are available to blockwise iterators:
 - `$axes_to_reindex`: a vector of minor axes slated to be re-indexed
 - `$reindexable`: status indicator stating if _any_ axis (major or minor) is slated to be re-indexed

resolves #2671
github-actions bot pushed a commit that referenced this pull request Jun 17, 2024
Connect the re-indexer to the blockwise iterator, allowing reads to be re-indexed on-the-fly. This PR parallels #1792 and completes #2152 and #2637; in addition, provides new shorthand for `reindex_disable_on_axis`:
 - `TRUE`: disable re-indexing on all axes
 - `FALSE: re-index on all axes
 - `NA`: re-index only on major axis, disable re-indexing on all axes (default)

`BlockwiseTableReadIter$concat()` and `BlockwiseSparseReadIter$concat()` are disabled when re-indexing is requested (paralleling Python)

`BlockwiseSparseReadIter` now accepts `repr = "R"` or `repr = "C"` under certain circumstances:
 - axis 0 (`soma_dim_0`) must be re-indexed to allow `repr = "R"`
 - axis 1 (`soma_dim_1`) must be re-indexed to allow `repr = "C"`

`repr` of `"T"` is allowed in all circumstances and continues to be the default

Two new fields are available to blockwise iterators:
 - `$axes_to_reindex`: a vector of minor axes slated to be re-indexed
 - `$reindexable`: status indicator stating if _any_ axis (major or minor) is slated to be re-indexed

resolves #2671
johnkerl pushed a commit that referenced this pull request Jun 17, 2024
Connect the re-indexer to the blockwise iterator, allowing reads to be re-indexed on-the-fly. This PR parallels #1792 and completes #2152 and #2637; in addition, provides new shorthand for `reindex_disable_on_axis`:
 - `TRUE`: disable re-indexing on all axes
 - `FALSE: re-index on all axes
 - `NA`: re-index only on major axis, disable re-indexing on all axes (default)

`BlockwiseTableReadIter$concat()` and `BlockwiseSparseReadIter$concat()` are disabled when re-indexing is requested (paralleling Python)

`BlockwiseSparseReadIter` now accepts `repr = "R"` or `repr = "C"` under certain circumstances:
 - axis 0 (`soma_dim_0`) must be re-indexed to allow `repr = "R"`
 - axis 1 (`soma_dim_1`) must be re-indexed to allow `repr = "C"`

`repr` of `"T"` is allowed in all circumstances and continues to be the default

Two new fields are available to blockwise iterators:
 - `$axes_to_reindex`: a vector of minor axes slated to be re-indexed
 - `$reindexable`: status indicator stating if _any_ axis (major or minor) is slated to be re-indexed

resolves #2671

Co-authored-by: Paul Hoffman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[r] Add R support for re-indexer
3 participants