Reversible typecasts (parent task) #106

Closed
johnkerl opened this issue May 20, 2022 · 1 comment

Comments

@johnkerl
Member

johnkerl commented May 20, 2022

When we ingest data into TileDB from, say, AnnData, we need to remap a few things to be storable:

  • Categorical strings to strings
  • bool to uint8
  • Unicode array dimensions to ASCII dimensions -- necessary for TileDB at present
  • Unicode array attributes to ASCII array attributes -- TileDB allows attributes to be Unicode, but they are not queryable via the QueryCondition API. See also "Queryability of dataframe attribute columns" (#99).

Source links:

Related: #105

Item 2 will be addressed in TileDB Core 2.10, where we'll have native boolean support. The remaining ones, though, are ongoing concerns.

Proposal for item 1:

  • Use TileDB group/array metadata tags
  • Indicate the following at the SOMA-group level:
    • Source provenance (e.g. anndata, seurat)
    • Language and package version of the code that wrote the data
  • Indicate the following at the SOMA-array level:
    • Either a dict from all columns to their original types, or just the 'exceptions' (e.g. categorical-string columns)
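
As a rough illustration of the array-level part of this proposal, here is a minimal sketch of recording original column types as a JSON string and using it to undo the casts on read. It uses plain Python dicts of type names rather than real TileDB or AnnData objects. The metadata key `original_column_types` and the helper names are hypothetical, not an existing SOMA convention, and the casts shown are only the categorical and bool ones from the list above.

```python
import json

# Hypothetical metadata key; not an existing SOMA or TileDB convention.
ORIGINAL_TYPES_KEY = "original_column_types"

# Casts applied at ingest time, taken from the remapping list in this issue.
STORAGE_TYPE = {"category": "str", "bool": "uint8"}


def stored_types(original_types):
    """Map each column's original type name to the type it is stored as."""
    return {col: STORAGE_TYPE.get(t, t) for col, t in original_types.items()}


def build_type_metadata(original_types, exceptions_only=True):
    """Serialize original types as JSON, suitable for a string metadata value.

    With exceptions_only=True, only columns whose type is changed by the
    ingest are recorded; otherwise every column is recorded.
    """
    if exceptions_only:
        recorded = {c: t for c, t in original_types.items() if t in STORAGE_TYPE}
    else:
        recorded = dict(original_types)
    return json.dumps(recorded, sort_keys=True)


def restore_types(on_disk_types, metadata_json):
    """Recover original type names; unrecorded columns keep their stored type."""
    recorded = json.loads(metadata_json)
    return {col: recorded.get(col, t) for col, t in on_disk_types.items()}


original = {"cell_type": "category", "is_primary": "bool", "n_genes": "int64"}
on_disk = stored_types(original)
array_metadata = {ORIGINAL_TYPES_KEY: build_type_metadata(original)}
restored = restore_types(on_disk, array_metadata[ORIGINAL_TYPES_KEY])
```

Recording only the exceptions keeps the metadata small. Recording all columns makes the array self-describing, at the cost of duplicating what the schema already says for unchanged columns.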

Proposal for items 3 and 4:

@johnkerl
Member Author

johnkerl commented Feb 3, 2023

Split out remaining issues to #866 and #867. cc @maniarathi.

@johnkerl johnkerl closed this as completed Feb 3, 2023