[docs] Writing data manually to a var SOMA DataFrame causes TileDB internal error #3535

Open
mdylan2 opened this issue Jan 8, 2025 · 3 comments

mdylan2 commented Jan 8, 2025

Describe the bug
Trying to write some data to a var dataframe but I hit the following error:

TileDB internal: ['UnorderedWriter::dowork]  (FragmentMetadata: Cells are written outside of the defined current domain.)

What am I doing wrong?

To Reproduce

import pandas as pd
import pyarrow as pa
import tiledbsoma as soma
import os
import shutil

EXP_URI = "test"
try:
    # Step 1: Create a simple dataframe with three columns
    data = {
        'gene': ['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE'],
        'ens': ['ENSG0001', 'ENSG0002', 'ENSG0003', 'ENSG0004', 'ENSG0005'],
        'soma_joinid': [1, 2, 3, 4, 5]
    }

    df = pd.DataFrame(data)

    # Step 2: Initialize the soma.Experiment and collections
    exp = soma.Experiment.create(EXP_URI)
    ms = exp.add_new_collection('ms')
    rna = ms.add_new_collection('RNA', soma.Measurement)

    # Step 3: Define the schema
    schema = {
        "soma_joinid": pa.int64(),
        "gene": pa.large_string(),
        "ens": pa.large_string()
    }

    # Step 4: Convert the schema dict to a pyarrow schema
    var_schema = pa.schema(list(schema.items()))

    # Step 5: Create the dataframe in the rna collection
    var = rna.add_new_dataframe("var", schema=var_schema, index_column_names=["soma_joinid"])

    # Step 6: Write the pandas dataframe to the soma dataframe
    var.write(pa.Table.from_pandas(df))
    print("SUCCESS!")
    
except Exception as e:
    rna.close()
    ms.close()
    exp.close()
    print("--> EXCEPTION")
    print("Exception hit:", e)
    if os.path.exists(EXP_URI):
        shutil.rmtree(EXP_URI)

Versions:

  • TileDB-SOMA version: 1.15.2
  • TileDB version: 0.33.2
  • Language and language version: Python 3.12.3
  • OS: Ubuntu


johnkerl (Member) commented Jan 8, 2025

Specifically, you need to pass domain here:

    var = rna.add_new_dataframe("var", schema=var_schema, index_column_names=["soma_joinid"])
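One rough way to picture the error is as a bounds check on the index column. The sketch below is plain Python, not tiledbsoma code: the helper name ids_outside_domain is hypothetical, the domain is assumed to be an inclusive (low, high) pair, and treating a missing domain as "no cells allowed" is an assumption about the default rather than something confirmed in this thread.

```python
def ids_outside_domain(join_ids, domain):
    """Return the IDs that fall outside an inclusive (low, high) range.

    domain=None stands in for "no domain was given"; this sketch ASSUMES
    that case admits no cells at all.
    """
    if domain is None:
        return list(join_ids)
    low, high = domain
    return [i for i in join_ids if not (low <= i <= high)]


join_ids = [1, 2, 3, 4, 5]  # the soma_joinid values from the report
print(ids_outside_domain(join_ids, None))    # [1, 2, 3, 4, 5]
print(ids_outside_domain(join_ids, (0, 3)))  # [4, 5]
print(ids_outside_domain(join_ids, (1, 5)))  # []
```

Any non-empty result corresponds to cells that would land outside the domain, which is the situation the error message describes.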

johnkerl (Member) commented Jan 8, 2025

Since soma_joinid nominally starts at 0 (although this isn't required), you could use either of the following:

    var = rna.add_new_dataframe("var", schema=var_schema, domain=[[1,5]], index_column_names=["soma_joinid"])

or

    var = rna.add_new_dataframe("var", schema=var_schema, domain=[[0,5]], index_column_names=["soma_joinid"])
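If the IDs are not known ahead of time, the domain could be derived from the data instead of hard-coded. This is a minimal sketch in plain Python: soma_joinid_domain and its start_at_zero flag are hypothetical names, not part of tiledbsoma, and it only covers a single integer index column like the one in this issue.

```python
def soma_joinid_domain(join_ids, start_at_zero=True):
    """Build a [[low, high]] value (inclusive) that covers every join ID.

    Hypothetical helper for illustration; assumes non-negative integer IDs.
    """
    ids = [int(i) for i in join_ids]  # plain ints, also for numpy/pandas input
    if not ids:
        raise ValueError("need at least one soma_joinid to size the domain")
    if min(ids) < 0:
        raise ValueError("soma_joinid values are expected to be non-negative")
    low = 0 if start_at_zero else min(ids)
    return [[low, max(ids)]]


print(soma_joinid_domain([1, 2, 3, 4, 5]))                       # [[0, 5]]
print(soma_joinid_domain([1, 2, 3, 4, 5], start_at_zero=False))  # [[1, 5]]

# Intended use with the names from the reproduction script (not run here):
# domain = soma_joinid_domain(df["soma_joinid"])
# var = rna.add_new_dataframe("var", schema=var_schema, domain=domain,
#                             index_column_names=["soma_joinid"])
```

Sizing the domain to the current maximum leaves no headroom for later appends, so a larger upper bound may be preferable if more rows are expected.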

johnkerl self-assigned this Jan 8, 2025
johnkerl changed the title from "[Bug] Writing data manually to a var Soma DataFrame causes TileDB internal eror" to "[docs] Writing data manually to a var Soma DataFrame causes TileDB internal error" Jan 8, 2025
johnkerl changed the title from "[docs] Writing data manually to a var Soma DataFrame causes TileDB internal error" to "[docs] Writing data manually to a var SOMA DataFrame causes TileDB internal error" Jan 8, 2025