XClone BAF xclone.pp.xclonedata > cell_anno.index.name = None #24

Open
stublakemore opened this issue Dec 4, 2024 · 4 comments

@stublakemore

Dear Xianjie,

After spending a lot of time trying to resolve this issue myself, I wonder whether you can help me (again)!

Using Python 3.9 and XClone v0.3.8, as in my other (now resolved) issues, I get the following error when I try to create the BAF_adata object with the xclone.pp.xclonedata function, following the standard API documentation (https://xclone-cnv.readthedocs.io/en/latest/API.html#baf-module). This is exactly the code I run, followed by the error:

BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF', mtx_barcodes_file, "hg19_genes", "Sample_6_BAF")

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 251, in xclonedata
cell_anno.index.name = None
UnboundLocalError: local variable 'cell_anno' referenced before assignment
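An `UnboundLocalError` means a function read a local variable on a code path where that variable was never assigned, typically because it is only set inside a conditional branch that was skipped. The sketch below is a hypothetical function (not XClone's source code) that reproduces the Python mechanism with a variable named `cell_anno`. Note that the later attempts in this thread, which pass `genome_mode = "hg19_genes"` as a keyword argument (rather than "hg19_genes" as the fourth positional argument) and drop the fifth argument, get past line 251 and fail at line 262 instead.

```python
# Hypothetical illustration only: this is NOT XClone's source code.
# It shows how Python raises UnboundLocalError when a local variable is
# assigned in a branch that did not run and is then used unconditionally.


def build_cell_anno(barcodes, condition):
    if condition:
        # cell_anno is bound only when this branch runs
        cell_anno = {"barcodes": list(barcodes), "index_name": "cell"}
    # If condition was False, cell_anno was never assigned, so this line fails
    cell_anno["index_name"] = None
    return cell_anno


try:
    build_cell_anno(["AAACCTGAGAAGGCCT-1"], condition=False)
except UnboundLocalError as err:
    error_name = type(err).__name__

print(error_name)  # UnboundLocalError
```

The exact wording of the message differs between Python versions (3.9 prints "referenced before assignment"), but the exception type is the same.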

Maybe something is amiss with my .mtx files, because I have three rather than two (cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx and cellSNP.tag.OTH.mtx). However, passing all three to xclone.pp.xclonedata leads to the same error:

BAF_adata = xclone.pp.xclonedata([AD_file, DP_file, OTH_file], 'BAF', mtx_barcodes_file, "hg19_genes", "Sample_6_BAF")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 251, in xclonedata
cell_anno.index.name = None
UnboundLocalError: local variable 'cell_anno' referenced before assignment

I wondered whether I first needed to load the features.tsv from the RDR analysis as cell_anno, but that raises the same error too.

Following your suggestion to look at the log files, I inspected the pileup log, but I can't find anything indicating that my xcltk issue isn't actually resolved. Please find the log file attached; you will surely spot something I can't!

Many thanks,

Stuart
pileup.log

@stublakemore
Author

Dear Xianjie,

I would really appreciate it if you could give me your insight on this! I don't have any further developments to report regarding either a solution or the cause of the issue.

Cheers,

Stuart

@hxj5
Collaborator

hxj5 commented Dec 13, 2024

Do you have any suggestions @Rongtingting?

@Rongtingting
Collaborator

Hi Stuart,
@stublakemore
Could you try the steps at https://xclone-cnv.readthedocs.io/en/latest/preprocessing.html#baf-load or attach the code you used?
I suspect you may not have specified the correct paths to the data files.

Bests,
Rongting
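Since a wrong path is the suspected cause, a quick standard-library check that each input exists, is a regular file and is non-empty can rule that out before calling xclonedata. This is a minimal sketch; the helper name `check_input_files` is hypothetical, and the paths are the ones used later in this thread.

```python
from pathlib import Path


def check_input_files(paths):
    """Return (path, problem) pairs; an empty list means every path looks usable."""
    problems = []
    for path in map(Path, paths):
        if not path.exists():
            problems.append((str(path), "does not exist"))
        elif not path.is_file():
            problems.append((str(path), "is not a regular file"))
        elif path.stat().st_size == 0:
            problems.append((str(path), "is empty"))
    return problems


# Paths as used later in this thread; adjust to your own layout.
data_dir = Path("/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup")
inputs = [
    data_dir / "cellSNP.tag.AD.mtx",
    data_dir / "cellSNP.tag.DP.mtx",
    data_dir / "cellSNP.samples.tsv",
]
for path, problem in check_input_files(inputs):
    print(f"{path}: {problem}")
```

If nothing is printed, all three files are present and non-empty, and the path is probably not the problem.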

@stublakemore
Author

Dear Rongting,

Thanks for getting in touch. The exact code I used is in my first post at the top of this thread. Specifically, I wonder whether the problem lies in the xcltk BAF preprocessing, because my file names differ from those in the readthedocs tutorial: I have cellSNP.tag.AD.mtx, cellSNP.tag.DP.mtx and cellSNP.tag.OTH.mtx rather than "AD.mtx" and "DP.mtx", and cellSNP.samples.tsv rather than "barcodes.tsv". I've tried using the barcodes.tsv file from the RDR preprocessing output instead, without success. My code is below:

Attempt 1:

data_dir = "/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup/"
AD_file = data_dir + "cellSNP.tag.AD.mtx"
DP_file = data_dir + "cellSNP.tag.DP.mtx"
mtx_barcodes_file = data_dir + "cellSNP.samples.tsv"
BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF',
... mtx_barcodes_file,
... genome_mode = "hg19_genes")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 262, in xclonedata
Xadata = AnnData(AD, obs=cell_anno, var=regions_anno) # dtype='int32'
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
self._init_as_actual(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
self._var = _gen_dataframe(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. var must have as many rows as X has columns (1680), but has 32696 rows.
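The ValueError says X has 1680 columns while the region annotation (`var`) has 32696 rows. Comparing with the .mtx header posted further down this thread (`1680 757 ...`), it looks as though xclonedata treats the 1680 matrix rows as features and the 757 columns as cells, and 32696 is presumably the number of regions in the "hg19_genes" annotation. The sketch below reads the size line of a MatrixMarket file with the standard library only and compares its row count with that annotation length. Both helper names are hypothetical (not part of XClone), and 32696 is simply copied from the error message.

```python
def read_mtx_shape(path):
    """Return (n_rows, n_cols, n_entries) from a MatrixMarket coordinate file header."""
    with open(path) as fh:
        banner = fh.readline()
        if not banner.startswith("%%MatrixMarket"):
            raise ValueError(f"{path}: missing %%MatrixMarket banner")
        for line in fh:
            if not line.strip() or line.startswith("%"):
                continue  # skip blank and comment lines
            n_rows, n_cols, n_entries = (int(x) for x in line.split())
            return n_rows, n_cols, n_entries
    raise ValueError(f"{path}: no size line found")


def feature_count_report(mtx_path, n_annotated_regions):
    """Compare the matrix row count (assumed to be features) with an annotation length."""
    n_rows, n_cols, _ = read_mtx_shape(mtx_path)
    return {
        "features_in_matrix": n_rows,
        "cells_in_matrix": n_cols,
        "regions_in_annotation": n_annotated_regions,
        "consistent": n_rows == n_annotated_regions,
    }
```

For example, `feature_count_report(data_dir + "cellSNP.tag.AD.mtx", 32696)` would flag the mismatch. One plausible reading of an inconsistent result is that the pileup matrices were not aggregated over the same gene regions that `genome_mode` selects.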

Attempt 2:

data_dir = "/projects/mpi-sclc/sblakemo/Sample_6/Sample_6_baf/pileup/"
AD_file = data_dir + "cellSNP.tag.AD.mtx"
DP_file = data_dir + "cellSNP.tag.DP.mtx"
mtx_barcodes_file = data_dir + "barcodes.tsv"
BAF_adata = xclone.pp.xclonedata([AD_file, DP_file], 'BAF',
... mtx_barcodes_file,
... genome_mode = "hg19_genes")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 262, in xclonedata
Xadata = AnnData(AD, obs=cell_anno, var=regions_anno) # dtype='int32'
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
self._init_as_actual(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
self._var = _gen_dataframe(
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
File "/projects/mpi-sclc/sblakemo/anaconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. var must have as many rows as X has columns (1680), but has 32696 rows.
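Both attempts fail with the identical message about `var` (the 1680 features), and neither reports a problem with `obs`, which suggests the choice of barcode file is not what triggers this particular error. A quick count of the barcodes against the 757 cell columns in the matrix header can confirm or rule that out. The helper `check_barcodes` is a hypothetical sketch assuming one barcode per line.

```python
def check_barcodes(barcodes_path, n_matrix_columns):
    """Summarise a one-barcode-per-line file against the matrix column count."""
    with open(barcodes_path) as fh:
        # ignore blank lines, e.g. a trailing newline at the end of the file
        barcodes = [line.strip() for line in fh if line.strip()]
    return {
        "n_barcodes": len(barcodes),
        "n_unique": len(set(barcodes)),
        "matches_matrix_columns": len(barcodes) == n_matrix_columns,
    }
```

For example, `check_barcodes(data_dir + "cellSNP.samples.tsv", 757)` should report `matches_matrix_columns` as True if that file lists the same cells as the matrices.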

Here is also the head of the AD matrix, which seems to show 4 columns but only 3 columns' worth of data:
head cellSNP.tag.AD.mtx
%%MatrixMarket matrix coordinate integer general
%
1680 757 74066
1 6 1
1 42 1
1 43 1
1 51 1
1 90 1
1 118 1
1 120 1

The DP matrix looks the same:
%%MatrixMarket matrix coordinate integer general
%
1680 757 151948
1 6 1
1 12 2
1 25 1
1 42 1
1 43 1
1 49 1
1 51 1
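In the MatrixMarket coordinate format, the `%%MatrixMarket` banner and any `%` comment lines are followed by a size line giving rows, columns and the number of stored (non-zero) entries; every later line is a single entry of the form `row column value`, with 1-based indices. Three fields per data line is therefore expected, and `1680 757 74066` describes a 1680 x 757 matrix with 74066 stored entries. The minimal parser below (a hypothetical helper, standard library only, assuming no duplicate coordinates) makes that structure explicit using the AD excerpt above.

```python
def parse_mtx_coordinate(lines):
    """Parse MatrixMarket coordinate text into (size, entries).

    size is (n_rows, n_cols, n_stored); entries maps 0-based (row, col) to value.
    Assumes no duplicate coordinates (a later duplicate would overwrite an earlier one).
    """
    it = iter(lines)
    banner = next(it)
    if not banner.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a coordinate-format MatrixMarket banner")
    size = None
    entries = {}
    for line in it:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank and comment lines
        fields = line.split()
        if size is None:
            size = tuple(int(x) for x in fields)  # the size line comes first
            continue
        row, col, value = int(fields[0]), int(fields[1]), int(fields[2])
        entries[(row - 1, col - 1)] = value  # convert 1-based file indices
    return size, entries


ad_head = """%%MatrixMarket matrix coordinate integer general
%
1680 757 74066
1 6 1
1 42 1
1 43 1
1 51 1
1 90 1
1 118 1
1 120 1""".splitlines()

size, entries = parse_mtx_coordinate(ad_head)
print(size)  # (1680, 757, 74066)
print(entries[(0, 5)])  # 1, i.e. row 1, column 6 in the file
```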

If you need anything else from me to resolve the issue, let me know and I'll provide further information!

Cheers,

Stuart
