Bugfix nonmatching dimnames #78

csoneson · 2023-08-16T19:12:58Z

Here's a first attempt to address #70, as well as increasing the consistency of utilities.R in terms of indentation, spaces and assignment operators.
Currently the situation where assays are unnamed is not handled well by get_count_datasets(). Do we want to require named assays? At the moment, adding a check for this makes nest() fail (even if the assays are in fact named).

stemangiola · 2023-08-16T23:24:14Z

Here's a first attempt to address #70, as well as increasing the consistency of utilities.R in terms of indentation, spaces and assignment operators. Currently the situation where assays are unnamed is not handled well by get_count_datasets(). Do we want to require named assays? At the moment, adding a check for this makes nest() fail (even if the assays are in fact named).

Thanks!

I cannot easily identify which functions you did change. Could you please point them to me?

(About the reformatting, I am also trying to replace the %>% pipe with the |> one. But maybe it is another project. If you happen to make new code please use |>).

csoneson · 2023-08-17T05:40:41Z

I cannot easily identify which functions you did change. Could you please point them to me?

Of course, sorry about that. The main code changes are in this commit: 8659e53 (there are also some changes in indentation, but if you append ?w=1 to the URL you can hide those whitespace-only changes)

I pulled out the checks of dimnames into a new function (check_se_dimnames()) as they are now used in multiple places
This function is now used in get_count_datasets() and order_assays_internally_to_be_consistent()
I gave check_if_assays_are_NOT_overlapped() a new argument to allow checking either row or column names
I also added unit tests for get_count_datasets() in bc47ac6
The other changes are cosmetic

stemangiola

Thanks for this PR! It is great.

As I understand, now if a SummarizedExperiment has inconsistent dim names, and we load library(tidySummarizedExperiment), as soon as we print the object on screen, the as_tibble function (called by print in the evaluation) will through an informative error.

This is good as it was throwing an error anyway.

However, while a user was able to do stuff with an inconsistent object, now is stopped from doing so, let's say printing the object on screen. I would say a more informative message, that maybe also says

Please do detach("tidySummarizedExperiment, ...") to keep operating on your object as ...

Or, is there a scope to make as_tibble work with inconsistent objects (probably not, but I would like as least to think about it).

Having said all that, I think inconsistent objects as a pretty edge case.

What do you think @csoneson and @mikelove?

(also one check is failing)

stemangiola · 2023-08-17T06:01:30Z

R/utilities.R

+        namefcn <- rownames
+        dimfcn <- nrow
+    } else {
+        namefcn <- colnames
+        dimfcn <- ncol


I try to keep the codebase acronyms-and-abbreviations-free. Would you mind replacing ...fcn (and other acronyms I might have missed) with a verbose, self-explanatory alternative?

This helps all developers involved immediately understand.

csoneson · 2023-08-17T09:58:32Z

Thanks @stemangiola - I have made the function names (hopefully) more informative.

(also one check is failing)

Regarding the error in the Windows GHA (or did this refer to something else?) I think it's unrelated to this PR, as it was already there e.g. in https://github.com/stemangiola/tidySummarizedExperiment/actions/runs/5634008230/job/15263497689 (I did not yet look into what the reason might be).

Regarding the inconsistent objects, I would agree that they're probably rare, but since they are valid SummarizedExperiment objects I also agree that it should be possible to at least do tidy-independent operations on it.

stemangiola · 2023-08-17T11:20:43Z

Regarding the inconsistent objects, I would agree that they're probably rare, but since they are valid SummarizedExperiment objects I also agree that it should be possible to at least do tidy-independent operations on it.

Do we want to try to make as_tibble robust against this case? And if it works we can convert the stop into warning.

csoneson · 2023-08-18T07:27:11Z

I think there are two different aspects at the moment that would need to be addressed:

Unnamed assays. This could perhaps be fixed by giving them names (assay1, etc), similar to what is currently done for row and column names if they are completely missing.
Inconsistent naming between assays or with the SE (say, one assay with feature IDs g1, ...,gN, and another with f1, ..., fN. Here I'm not so sure what to do - I guess we could consider them different features, with missing values in the assay(s) where they are not observed (which would make the tibble longer). However, this would somehow go against the assumption of a SE that the assays should be aligned (it would align better with the altExps). The other option would be to select the names from the first assay and propagate them to any assay that have non-overlapping names, but that feels dangerous. Moreover, what would we do if all names but one agree (e.g., if there was only a single fJ in the middle of the second assay above, but all the other names agreed).

stemangiola · 2023-08-18T07:41:53Z

Unnamed assays. This could perhaps be fixed by giving them names (assay1, etc), similar to what is currently done for row and column names if they are completely missing.

I agree. We could make this a new challenge as it is somehow independent of this one.

I guess we could consider them different features, with missing values in the assay(s) where they are not observed (which would make the tibble longer).

This seems modular and makes sense.

However, this would somehow go against the assumption of a SE that the assays should be aligned (it would align better with the altExps).

If the SE allows it, that is the responsibility of the framework, and we can be tranparent to that.

And I am not even sure why as_tibble fails. Maybe I am doing a left join for the assays, but this should not fail.

…ty object when dimnames are not shared

…ad of bind_cols

csoneson · 2023-08-20T09:33:42Z

I made some changes in the three commits above (still needs more testing, but I wanted to check what you think). It now allows non-overlapping dimnames, and will expand the tibble accordingly, and throw warnings whenever any assumptions are made.

stemangiola · 2023-08-20T12:15:24Z

Thanks @csoneson ! @sdgamboa please test this branch with your dataset and see if this looks as you would expect.

stemangiola · 2023-08-20T12:30:40Z

@csoneson please add your authorship details here https://docs.google.com/spreadsheets/d/19XqhN3xAMekCJ-esAolzoWT6fttruSEermjIsrOFcoo/edit?usp=sharing

stemangiola · 2023-08-21T00:46:06Z

Unnamed assays. This could perhaps be fixed by giving them names (assay1, etc), similar to what is currently done for row and column names if they are completely missing.

@csoneson new challenge created

#79

stemangiola · 2023-08-21T01:58:00Z

@csoneson I was thinking, what happens now if we try to apply the methods on this SE.

mutate
slice
select
nest, unnest
left_join

csoneson · 2023-08-21T07:01:24Z

I think that trying to operate in this way directly on an inconsistent SE will cause problems. I still have to understand exactly where this is happening, but I get e.g.:

> se1 |> mutate(v = 3)
Error in data.frame(., row.names = pull(., !!s_(se)$symbol), check.names = FALSE) : 
  row names contain missing values

where se1 is a SE where dimnames don't agree between assays and the SE itself. After converting to a tibble of course things "work" in the sense that they don't raise errors (whether it's sensible for downstream use is a different question, but there are lots of warnings provided to the user).

stemangiola · 2023-08-21T07:03:42Z

I think that trying to operate in this way directly on an inconsistent SE will cause problems. I still have to understand exactly where this is happening, but I get e.g.:
> se1 |> mutate(v = 3)
Error in data.frame(., row.names = pull(., !!s_(se)$symbol), check.names = FALSE) : 
  row names contain missing values
where se1 is a SE where dimnames don't agree between assays and the SE itself. After converting to a tibble of course things "work" in the sense that they don't raise errors (whether it's sensible for downstream use is a different question, but there are lots of warnings provided to the user).

OK, I have the feeling that adapting methods for this case will be incredibly time-consuming and not so rewarding, as this is an edge case.

I am OK if methods fail, as long as we have issued the warnings.

Should we merge this PR and accept after-warning failures?

csoneson · 2023-08-21T07:06:36Z

Should we merge this PR and accept after-warning failures?

Sure, if the changes look sensible to you (I'm still getting used to the codebase and hope I haven't missed an implication somewhere - at least all the tests pass, so nothing that is covered there seems broken 🙂). It may be good to merge it to have a common starting points for further edits and more tests.

stemangiola · 2023-08-21T07:12:15Z

Congrats and welcome to the team! 🎉

sdgamboa · 2023-08-22T14:53:04Z

@csoneson, @stemangiola, thanks! It works.

## BiocManager::install('stemangiola/tidySummarizedExperiment', force = TRUE)
suppressMessages({
    library(curatedMetagenomicData)
    library(tidySummarizedExperiment)
})
dataset_name <- "HallAB_2017.relative_abundance"
tse <- curatedMetagenomicData(
    pattern = dataset_name, 
    dryrun = FALSE, rownames = 'NCBI',
    counts = TRUE
)[[1]]
#> 
#> $`2021-10-14.HallAB_2017.relative_abundance`
#> dropping rows without rowTree matches:
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Atopobiaceae|g__Olsenella|s__Olsenella_profusa
#>   k__Bacteria|p__Actinobacteria|c__Coriobacteriia|o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|s__Collinsella_stercoris
#>   k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Carnobacteriaceae|g__Granulicatella|s__Granulicatella_elegans
#>   k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcus|s__Ruminococcus_champanellensis
#>   k__Bacteria|p__Firmicutes|c__Erysipelotrichia|o__Erysipelotrichales|f__Erysipelotrichaceae|g__Bulleidia|s__Bulleidia_extructa
#>   k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Burkholderiales|f__Sutterellaceae|g__Sutterella|s__Sutterella_parvirubra
#>   k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Cloacibacillus|s__Cloacibacillus_evryensis
tse 
#> class: TreeSummarizedExperiment 
#> dim: 503 259 
#> metadata(1): agglomerated_by_rank
#> assays(1): relative_abundance
#> rownames(503): 853 820 ... 172901 1262744
#> rowData names(7): superkingdom phylum ... genus species
#> colnames(259): p8582_mo1 p8582_mo10 ... SKST041_2_G103027
#>   SKST041_3_G103028
#> colData names(24): study_name subject_id ... HBI SCCAI
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (503 rows)
#> rowTree: 1 phylo tree(s) (10430 leaves)
#> colLinks: NULL
#> colTree: NULL
class(tse)
#> [1] "TreeSummarizedExperiment"
#> attr(,"package")
#> [1] "TreeSummarizedExperiment"
tidy_tse <- tidySummarizedExperiment::as_tibble(tse)
#> Warning in check_se_dimnames(se): tidySummarizedExperiment says: the assays in
#> your SummarizedExperiment have row names, but they don't agree with the row
#> names of the SummarizedExperiment object itself. It is strongly recommended to
#> make the assays consistent, to avoid erroneous matching of features.
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       Ubuntu 22.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-08-22
#>  pandoc   3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                  * version   date (UTC) lib source
#>  abind                      1.4-5     2016-07-21 [1] CRAN (R 4.1.2)
#>  AnnotationDbi              1.62.2    2023-07-02 [1] Bioconductor
#>  AnnotationHub              3.8.0     2023-04-25 [1] Bioconductor
#>  ape                        5.7-1     2023-03-13 [1] CRAN (R 4.2.2)
#>  beachmat                   2.16.0    2023-04-25 [1] Bioconductor
#>  beeswarm                   0.4.0     2021-06-01 [1] CRAN (R 4.1.2)
#>  Biobase                  * 2.60.0    2023-04-25 [1] Bioconductor
#>  BiocFileCache              2.8.0     2023-04-25 [1] Bioconductor
#>  BiocGenerics             * 0.46.0    2023-04-25 [1] Bioconductor
#>  BiocManager                1.30.22   2023-08-08 [1] CRAN (R 4.3.1)
#>  BiocNeighbors              1.18.0    2023-04-25 [1] Bioconductor
#>  BiocParallel               1.34.2    2023-05-22 [1] Bioconductor
#>  BiocSingular               1.16.0    2023-04-25 [1] Bioconductor
#>  BiocVersion                3.17.1    2022-11-04 [1] Bioconductor
#>  Biostrings               * 2.68.1    2023-05-16 [1] Bioconductor
#>  bit                        4.0.5     2022-11-15 [1] CRAN (R 4.2.2)
#>  bit64                      4.0.5     2020-08-30 [1] CRAN (R 4.1.2)
#>  bitops                     1.0-7     2021-04-24 [1] CRAN (R 4.1.2)
#>  blob                       1.2.4     2023-03-17 [1] CRAN (R 4.2.2)
#>  cachem                     1.0.8     2023-05-01 [1] CRAN (R 4.2.2)
#>  cli                        3.6.1     2023-03-23 [1] CRAN (R 4.2.2)
#>  cluster                    2.1.4     2022-08-22 [1] CRAN (R 4.3.0)
#>  codetools                  0.2-19    2023-02-01 [3] CRAN (R 4.2.2)
#>  colorspace                 2.1-0     2023-01-23 [1] CRAN (R 4.2.2)
#>  crayon                     1.5.2     2022-09-29 [1] CRAN (R 4.2.1)
#>  curatedMetagenomicData   * 3.8.0     2023-04-27 [1] Bioconductor
#>  curl                       5.0.2     2023-08-14 [1] CRAN (R 4.3.1)
#>  data.table                 1.14.8    2023-02-17 [1] CRAN (R 4.2.2)
#>  DBI                        1.1.3     2022-06-18 [1] CRAN (R 4.2.2)
#>  dbplyr                     2.3.3     2023-07-07 [1] CRAN (R 4.3.1)
#>  DECIPHER                   2.28.0    2023-04-25 [1] Bioconductor
#>  decontam                   1.20.0    2023-04-25 [1] Bioconductor
#>  DelayedArray               0.26.7    2023-07-28 [1] Bioconductor
#>  DelayedMatrixStats         1.22.5    2023-08-10 [1] Bioconductor
#>  digest                     0.6.33    2023-07-07 [1] CRAN (R 4.3.1)
#>  DirichletMultinomial       1.42.0    2023-04-25 [1] Bioconductor
#>  dplyr                      1.1.2     2023-04-20 [1] CRAN (R 4.2.2)
#>  ellipsis                   0.3.2     2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate                   0.21      2023-05-05 [1] CRAN (R 4.2.2)
#>  ExperimentHub              2.8.1     2023-07-12 [1] Bioconductor
#>  fansi                      1.0.4     2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap                    1.1.1     2023-02-24 [1] CRAN (R 4.2.2)
#>  filelock                   1.0.2     2018-10-05 [1] CRAN (R 4.1.2)
#>  fs                         1.6.3     2023-07-20 [1] CRAN (R 4.3.1)
#>  generics                   0.1.3     2022-07-05 [1] CRAN (R 4.2.1)
#>  GenomeInfoDb             * 1.36.1    2023-06-21 [1] Bioconductor
#>  GenomeInfoDbData           1.2.10    2023-05-09 [1] Bioconductor
#>  GenomicRanges            * 1.52.0    2023-04-25 [1] Bioconductor
#>  ggbeeswarm                 0.7.2     2023-04-29 [1] CRAN (R 4.2.2)
#>  ggplot2                    3.4.3     2023-08-14 [1] CRAN (R 4.3.1)
#>  ggrepel                    0.9.3     2023-02-03 [1] CRAN (R 4.2.2)
#>  glue                       1.6.2     2022-02-24 [1] CRAN (R 4.3.0)
#>  gridExtra                  2.3       2017-09-09 [1] CRAN (R 4.1.2)
#>  gtable                     0.3.4     2023-08-21 [1] CRAN (R 4.3.1)
#>  htmltools                  0.5.6     2023-08-10 [1] CRAN (R 4.3.1)
#>  htmlwidgets                1.6.2     2023-03-17 [1] CRAN (R 4.2.2)
#>  httpuv                     1.6.11    2023-05-11 [1] CRAN (R 4.3.0)
#>  httr                       1.4.7     2023-08-15 [1] CRAN (R 4.3.1)
#>  interactiveDisplayBase     1.38.0    2023-04-25 [1] Bioconductor
#>  IRanges                  * 2.34.1    2023-06-22 [1] Bioconductor
#>  irlba                      2.3.5.1   2022-10-03 [1] CRAN (R 4.2.1)
#>  jsonlite                   1.8.7     2023-06-29 [1] CRAN (R 4.3.1)
#>  KEGGREST                   1.40.0    2023-04-25 [1] Bioconductor
#>  knitr                      1.43      2023-05-25 [1] CRAN (R 4.3.0)
#>  later                      1.3.1     2023-05-02 [1] CRAN (R 4.2.2)
#>  lattice                    0.21-8    2023-04-05 [3] CRAN (R 4.3.0)
#>  lazyeval                   0.2.2     2019-03-15 [1] CRAN (R 4.1.2)
#>  lifecycle                  1.0.3     2022-10-07 [1] CRAN (R 4.2.1)
#>  magrittr                   2.0.3     2022-03-30 [1] CRAN (R 4.1.3)
#>  MASS                       7.3-60    2023-05-04 [3] CRAN (R 4.3.1)
#>  Matrix                     1.6-1     2023-08-14 [1] CRAN (R 4.3.1)
#>  MatrixGenerics           * 1.12.3    2023-07-30 [1] Bioconductor
#>  matrixStats              * 1.0.0     2023-06-02 [1] CRAN (R 4.3.0)
#>  memoise                    2.0.1     2021-11-26 [1] CRAN (R 4.1.2)
#>  mgcv                       1.9-0     2023-07-11 [3] CRAN (R 4.3.1)
#>  mia                        1.8.0     2023-04-25 [1] Bioconductor
#>  mime                       0.12      2021-09-28 [1] CRAN (R 4.1.2)
#>  MultiAssayExperiment       1.27.0    2023-08-11 [1] Bioconductor
#>  munsell                    0.5.0     2018-06-12 [1] CRAN (R 4.1.2)
#>  nlme                       3.1-163   2023-08-09 [3] CRAN (R 4.3.1)
#>  permute                    0.9-7     2022-01-27 [1] CRAN (R 4.1.2)
#>  pillar                     1.9.0     2023-03-22 [1] CRAN (R 4.2.2)
#>  pkgconfig                  2.0.3     2019-09-22 [1] CRAN (R 4.1.2)
#>  plotly                     4.10.2    2023-06-03 [1] CRAN (R 4.3.0)
#>  plyr                       1.8.8     2022-11-11 [1] CRAN (R 4.2.2)
#>  png                        0.1-8     2022-11-29 [1] CRAN (R 4.2.2)
#>  promises                   1.2.1     2023-08-10 [1] CRAN (R 4.3.1)
#>  purrr                      1.0.2     2023-08-10 [1] CRAN (R 4.3.1)
#>  R.cache                    0.16.0    2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3                1.8.2     2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo                       1.25.0    2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils                    2.12.2    2022-11-11 [1] CRAN (R 4.2.2)
#>  R6                         2.5.1     2021-08-19 [1] CRAN (R 4.1.2)
#>  rappdirs                   0.3.3     2021-01-31 [1] CRAN (R 4.1.2)
#>  Rcpp                       1.0.11    2023-07-06 [1] CRAN (R 4.3.1)
#>  RCurl                      1.98-1.12 2023-03-27 [1] CRAN (R 4.2.2)
#>  reprex                     2.0.2     2022-08-17 [1] CRAN (R 4.2.1)
#>  reshape2                   1.4.4     2020-04-09 [1] CRAN (R 4.1.2)
#>  rlang                      1.1.1     2023-04-28 [1] CRAN (R 4.2.2)
#>  rmarkdown                  2.24      2023-08-14 [1] CRAN (R 4.3.1)
#>  RSQLite                    2.3.1     2023-04-03 [1] CRAN (R 4.2.2)
#>  rstudioapi                 0.15.0    2023-07-07 [1] CRAN (R 4.3.1)
#>  rsvd                       1.0.5     2021-04-16 [1] CRAN (R 4.1.2)
#>  S4Arrays                   1.0.5     2023-07-24 [1] Bioconductor
#>  S4Vectors                * 0.38.1    2023-05-02 [1] Bioconductor
#>  ScaledMatrix               1.8.1     2023-05-03 [1] Bioconductor
#>  scales                     1.2.1     2022-08-20 [1] CRAN (R 4.2.1)
#>  scater                     1.28.0    2023-04-25 [1] Bioconductor
#>  scuttle                    1.10.2    2023-08-03 [1] Bioconductor
#>  sessioninfo                1.2.2     2021-12-06 [1] CRAN (R 4.1.2)
#>  shiny                      1.7.5     2023-08-12 [1] CRAN (R 4.3.1)
#>  SingleCellExperiment     * 1.22.0    2023-04-25 [1] Bioconductor
#>  sparseMatrixStats          1.12.2    2023-07-02 [1] Bioconductor
#>  stringi                    1.7.12    2023-01-11 [1] CRAN (R 4.2.2)
#>  stringr                    1.5.0     2022-12-02 [1] CRAN (R 4.2.2)
#>  styler                     1.10.1    2023-06-05 [1] CRAN (R 4.3.1)
#>  SummarizedExperiment     * 1.30.2    2023-06-06 [1] Bioconductor
#>  tibble                     3.2.1     2023-03-20 [1] CRAN (R 4.2.2)
#>  tidyr                      1.3.0     2023-01-24 [1] CRAN (R 4.2.2)
#>  tidyselect                 1.2.0     2022-10-10 [1] CRAN (R 4.2.1)
#>  tidySummarizedExperiment * 1.11.1    2023-08-22 [1] Github (stemangiola/tidySummarizedExperiment@c798f5b)
#>  tidytree                   0.4.5     2023-08-10 [1] CRAN (R 4.3.1)
#>  treeio                     1.24.3    2023-07-24 [1] Bioconductor
#>  TreeSummarizedExperiment * 2.8.0     2023-04-25 [1] Bioconductor
#>  ttservice                * 0.3.6     2023-06-20 [1] RSPM (R 4.3.0)
#>  utf8                       1.2.3     2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs                      0.6.3     2023-06-14 [1] CRAN (R 4.3.0)
#>  vegan                      2.6-4     2022-10-11 [1] CRAN (R 4.2.1)
#>  vipor                      0.4.5     2017-03-22 [1] CRAN (R 4.1.2)
#>  viridis                    0.6.4     2023-07-22 [1] CRAN (R 4.3.1)
#>  viridisLite                0.4.2     2023-05-02 [1] CRAN (R 4.2.2)
#>  withr                      2.5.0     2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun                       0.40      2023-08-09 [1] CRAN (R 4.3.1)
#>  xtable                     1.8-4     2019-04-21 [1] CRAN (R 4.1.2)
#>  XVector                  * 0.40.0    2023-04-25 [1] Bioconductor
#>  yaml                       2.3.7     2023-01-23 [1] CRAN (R 4.2.2)
#>  yulab.utils                0.0.7     2023-08-09 [1] CRAN (R 4.3.1)
#>  zlibbioc                   1.46.0    2023-04-25 [1] Bioconductor
#> 
#>  [1] /mnt/STORE1/lib/R/bioc-release
#>  [2] /usr/lib/R/site-library
#>  [3] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

^{Created on 2023-08-22 with reprex v2.0.2}

stemangiola · 2023-08-26T08:24:20Z

@csoneson I noticed that the error triggers if we have duplicated colnames, because of the intersect, length approach. Also, a warning of duplicated colnames would be great.

Can you please remind me if we did not decide at the end to convert to warning? Actually, being an error saved an analysis of mine. I am conflicted about what behaviours between error and warning would be best.

csoneson · 2023-08-27T19:05:31Z

Hm, good point. I'm thinking that this is a bit different from the other situations and may warrant an error, for a couple of reasons:

What do we print if one assay has column names S1, S2, S3, and another has S1, S1, S3 - which of the S1 samples in the second assay would be matched to the S1 sample in the first assay? I don't see an obvious solution here other than making the column names of the first assay unique, and then none of them would overlap with the second assay.
This kind of situation seems to raise an error also e.g. in as_tibble(), via gather():

> as_tibble(se) 
Error in `map2()`:
ℹ In index: 1.
ℹ With name: mat.
Caused by error in `gather()`:
! Names must be unique.
✖ These names are duplicated:
  * "S1" at locations 1 and 2.

stemangiola · 2023-08-27T23:46:13Z

Yes unless we do clever masking for all assays this would be a problem.

So I think we can

Create a duplicated names error check
correct the mismatch check to allow for duplication, for keeping these two checks independent. For example the mismatch check triggers even if I have

assay_1: A, A, B, C
assay_2: A, A, B, C

Now there is no mismatch technically but there is duplication. So just the duplication test should trigger, but not the mismatch check.

csoneson · 2023-08-28T13:39:57Z

I gave it a go here: #80
I had to change the logic a bit to deal with situations of the type:
assay_1: A, A, B, C
assay_2: A, B, B, C
in which case the intersection of the names is the same as the number of unique entries for each, but they are not in fact the same...

csoneson added 5 commits August 16, 2023 20:53

Fix typo

cf779c3

Add function to check that dimnames are sensible for get_count_datasets

8659e53

Increase consistency (spacing, assignment) in utilities.R

6cd7ef9

Adapt test code to new function argument

a2a795e

Add unit tests for utilities

bc47ac6

This was linked to issues Aug 16, 2023

Tidy operations with tidySE when colData and rowData share column names #76

Open

tidySE manage to_tibble if rownames not match, or are absent #70

Closed

This was unlinked from issues Aug 16, 2023

tidySE manage to_tibble if rownames not match, or are absent #70

Closed

Tidy operations with tidySE when colData and rowData share column names #76

Open

stemangiola linked an issue Aug 16, 2023 that may be closed by this pull request

tidySE manage to_tibble if rownames not match, or are absent #70

Closed

stemangiola assigned csoneson Aug 16, 2023

stemangiola self-requested a review August 17, 2023 06:03

stemangiola added the bug Something isn't working label Aug 17, 2023

stemangiola requested changes Aug 17, 2023

View reviewed changes

Use more informative function names

edc9f81

csoneson added 3 commits August 20, 2023 11:29

Replace inner_join by full_join in as_tibble, to avoid getting an emp…

e03463d

…ty object when dimnames are not shared

Error -> warning in case of nonmatching dimnames, use full_join inste…

d614e7f

…ad of bind_cols

Adapt tests

59f6893

This was referenced Aug 21, 2023

tidySE deal with missing assay names #79

Open

tidySCE deal with missing assay names stemangiola/tidySingleCellExperiment#82

Open

stemangiola merged commit c798f5b into stemangiola:master Aug 21, 2023

csoneson deleted the bugfix-nonmatching-dimnames branch August 21, 2023 07:13

sdgamboa mentioned this pull request Aug 22, 2023

rownames not matching in tse and assay when using rownames = NCBI waldronlab/curatedMetagenomicData#298

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bugfix nonmatching dimnames #78

Bugfix nonmatching dimnames #78

csoneson commented Aug 16, 2023

stemangiola commented Aug 16, 2023

csoneson commented Aug 17, 2023

stemangiola left a comment •

edited

Loading

stemangiola Aug 17, 2023 •

edited

Loading

csoneson commented Aug 17, 2023

stemangiola commented Aug 17, 2023

csoneson commented Aug 18, 2023

stemangiola commented Aug 18, 2023 •

edited

Loading

csoneson commented Aug 20, 2023

stemangiola commented Aug 20, 2023

stemangiola commented Aug 20, 2023

stemangiola commented Aug 21, 2023

stemangiola commented Aug 21, 2023

csoneson commented Aug 21, 2023

stemangiola commented Aug 21, 2023

csoneson commented Aug 21, 2023

stemangiola commented Aug 21, 2023

sdgamboa commented Aug 22, 2023

stemangiola commented Aug 26, 2023

csoneson commented Aug 27, 2023

stemangiola commented Aug 27, 2023

csoneson commented Aug 28, 2023

Bugfix nonmatching dimnames #78

Bugfix nonmatching dimnames #78

Conversation

csoneson commented Aug 16, 2023

stemangiola commented Aug 16, 2023

csoneson commented Aug 17, 2023

stemangiola left a comment • edited Loading

Choose a reason for hiding this comment

stemangiola Aug 17, 2023 • edited Loading

Choose a reason for hiding this comment

csoneson commented Aug 17, 2023

stemangiola commented Aug 17, 2023

csoneson commented Aug 18, 2023

stemangiola commented Aug 18, 2023 • edited Loading

csoneson commented Aug 20, 2023

stemangiola commented Aug 20, 2023

stemangiola commented Aug 20, 2023

stemangiola commented Aug 21, 2023

stemangiola commented Aug 21, 2023

csoneson commented Aug 21, 2023

stemangiola commented Aug 21, 2023

csoneson commented Aug 21, 2023

stemangiola commented Aug 21, 2023

sdgamboa commented Aug 22, 2023

stemangiola commented Aug 26, 2023

csoneson commented Aug 27, 2023

stemangiola commented Aug 27, 2023

csoneson commented Aug 28, 2023

stemangiola left a comment •

edited

Loading

stemangiola Aug 17, 2023 •

edited

Loading

stemangiola commented Aug 18, 2023 •

edited

Loading